<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc sortrefs="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc topblock="yes"?>
<?rfc comments="no"?>
<rfc category="info" docName="draft-yao-coinrg-generic-framework-00"
     ipr="trust200902">
  <front>
    <title abbrev="Computing in the Network Research Group">A Generic COIN
    Framework in Controlled Environments</title>

    <author fullname="Kehan Yao" initials="K." surname="Yao">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>yaokehan@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Shiping Xu" initials="S." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>xushiping@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Zhiqiang Li" initials="Z." surname="Li">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>lizhiqiangyjy@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Wenfei Wu" initials="W." surname="Wu">
      <organization>Peking University</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100871</code>

          <country>China</country>
        </postal>

        <email>wenfeiwu@pku.edu.cn</email>
      </address>
    </author>

    <date day="13" month="March" year="2023"/>

    <area>IRTF</area>

    <workgroup>Computing in the Network Research Group</workgroup>

    <keyword>framework;COIN</keyword>

    <abstract>
      <t>There has been a lot of academic research and industrial practice in
      the area of Computing in the Network (COIN), but most of it consists of
      case-by-case designs that rely heavily on programmable network devices.
      Such designs lack generality and scalability, which impedes the
      development of COIN. This document analyzes different COIN use cases,
      summarizes the computing primitives, operations, and semantics that can
      be implemented inside the network, and proposes a generic COIN
      framework for controlled environments. Enabling technologies related to
      the framework and the standardization landscape are also analyzed.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>Programmable network devices (PNDs), including programmable switches
      and SmartNICs, have inspired a lot of research work in the area of COIN,
      such as In-band Network Telemetry (INT) and the offloading of network
      functions like load balancers and firewalls. However, we argue that
      these use cases are not strictly &ldquo;computing&rdquo; in the network:
      they are hardware implementations of network functions that were
      traditionally implemented in servers, and their purpose is to accelerate
      or enhance those network functions. The &ldquo;network&rdquo; in COIN is
      also ambiguous. Narrowly, it refers to network devices such as PNDs;
      broadly, it refers to network elements in different contexts. In edge
      computing or fog computing, these network elements are ubiquitous
      heterogeneous edge devices, whereas in controlled environments such as
      data centers they are ordinary network devices. This document limits
      the scope of the discussion to controlled environments, which is
      consistent with most of the existing work.</t>

      <t>To move the work on COIN forward, there is a need to reach consensus
      on the definition of COIN. Although there is an ongoing draft on COIN
      terminology in the research group, we want to share our thoughts.
      Computing in the network is &ldquo;to offload application-specific
      functions to network elements, so as to accelerate
      applications&rdquo;. These application-specific functions are described
      by a series of computing primitives, operations, and semantics that can
      be supported by network elements, and they define what to
      &ldquo;compute&rdquo; in the network. An illustrative example is
      In-network Aggregation (INA) for distributed machine learning model
      training: the aggregation operation is implemented in network devices,
      which can accelerate the entire model training process. Much research
      has investigated what kinds of computing primitives can be offloaded to
      network devices, but a systematic summary of these application-specific
      primitives is still lacking. We think that application-specific
      functions can be generalized into several types of computing primitives
      that could be further standardized. COIN would then no longer depend on
      PNDs for implementation; ordinary network devices that support these
      general primitives could do the work.</t>

      <t>Further, current research on how COIN can accelerate applications
      usually depends on case-by-case hardware-software co-design, which
      lacks the generality and scalability needed for the development of
      COIN. There is a need to design a generic COIN framework, both to make
      COIN a common capability of the network and to lower the barriers to
      application development.</t>

      <t>Based on the analysis above, this document classifies several kinds
      of computing primitives that could be standardized and proposes a
      generic COIN framework that can be scaled and promoted in controlled
      environments.</t>
    </section>

    <section title="Conventions Used in This Document">
      <section title="Terminology">
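        <t>COIN: Computing in the Network</t>

        <t>INA: In-network Aggregation</t>

        <t>INT: In-band Network Telemetry</t>

        <t>PND: Programmable Network Device</t>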
      </section>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in BCP
        14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section title="Generic Framework">
      <t>The generic COIN framework contains three logical layers: the
      Scheduling layer (S), the Control layer (C), and the Infrastructure
      layer (I).</t>

      <figure align="center" title="Figure 1: Generic COIN Framework">
        <artwork type="ascii-art">+---------------------------------------------------------------------+
|  Scheduling Layer                                                   |
|   +---------------------------------------------------------------+ |
|   |                            Scheduler                          | |
|   |                                                               | |
|   |                    Resource (Host and COIN)                   | |
|   |           Job Decomposition (Task Scheduling Policy)          | |
|   +---------------------------------------------------------------+ |
+---------------------------------------------------------------------+
                   |Host Task                         |COIN Task
+---------------------------------------------------------------------+
|  Control Layer   |                                  |               |
|                  |                                  |               |
| +----------------v------------+     +---------------v-------------+ |
| |      Host Controller        |     |       COIN Controller       | |
| |         (optional)          +-----&gt;                             | |
| |                           Collaboration  COIN Task Installation | |
| |   Host Task Installation    |     |           Routing           | |
| |  End-Network Collaboration  &lt;-----+   End-Network Collaboration | |
| +-----------------------------+     +-----------------------------+ |
+---------------------------------------------------------------------+
                  | Host Management               | Device Management
                  | Host Task Control             | COIN Task Control
                  |                               |
+-----------------+---------------------------------------------------+
| Infrastructure Layer                            |                   |
|                 |                               |                   |
|    +------------v---------+         +-----------v----------------+  |
|    |           Host       |         |       Network Device       |  |
|    |                      |         |                            |  |
|    +----------------------+         +----------------------------+  |
+---------------------------------------------------------------------+</artwork>
      </figure>

      <t>The scheduling layer (S) decomposes a job into host tasks and COIN
      tasks according to the host and COIN resources and scheduling policy.
      These tasks are then distributed to the control layer.</t>

      <t>The control layer (C) is divided into a host controller and a COIN
      controller, each of which can be centralized or distributed. The host
      controller is optional and is deployed on demand according to the
      application scenario. It is mainly responsible for host task deployment
      and control. The COIN controller is mainly responsible for network
      management, COIN task deployment and control, and routing. The host
      controller and the COIN controller work together to realize end-network
      collaboration.</t>

      <t>The infrastructure layer (I) includes the hosts and network devices,
      together with the routing protocols and reliability protocols needed to
      realize COIN.</t>
    </section>

    <section title="Enabling Technologies">
      <section title="The Scheduling Layer">
        <t>Task decomposition is the first step toward end-network
        collaborative in-network computing. An appropriate scheduling policy
        leads to reasonable resource allocation and better task performance.
        With the addition of in-network computing, the scheduler needs to
        consider not only the host resources but also the in-network
        computing resources.</t>
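
        <t>As a minimal sketch of such a scheduling policy, the following
        Python fragment splits a job into host tasks and COIN tasks. The job
        model is an assumption made only for illustration: each operation
        names the COIN primitive it would need, device capacity is counted in
        abstract slots, and the names Op and decompose are hypothetical and
        not part of the framework.</t>

        <figure>
          <artwork type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Op:
    name: str
    primitive: Optional[str]  # needed COIN primitive, or None
    slots: int = 0            # hypothetical unit of device memory


def decompose(ops: List[Op],
              coin_free: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split a job into (host_tasks, coin_tasks).

    An op becomes a COIN task only if its primitive is offered with
    enough free slots; otherwise it stays on the hosts.
    """
    free = dict(coin_free)  # do not mutate the caller's view
    host_tasks, coin_tasks = [], []
    for op in ops:
        avail = free.get(op.primitive, 0) if op.primitive else 0
        if op.primitive and avail >= op.slots:
            free[op.primitive] = avail - op.slots
            coin_tasks.append(op.name)
        else:
            host_tasks.append(op.name)
    return host_tasks, coin_tasks


job = [
    Op("forward_backward", None),
    Op("gradient_sum", "ValStr_Agg", slots=4),
    Op("param_cache", "K-V", slots=8),
]
host, coin = decompose(job, {"ValStr_Agg": 16, "K-V": 2})
print(host, coin)
```
]]></artwork>
        </figure>

        <t>In this example the caching operation stays on the hosts because
        the assumed K-V capacity is too small, which shows why the scheduler
        has to see both host and in-network resources.</t>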
      </section>

      <section title="The Control Layer">
        <t>End-network collaborative control is realized by the host
        controller and the COIN controller.</t>

        <t>Network side:</t>

        <t>* Network device management, including device status, load, and
        computing capacity and resources.</t>

        <t>* Network topology management, including network topology update,
        link status monitoring, etc.</t>

        <t>* Routing, selecting an optimal path for in-network computing and
        forwarding.</t>

        <t>Host side:</t>

        <t>* Cooperation with the host application on COIN processing,
        including completing the overall computation together with the
        network side and performing reliability control.</t>
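
        <t>The routing function on the network side can be sketched as
        follows, assuming that the COIN controller knows the link costs and
        the set of primitives each device offers. The function name
        coin_path, the topology, and the cost model are hypothetical; the
        sketch uses Dijkstra's algorithm over (node, done) states so that the
        chosen path is the cheapest one that crosses at least one capable
        device.</t>

        <figure>
          <artwork type="code"><![CDATA[
```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link cost}
Route = Tuple[int, List[str]]      # (total cost, node sequence)


def coin_path(graph: Graph, caps: Dict[str, Set[str]],
              src: str, dst: str, primitive: str) -> Optional[Route]:
    """Cheapest src->dst path crossing a node offering `primitive`.

    Dijkstra over (node, done) states; `done` records whether a
    capable device has already been traversed on the way.
    """
    start = (src, primitive in caps.get(src, set()))
    heap = [(0, start, [src])]
    seen = set()
    while heap:
        cost, (node, done), path = heapq.heappop(heap)
        if (node, done) in seen:
            continue
        seen.add((node, done))
        if node == dst and done:
            return cost, path
        for nxt, w in graph.get(node, {}).items():
            state = (nxt, done or primitive in caps.get(nxt, set()))
            heapq.heappush(heap, (cost + w, state, path + [nxt]))
    return None  # no path can serve this primitive


# Hypothetical topology: s2 is the only device with ValStr_Agg.
graph = {"h1": {"s1": 1, "s2": 1}, "s1": {"h2": 1},
         "s2": {"h2": 2}, "h2": {}}
caps = {"s2": {"ValStr_Agg"}}
print(coin_path(graph, caps, "h1", "h2", "ValStr_Agg"))
```
]]></artwork>
        </figure>

        <t>Here the plain shortest path via s1 is rejected in favor of the
        slightly longer path via s2, since only s2 can execute the task. A
        result of None tells the controller that the task has to fall back to
        the hosts.</t>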
      </section>

      <section title="The Infrastructure Layer">
        <t>Network devices implement the standard COIN primitives.</t>

        <t>A unified set of COIN primitives makes it easier for COIN
        deployments to interoperate and to be promoted. Some research work
        <xref target="NetRPC"/> <xref target="Netcompute"/> summarizes common
        COIN primitives and data structures. Drawing on this work, we select
        some major COIN primitives. ValStr_Agg is used in applications such
        as distributed machine learning training, Asyn_Val_Agg is used in big
        data analysis applications where map-reduce is needed, K-V is used
        for caching, and consensus is used for synchronization within
        distributed systems. Heterogeneous network devices can have different
        internal implementations of the same COIN primitives, but the
        services they expose externally need to be unified. There is
        therefore a need to standardize these COIN primitives for generic use
        cases. Due to equipment differences, the calculation accuracy of some
        primitives may differ between devices. These differences need to be
        considered in task decomposition and routing.</t>

        <figure align="center" title="Figure 2: COIN Primitives">
          <artwork type="ascii-art">+------------+--------------+-------------------------------------+
|   Type     |Data Structure|                 Primitives          |
+------------+--------------+-------------------------------------+
| ValStr_Agg |     Array    |   Map.get, Map.add, Map.clear       |
+------------+--------------+-------------------------------------+
|Asyn_Val_Agg|      Map     |  Map.get, Map.add, Stream.modify    |
+------------+--------------+-------------------------------------+
|     K-V    |      Map     |            Map.get, Map.add         |
+------------+--------------+-------------------------------------+
|  consensus |    Integer   |  Map.get, Map.add, Map.clear        |
+------------+--------------+-------------------------------------+</artwork>
        </figure>
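
        <t>To illustrate what a unified external interface could look like,
        the sketch below models a device-side map with the Map.get, Map.add,
        and Map.clear primitives and uses it for a ValStr_Agg-style
        element-wise sum. The class name CoinMap, the integer value type, and
        the reading of Map.add as "accumulate into a slot" are assumptions
        made for illustration; they are not taken from the cited work.</t>

        <figure>
          <artwork type="code"><![CDATA[
```python
from typing import Dict, List


class CoinMap:
    """Hypothetical device-side map exposing Figure 2 primitives."""

    def __init__(self) -> None:
        self._slots: Dict[int, int] = {}

    def add(self, key: int, value: int) -> None:   # Map.add
        # Assumed semantics: accumulate the value into the slot.
        self._slots[key] = self._slots.get(key, 0) + value

    def get(self, key: int) -> int:                # Map.get
        return self._slots.get(key, 0)

    def clear(self) -> None:                       # Map.clear
        self._slots.clear()


def aggregate(device: CoinMap, rows: List[List[int]]) -> List[int]:
    """ValStr_Agg-style use: sum equal-length arrays element-wise."""
    for values in rows:
        for index, value in enumerate(values):
            device.add(index, value)
    result = [device.get(i) for i in range(len(rows[0]))]
    device.clear()  # free the slots for the next round
    return result


print(aggregate(CoinMap(), [[1, 2, 3], [10, 20, 30]]))
```
]]></artwork>
        </figure>

        <t>Because callers only depend on the three operations, a different
        device could back the same interface with registers, SRAM tables, or
        software, which is the kind of unification argued for above.</t>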

        <t>Host-side applications need to be adapted to COIN.</t>

        <t>The network cannot guarantee that a computing task is completed
        during each transmission, so host-side applications need to be
        COIN-aware and able to flexibly handle data regardless of whether it
        has already been processed in the network.</t>
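
        <t>A COIN-aware receiver can be sketched as follows, assuming that
        each arriving chunk carries a hypothetical header field, here called
        merged, that counts how many worker contributions the network has
        already folded into it. The Chunk type and finish_on_host function
        are illustrative names only.</t>

        <figure>
          <artwork type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Chunk:
    values: List[int]
    merged: int  # hypothetical header field: contributions folded in


def finish_on_host(chunks: Iterable[Chunk],
                   expected: int) -> List[int]:
    """Complete an aggregation whatever the network already did."""
    total: List[int] = []
    seen = 0
    for chunk in chunks:
        if not total:
            total = [0] * len(chunk.values)
        total = [a + b for a, b in zip(total, chunk.values)]
        seen += chunk.merged
    if seen != expected:
        raise ValueError(f"got {seen} of {expected} contributions")
    return total


# Fully aggregated in the network: one chunk covering both workers.
print(finish_on_host([Chunk([11, 22], merged=2)], expected=2))
# Not aggregated at all: the host sums the two raw chunks itself.
print(finish_on_host([Chunk([1, 2], 1), Chunk([10, 20], 1)], 2))
```
]]></artwork>
        </figure>

        <t>Both calls yield the same result, so the application does not need
        to know in advance whether the network completed the task.</t>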
      </section>
    </section>

    <section title="Research Challenges and Other Considerations">
      <t>* End and network collaboration. Because resources within network
      devices are limited, fallback mechanisms are needed so that tasks that
      cannot be fully accomplished within the network are finished at the end
      devices. The relevant algorithms and protocols should be considered for
      implementation.</t>

      <t>* COIN reliability and correctness. When tasks are offloaded to
      network devices for computing, the correctness and reliability of the
      work should be considered. Mechanisms should be designed to ensure that
      COIN results are consistent with the results obtained when tasks are
      fully accomplished at end devices. In addition, reliable data
      transmission in COIN should be carefully designed, since many
      applications have very strict QoS requirements.</t>
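
      <t>One way to reason about result consistency is to compare an emulated
      in-network result with the host-only result under a tolerance. The
      sketch below assumes a device that can only add integers, so values are
      scaled to fixed point before aggregation; the scaling factor, function
      names, and tolerance are illustrative assumptions rather than a
      specified mechanism.</t>

      <figure>
        <artwork type="code"><![CDATA[
```python
from typing import List

SCALE = 1 << 16  # assumed fixed-point scaling factor


def host_sum(grads: List[List[float]]) -> List[float]:
    """Reference result computed entirely at end devices."""
    return [sum(col) for col in zip(*grads)]


def coin_sum(grads: List[List[float]]) -> List[float]:
    """Emulate a device that only adds integers."""
    acc = [0] * len(grads[0])
    for g in grads:
        for i, x in enumerate(g):
            acc[i] += round(x * SCALE)  # quantize, then accumulate
    return [a / SCALE for a in acc]


def consistent(grads: List[List[float]], tol: float) -> bool:
    """True if every element differs from the host result by <= tol."""
    ref, got = host_sum(grads), coin_sum(grads)
    return all(abs(r - g) <= tol for r, g in zip(ref, got))


grads = [[0.1, -0.25], [0.2, 0.5]]
print(consistent(grads, tol=len(grads) / SCALE))
print(consistent(grads, tol=0.0))
```
]]></artwork>
      </figure>

      <t>The first check passes and the second fails, which shows that exact
      equality is too strict for such a device and that an acceptable
      tolerance would have to be agreed between the application and the
      network.</t>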
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>TBD.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <reference anchor="NetRPC"
                 target="https://doi.org/10.48550/arXiv.2212.08362">
        <front>
          <title>NetRPC: Enabling In-Network Computation in Remote Procedure
          Calls</title>

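          <author initials="B." surname="Zhao">
            <organization/>
          </author>

          <author initials="W." surname="Wu">
            <organization/>
          </author>

          <author initials="W." surname="Xu">
            <organization/>
          </author>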

          <date month="December" year="2022"/>
        </front>
      </reference>

      <reference anchor="Netcompute"
                 target="https://doi.org/10.1145/3317550.3321439">
        <front>
          <title>When Should The Network Be The Computer?</title>

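          <author fullname="Dan R. K. Ports" initials="D." surname="Ports">
            <organization/>
          </author>

          <author fullname="Jacob Nelson" initials="J." surname="Nelson">
            <organization/>
          </author>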

          <date month="May" year="2019"/>
        </front>
      </reference>

      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>
    </references>
  </back>
</rfc>
