<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3552 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-iana-considerations-rfc2434bis.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
    (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-xu-lsr-isis-flooding-reduction-in-msdc-04"
     ipr="trust200902">
  <front>
    <title abbrev="">IS-IS Flooding Reduction in MSDC</title>

    <author fullname="Xiaohu Xu" initials="X." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <email>xuxiaohu_ietf@hotmail.com</email>
      </address>
    </author>

    <author fullname="Luyuan Fang" initials="L." surname="Fang">
      <organization>eBay</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>luyuanf@gmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Jeff Tantsura" initials="J." surname="Tantsura">
      <organization>Nvidia</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>jefftant.ietf@gmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shaowen Ma" initials="S." surname="Ma">
      <organization>Google</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>shaowen@google.com</email>

        <uri/>
      </address>
    </author>


    <date day="3" month="August" year="2023"/>

    <abstract>
      <t>IS-IS is commonly used as an underlay routing protocol for Massively
      Scalable Data Center (MSDC) networks. A given IS-IS router within the
      CLOS topology receives multiple copies of exactly the same Link State
      PDU (LSP) from multiple IS-IS neighbors. In addition, two IS-IS
      neighbors may send each other the same LSP simultaneously. Because each
      IS-IS router within the CLOS topology has many IS-IS neighbors, this
      unnecessary flooding of link-state information wastes the routers'
      scarce processing resources. This document proposes extensions to
      IS-IS that greatly reduce IS-IS flooding within MSDC networks, which
      improves the scalability of those networks.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>IS-IS is commonly used as an underlay routing protocol for Massively
      Scalable Data Center (MSDC) networks where CLOS is the most popular
      topology.</t>

      <t>A given IS-IS router within the CLOS topology receives multiple
      copies of exactly the same Link State PDU (LSP) from multiple IS-IS
      neighbors. In addition, two IS-IS neighbors may send each other the
      same LSP simultaneously. This unnecessary flooding of link-state
      information wastes the scarce processing resources of IS-IS routers,
      and as a result IS-IS does not scale well in MSDC networks.</t>

      <t>As a result, some MSDC operators have had to choose BGP as the
      routing protocol in their data centers <xref target="RFC7938"/>.
      However, with the emergence of high-performance Ethernet networks for
      AI and high-performance computing (HPC), visibility of the whole
      network topology, and even of link load information, is crucial for
      end-to-end path load balancing. Link-state routing protocols such as
      IS-IS therefore need to be reconsidered as the routing protocol for
      large-scale AI and HPC Ethernet networks. The prerequisite, of course,
      is that the scaling issue described above can be addressed.</t>

      <t>This document describes a pragmatic approach to the above scaling
      issue. The basic idea is as follows: instead of flooding link-state
      information among neighboring IS-IS routers within the MSDC network
      fabric, the link-state information originated by each IS-IS router is
      collected by centralized controllers, which in turn reflect the
      collected link-state information to all IS-IS routers within the MSDC
      network. As shown in Figure 1, all IS-IS routers within an MSDC network
      fabric are connected to one or more centralized controllers via a
      dedicated Local Area Network (LAN), referred to as the link-state
      collection and distribution LAN, which is used to collect and
      distribute link-state information. For redundancy, there should be at
      least two link-state collection and distribution LANs.</t>

      <t><figure>
          <artwork align="center"><![CDATA[           +----------+                  +----------+                     
           |Controller|                  |Controller|                     
           +----+-----+                  +-----+----+                     
                |DIS                           |Candidate DIS                       
                |                              |                          
                |                              |                          
   ---+---------+---+----------+-----------+---+---------+-LS Collection&Distribution LAN       
      |             |          |           |             |                
      |Non-DIS      |Non-DIS   |Non-DIS    |Non-DIS      |Non-DIS          
      |             |          |           |             |                
      |         +---+--+       |       +---+--+          |                
      |         |Router|       |       |Router|          |                
      |         *------*-      |      /*---/--*          |                
      |        /     \   --    |    //    /    \         |                
      |        /     \     --  |  //      /    \         |                
      |       /       \      --|//       /      \        |                
      |       /        \      /*-       /        \       |                
      |      /          \   // | --    /         \       |                
      |      /          \ //   |   --  /          \      |                
      |     /           /X     |     --           \      |                
      |     /         //  \    |     / --          \     |                
      |    /        //    \    |     /   --         \    |                
      |    /      //       \   |    /      --       \    |                
      |   /     //          \  |   /         --      \   |                
      |   /   //             \ |  /            --     \  |                
      |  /  //               \ |  /              --   \  |                
    +-+- //*                +\\+-/-+               +---\-++               
    |Router|                |Router|               |Router|               
    +------+                +------+               +------+               

                              Figure 1
]]></artwork>
        </figure></t>

      <t>With the assistance of a controller acting as the IS-IS Designated
      Intermediate System (DIS) for the link-state collection and distribution
      LAN, IS-IS routers within the MSDC network do not need to exchange any
      IS-IS Protocol Data Units (PDUs) other than Hello packets among
      themselves. To obtain the full topology information (i.e., the fully
      synchronized link-state database) of the MSDC network, these IS-IS
      routers instead exchange link-state information with the controller
      elected as IS-IS DIS for the link-state collection and distribution
      LAN.</t>

      <t>To further suppress the flooding of multicast IS-IS PDUs originated
      by IS-IS routers over the link-state collection and distribution LAN,
      IS-IS routers do not send multicast IS-IS Hello packets over that LAN.
      Instead, they initially just wait for IS-IS Hello packets originated by
      the controller elected as IS-IS DIS. Once an IS-IS DIS for the
      link-state collection and distribution LAN has been discovered, they
      start to send IS-IS Hello packets periodically and directly (as
      unicasts) to the IS-IS DIS. IS-IS routers also send all other IS-IS
      PDUs to the IS-IS DIS as unicasts. In contrast, the controller elected
      as IS-IS DIS sends IS-IS PDUs as before. As a result, IS-IS routers do
      not receive IS-IS PDUs from one another unless these IS-IS PDUs are
      forwarded as unknown unicasts over the link-state collection and
      distribution LAN. These modifications to current IS-IS router behavior
      greatly reduce IS-IS flooding, which improves the scalability of MSDC
      networks.</t>
    </section>

    <section anchor="Abbreviations_Terminology" title="Terminology">
      <t>This memo makes use of the terms defined in <xref
      target="RFC1195"/>.</t>
    </section>

    <section title="Modifications to Current IS-IS Behaviors">

      <section title="IS-IS Routers as Non-DIS">
        <t>After the bidirectional exchange of IS-IS Hello packets among IS-IS
        routers, IS-IS routers originate Link State PDUs (LSPs) accordingly.
        However, these self-originated LSPs no longer need to be exchanged
        directly among them. Instead, they need to be sent only to the
        controller elected as IS-IS DIS for the link-state collection and
        distribution LAN.</t>

        <t>To further reduce the flooding of multicast IS-IS PDUs over the
        link-state collection and distribution LAN, IS-IS routers SHOULD send
        IS-IS PDUs as unicasts. More specifically, IS-IS routers SHOULD send
        unicast IS-IS Hello packets periodically to the controller elected as
        IS-IS DIS. Consequently, IS-IS routers do not send any IS-IS Hello
        packet over the link-state collection and distribution LAN until they
        have discovered an IS-IS DIS for that LAN. Note that IS-IS routers
        SHOULD NOT be elected as IS-IS DIS for the link-state collection and
        distribution LAN (this is achieved by setting the DIS Priority of
        those IS-IS routers to zero). As a result, IS-IS routers do not see
        each other over the link-state collection and distribution LAN. In
        other words, IS-IS routers do not establish adjacencies with one
        another. Furthermore, IS-IS routers SHOULD send all other types of
        IS-IS PDUs to the controller elected as IS-IS DIS as unicasts as
        well.</t>
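
        <t>The sending rules above can be sketched as a small model of a
        router's interface to the link-state collection and distribution
        LAN. This is an illustrative sketch only, not a normative
        algorithm: the class name, the method names, the "IIH" label and
        the source_is_dis flag are assumptions made for the example, and
        the way a router recognizes the elected DIS is not modeled.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pdu = Tuple[str, str]  # (unicast destination, PDU type)


@dataclass
class NonDisLanInterface:
    """Hypothetical model of a non-DIS router's LAN interface."""

    dis_priority: int = 0  # zero, so the router is not elected DIS
    dis_address: Optional[str] = None  # unknown until a DIS Hello

    def on_hello_received(self, source: str,
                          source_is_dis: bool) -> None:
        # Assumption: the caller already knows whether the sender is
        # the elected DIS; that decision is not modeled here.
        if source_is_dis:
            self.dis_address = source

    def on_hello_timer(self) -> List[Pdu]:
        # Stay silent until a DIS is known; never multicast a Hello.
        if self.dis_address is None:
            return []
        return [(self.dis_address, "IIH")]

    def send(self, pdu_type: str) -> List[Pdu]:
        # All other PDU types are likewise unicast to the DIS only.
        if self.dis_address is None:
            return []
        return [(self.dis_address, pdu_type)]
```
]]></artwork>
        </figure>

        <t>In this sketch, a router emits nothing on the LAN before a Hello
        from the DIS has been processed; afterwards every Hello and every
        other PDU is addressed to the DIS alone.</t>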

        <t>To prevent data traffic from being forwarded across the
        link-state collection and distribution LAN, the cost of all IS-IS
        routers' interfaces to the link-state collection and distribution LAN
        SHOULD be set to the maximum value.</t>

        <t>When a given IS-IS router loses its connection to the link-state
        collection and distribution LAN, it SHOULD actively establish
        adjacencies with all of its IS-IS neighbors within the CLOS network.
        In this way, it can obtain the full link-state database (LSDB) of the
        CLOS network while flooding its self-originated LSPs to the rest of
        the CLOS network through these IS-IS neighbors.</t>
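
        <t>The fallback rule above amounts to a choice of synchronization
        peers. The following sketch is illustrative only; the function name
        and its parameters are assumptions made for the example, and the
        case where the LAN is up but no DIS has been discovered yet is
        deliberately left out.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import List


def flooding_peers(lan_up: bool, dis: str,
                   clos_neighbors: List[str]) -> List[str]:
    """Return the peers a router synchronizes its LSDB with."""
    if lan_up:
        # Normal case: exchange link-state information with the DIS only.
        return [dis]
    # LAN connection lost: fall back to adjacencies with all of the
    # router's IS-IS neighbors within the CLOS network.
    return list(clos_neighbors)
```
]]></artwork>
        </figure>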
      </section>

      <section title="Controllers as DIS">
        <t>The controller elected as IS-IS DIS sends IS-IS PDUs as
        multicasts or unicasts as before. It SHOULD also accept and process
        the unicast IS-IS PDUs originated by IS-IS routers. Upon receiving
        any new LSP from a given IS-IS router, the controller elected as
        DIS MUST flood it immediately to the link-state collection and
        distribution LAN for two purposes: 1) implicitly acknowledging the
        receipt of that LSP; 2) synchronizing that LSP to all the other IS-IS
        routers.</t>
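
        <t>The reflection step above can be sketched as follows. This is an
        illustrative sketch only: the class and method names and the "LAN"
        destination label are assumptions made for the example, and an LSP
        is treated as new purely on the basis of a higher sequence number,
        which simplifies the comparison a real implementation performs.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import Dict, List, Tuple

Flood = Tuple[str, str, int]  # (destination, LSP ID, sequence number)


class ControllerDis:
    """Hypothetical model of the controller elected as DIS."""

    def __init__(self) -> None:
        # LSP ID -> highest sequence number accepted so far
        self.lsdb: Dict[str, int] = {}

    def on_lsp_received(self, lsp_id: str, seq: int) -> List[Flood]:
        known = self.lsdb.get(lsp_id)
        # Simplification: "new" means a higher sequence number only.
        if known is not None and seq <= known:
            return []
        self.lsdb[lsp_id] = seq
        # Flooding on the LAN both acknowledges the LSP to its
        # originator and synchronizes it to all the other routers.
        return [("LAN", lsp_id, seq)]
```
]]></artwork>
        </figure>

        <t>In this sketch, a repeated copy of an already known LSP produces
        no flooding, while each new LSP is reflected to the LAN exactly
        once.</t>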

        <t>Furthermore, to decrease the frequency with which the controller
        elected as DIS advertises Complete Sequence Number PDUs (CSNPs), it
        is RECOMMENDED that IS-IS routers send an explicit acknowledgement
        in the form of a Partial Sequence Number PDU (PSNP) upon receiving a
        new LSP from the controller elected as DIS.</t>
      </section>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Peter Lothberg and Erik Auerswald for
      their valuable comments and suggestions on this document.</t>

    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD.</t>

    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.1195'?>

    </references>

    <references title="Informative References">

      <?rfc include='reference.RFC.4136'?>

      <?rfc include='reference.RFC.7938'?>

    </references>
  </back>
</rfc>
