Internet Engineering Task Force (IETF)                       A. Ghanwani
Request for Comments: 8293                                          Dell
Category: Informational                                        L. Dunbar
ISSN: 2070-1721                                               M. McBride
                                                                  Huawei
                                                               V. Bannai
                                                                  Google
                                                             R. Krishnan
                                                                    Dell
                                                            January 2018

     A Framework for Multicast in Network Virtualization over Layer 3
Abstract

This document discusses a framework for supporting multicast traffic
in a network that uses Network Virtualization over Layer 3 (NVO3).
Both infrastructure multicast and application-specific multicast are
discussed. It describes the various mechanisms that can be used for
delivering such traffic as well as the data plane and control plane
considerations for each of the mechanisms.
Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved.
Table of Contents
1. Introduction
   1.1. Infrastructure Multicast
   1.2. Application-Specific Multicast
2. Terminology and Abbreviations
3. Multicast Mechanisms in Networks That Use NVO3
   3.1. No Multicast Support
   3.2. Replication at the Source NVE
   3.3. Replication at a Multicast Service Node
   3.4. IP Multicast in the Underlay
   3.5. Other Schemes
4. Simultaneous Use of More Than One Mechanism
5. Other Issues
   5.1. Multicast-Agnostic NVEs
   5.2. Multicast Membership Management for DC with VMs
6. Security Considerations
7. IANA Considerations
8. Summary
9. References
   9.1. Normative References
   9.2. Informative References
Acknowledgments
Authors' Addresses
1. Introduction
The set of multicast listeners for each multicast group may not be
known in advance. Therefore, it may not be possible or practical for
an NVA to get the list of participants for each multicast group ahead
of time.
In this document, the terms host, Tenant System (TS), and Virtual
Machine (VM) are used interchangeably to represent an end station
that originates or consumes data packets.
What makes networks using NVO3 different from other networks is that
some NVEs, especially NVEs implemented in servers, might not support
regular multicast protocols such as PIM. Instead, the only
capability they may support would be that of encapsulating data
packets from VMs with an outer unicast header. Therefore, it is
important for networks using NVO3 to have mechanisms to support
multicast as a network capability for NVEs, to map multicast traffic
from VMs (users/applications) to an equivalent multicast capability
inside the NVE, or to determine the outer destination address if the
NVE does not support native multicast (e.g., PIM) or IGMP.
With NVO3, there are many possible ways that multicast may be handled
in such networks. We discuss some of the attributes of the following
four methods:
1. No multicast support
2. Replication at the source NVE
3. Replication at a Multicast Service Node (MSN)
4. IP multicast in the underlay
We note that other methods are also possible, such as [EDGE-REP], but
we focus on the above four because they are the most common.
3.1. No Multicast Support
The main drawback of this approach, even for unicast traffic, is that
it is not possible to initiate communication with a TS for which a
mapping to an NVE does not already exist at the NVA. This is a
problem in the case where the NVE is implemented in a physical switch
and the TS is a physical end station that has not registered with the
NVA.
3.2. Replication at the Source NVE
For this mechanism to work, the source NVE must know, a priori, the
IP addresses of all destination NVEs that need to receive the packet.
For the purpose of ARP/ND, this would involve knowing the IP
addresses of all the NVEs that have TSs in the VN of the TS that
generated the request.
This method requires sending multiple copies of the same packet, one
to each NVE that participates in the VN. If, for example, a tenant
subnet is spread across 50 NVEs, the packet would have to be
replicated 50 times at the source NVE. This approach clearly injects
more traffic into the network, which can cause congestion when the
network load is high, and it also strains the forwarding performance
of the NVE.
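The ingress-replication behavior described above can be sketched as a
few lines of Python. This is an illustration only, not an NVO3
implementation; the function and field names (`send_unicast`,
`outer_dst`, `vni`) are invented for the example.

```python
def replicate_at_source_nve(inner_frame, vni, dest_nve_ips, send_unicast):
    """Send one unicast-encapsulated copy of a multicast/broadcast frame
    to every NVE that participates in the VN.  The fan-out, and hence
    the load on the source NVE, grows linearly with the number of
    destination NVEs."""
    copies = 0
    for nve_ip in dest_nve_ips:
        # Outer unicast header: destination = the remote NVE's IP address.
        send_unicast({"outer_dst": nve_ip, "vni": vni, "payload": inner_frame})
        copies += 1
    return copies
```

For a tenant subnet spread across 50 NVEs, `dest_nve_ips` would hold 50
entries and the source NVE would transmit 50 copies of every such frame.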
Note that this method is similar to what was used in Virtual Private
LAN Service (VPLS) [RFC4762] prior to support of Multiprotocol Label
Switching (MPLS) multicast [RFC7117]. While there are some
similarities between MPLS Virtual Private Network (VPN) and NVO3,
there are some key differences:
3.3. Replication at a Multicast Service Node
With this method, all multicast packets would be sent using a unicast
tunnel encapsulation from the ingress NVE to a Multicast Service Node
(MSN). The MSN, in turn, would create multiple copies of the packet
and would deliver a copy, using a unicast tunnel encapsulation, to
each of the NVEs that are part of the multicast group for which the
packet is intended.
The following are possible ways for the MSN to get the membership
information for each multicast group:
o The MSN can obtain this membership information from the IGMP/MLD
report messages sent by TSs in response to IGMP/MLD query messages
from the MSN. The IGMP/MLD query messages are sent from the MSN
to the NVEs, which then forward the query messages to TSs attached
to them. An IGMP/MLD query message sent out by the MSN to an NVE
is encapsulated with the MSN address in the outer IP source
address field and the address of the NVE in the outer IP
destination address field. An encapsulated IGMP/MLD query message
also has a virtual network (VN) identifier (corresponding to the
VN that the TSs belong to) in the outer header and a multicast
address in the inner IP destination address field. Upon receiving
the encapsulated IGMP/MLD query message, the NVE establishes a
mapping for "MSN address" to "multicast address", decapsulates the
received encapsulated IGMP/MLD message, and multicasts the
decapsulated query message to the TSs that belong to the VN
attached to that NVE. An IGMP/MLD report message sent by a TS
includes the multicast address and the address of the TS. With
the proper "MSN address" to "multicast address" mapping, the NVEs
can encapsulate all multicast data frames containing the
"multicast address" with the address of the MSN in the outer IP
destination address field.
o The MSN can obtain the membership information from the NVEs that
have the capability to establish multicast groups by snooping
native IGMP/MLD messages (note that the communication must be
specific to the multicast addresses) or by having the NVA obtain
the information from the NVEs and, in turn, having the MSN
communicate with the NVA. This approach requires an additional
protocol between the MSN and the NVEs.
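The NVE-side state machine implied by the first bullet above, learning
the "MSN address" to "multicast address" mapping from an encapsulated
IGMP/MLD query and then using it as the outer destination for multicast
data, might look like this minimal Python sketch (class and field names
are hypothetical, not from any NVO3 specification):

```python
class NveMsnMapper:
    """Per-NVE state: inner multicast address -> MSN (outer) address."""

    def __init__(self):
        self.group_to_msn = {}

    def on_encapsulated_query(self, outer_src_msn, inner_group):
        # The outer IP source address of the encapsulated IGMP/MLD query
        # is the MSN's address; the inner IP destination address is the
        # tenant multicast address the query refers to.
        self.group_to_msn[inner_group] = outer_src_msn
        # (Decapsulating the query and multicasting it to the local TSs
        # of the VN is omitted here.)

    def encapsulate_data(self, inner_group, frame):
        # Multicast data frames for a learned group are unicast-
        # encapsulated toward the MSN, which replicates them to the
        # member NVEs.
        return {"outer_dst": self.group_to_msn[inner_group], "payload": frame}
```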
o The NVE only supports the basic IGMP/MLD snooping function, while
the "TS routers" handle the application-specific multicast. This
scheme doesn’t utilize the underlay IP multicast protocols.
Instead, routers, which are themselves TSs attached to the NVE,
would handle multicast protocols for the application-specific
multicast. We refer to such routers as TS routers.
o The NVE can act as a pseudo multicast router for the directly
attached TSs and support the mapping of IGMP/MLD messages to the
messages needed by the underlay IP multicast protocols.
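The pseudo-multicast-router role in the second bullet amounts to
translating IGMP/MLD membership from attached TSs into joins and leaves
of underlay multicast groups. The sketch below is illustrative only:
the (VN, inner group) to underlay group mapping function is an
assumption (in practice it might come from the NVA or configuration),
and the recorded join/leave actions stand in for real underlay
signaling such as PIM or IGMP.

```python
class PseudoMcastRouter:
    def __init__(self, underlay_group_for):
        # underlay_group_for: maps (VN, inner group) -> underlay group.
        self.underlay_group_for = underlay_group_for
        self.members = {}           # underlay group -> set of (vn, ts, group)
        self.underlay_actions = []  # recorded ("join"/"leave", underlay group)

    def on_igmp_report(self, vn, ts, inner_group):
        outer = self.underlay_group_for(vn, inner_group)
        listeners = self.members.setdefault(outer, set())
        if not listeners:
            # First local listener: join the underlay multicast group.
            self.underlay_actions.append(("join", outer))
        listeners.add((vn, ts, inner_group))

    def on_igmp_leave(self, vn, ts, inner_group):
        outer = self.underlay_group_for(vn, inner_group)
        listeners = self.members.get(outer)
        if listeners is None:
            return
        listeners.discard((vn, ts, inner_group))
        if not listeners:
            # Last local listener gone: leave the underlay group.
            del self.members[outer]
            self.underlay_actions.append(("leave", outer))
```

Note that the underlay is joined once per underlay group, however many
local TSs report membership, and left only when the last of them leaves.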
3.4. IP Multicast in the Underlay

With this method, there are none of the issues with the methods
described in Sections 3.2 and 3.3 with respect to scaling and
congestion. Instead, there are other issues described below.
There are additional optimizations that are possible, but they come
with their own restrictions. For example, a set of tenants may be
restricted to some subset of NVEs, and they could all share the same
outer IP multicast group address. This, however, introduces a
problem of suboptimal delivery (even if a particular tenant within
the group of tenants doesn’t have a presence on one of the NVEs that
another one does, the multicast packets would still be delivered to
that NVE). It also introduces an additional network management
burden to optimize which tenants should be part of the same tenant
group (based on the NVEs they share), which somewhat dilutes the
value proposition of NVO3 (to completely decouple the overlay and
physical network design allowing complete freedom of placement of VMs
anywhere within the DC).
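The suboptimal delivery described above can be made concrete with a toy
example. All tenant and NVE names below are made up: two tenants share
one outer multicast group, so the group's receiver set is the union of
their NVE sets, and tenant-A traffic reaches an NVE where tenant-A has
no presence.

```python
# NVEs on which each tenant in the shared group has a presence.
tenant_nves = {
    "tenant-A": {"nve1", "nve2"},
    "tenant-B": {"nve2", "nve3"},
}

# Every NVE hosting any tenant of the group joins the shared outer
# multicast group, so the receiver set is the union of the per-tenant
# NVE sets.
shared_group_nves = set().union(*tenant_nves.values())

# A tenant-A multicast packet is delivered to all of shared_group_nves;
# the NVEs below receive it although tenant-A is absent there.
wasted = shared_group_nves - tenant_nves["tenant-A"]
print(sorted(wasted))  # -> ['nve3']
```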
3.5. Other Schemes

There are still other mechanisms that may be used that attempt to
combine some of the advantages of the above methods by offering
multiple replication points, each with a limited degree of
replication [EDGE-REP]. Such schemes offer a trade-off between the
amount of replication at an intermediate node (e.g., router) versus
performing all of the replication at the source NVE or all of the
replication at a multicast service node.
5. Other Issues
5.2. Multicast Membership Management for DC with VMs
For DCs with virtualized servers, VMs can be added, deleted, or moved
very easily. When VMs are added, deleted, or moved, the NVEs to
which the VMs are attached are changed.
6. Security Considerations
7. IANA Considerations
8. Summary
9. References
9.1. Normative References
[RFC3376] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and
A. Thyagarajan, "Internet Group Management Protocol,
Version 3", RFC 3376, DOI 10.17487/RFC3376, October 2002,
<https://www.rfc-editor.org/info/rfc3376>.
[RFC7364] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L.,
Kreeger, L., and M. Napierala, "Problem Statement:
Overlays for Network Virtualization", RFC 7364,
DOI 10.17487/RFC7364, October 2014,
<https://www.rfc-editor.org/info/rfc7364>.
[RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and
Y. Rekhter, "Framework for Data Center (DC) Network
Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
2014, <https://www.rfc-editor.org/info/rfc7365>.
[RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and
T. Narten, "An Architecture for Data-Center Network
Virtualization over Layer 3 (NVO3)", RFC 8014,
DOI 10.17487/RFC8014, December 2016,
<https://www.rfc-editor.org/info/rfc8014>.
9.2. Informative References
[RFC3819] Karn, P., Ed., Bormann, C., Fairhurst, G., Grossman, D.,
Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and
L. Wood, "Advice for Internet Subnetwork Designers",
BCP 89, RFC 3819, DOI 10.17487/RFC3819, July 2004,
<https://www.rfc-editor.org/info/rfc3819>.
[RFC6831] Farinacci, D., Meyer, D., Zwiebel, J., and S. Venaas, "The
Locator/ID Separation Protocol (LISP) for Multicast
Environments", RFC 6831, DOI 10.17487/RFC6831, January
2013, <https://www.rfc-editor.org/info/rfc6831>.
[RFC7117] Aggarwal, R., Ed., Kamite, Y., Fang, L., Rekhter, Y., and
C. Kodeboniya, "Multicast in Virtual Private LAN Service
(VPLS)", RFC 7117, DOI 10.17487/RFC7117, February 2014,
<https://www.rfc-editor.org/info/rfc7117>.
[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
eXtensible Local Area Network (VXLAN): A Framework for
Overlaying Virtualized Layer 2 Networks over Layer 3
Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
<https://www.rfc-editor.org/info/rfc7348>.
[EDGE-REP] Marques, P., Fang, L., Winkworth, D., Cai, Y., and
P. Lapukhov, "Edge multicast replication for BGP IP
VPNs.", Work in Progress, draft-marques-l3vpn-
mcast-edge-01, June 2012.
[ISIS-Multicast]
Yong, L., Weiguo, H., Eastlake, D., Qu, A., Hudson, J.,
and U. Chunduri, "IS-IS Protocol Extension For Building
Distribution Trees", Work in Progress,
draft-yong-isis-ext-4-distribution-tree-03, October 2014.
[LANE] ATM Forum, "LAN Emulation Over ATM: Version 1.0", ATM
Forum Technical Committee, af-lane-0021.000, January 1995.
[LISP-Signal-Free]
Moreno, V. and D. Farinacci, "Signal-Free LISP Multicast",
Work in Progress, draft-ietf-lisp-signal-free-
multicast-07, November 2017.
[VXLAN-GPE]
Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol
Extension for VXLAN", Work in Progress,
draft-ietf-nvo3-vxlan-gpe-05, October 2017.
Acknowledgments
Many thanks are due to Dino Farinacci, Erik Nordmark, Lucy Yong,
Nicolas Bouliane, Saumya Dikshit, Joe Touch, Olufemi Komolafe, and
Matthew Bocci for their valuable comments and suggestions.
Authors' Addresses
Anoop Ghanwani
Dell
Email: anoop@alumni.duke.edu
Linda Dunbar
Huawei Technologies
5340 Legacy Drive, Suite 1750
Plano, TX 75024
United States of America
Mike McBride
Huawei Technologies
Email: mmcbride7@gmail.com
Vinay Bannai
Google
Email: vbannai@gmail.com
Ram Krishnan
Dell
Email: ramkri123@gmail.com