Wednesday, August 1, 2012

The Myth of the Broadcast Domain Apocalypse

There is a train of thought, popular with some network vendors and long-time network engineers, that there is a compelling need to “orchestrate” the physical network and the virtualized networking world.  This is expressed as a desire to ensure that VLANs are synchronized between the physical network switches and the virtual switches on the virtualization hosts.

The Orchestrated VLAN Model
VMware has a concept of a “backing VLAN”.  Simply put, this means that traffic belonging to a portgroup uses a configured transport VLAN when it traverses the physical network.  For example, if a group of VMs belongs to a portgroup “backed” by a VLAN, that VLAN must be “allowed” on the trunk ports connecting all of the physical ESX hosts that host any of those VMs.

In addition to this, it is argued that the trunk ports to ESX hosts that do not host a VM belonging to that portgroup should not “allow” the presence of the VLAN that backs that portgroup.  It is suggested that the reason for this requirement is that the unbridled propagation of VLANs will cause the ESX host to process broadcast packets it does not need to, with potentially dire consequences.

esx1diag.png

Imagine that ESX1 has VMs belonging to one portgroup backed by VLAN-A and another portgroup backed by VLAN-B.  VM1 and VM2 are part of the first portgroup and VM3 is a member of the second portgroup.  The trunk port between the physical network and the ESX1 host is configured to have only the A and B VLANs present on it.  The other ESX hosts similarly have their VMs, portgroups and VLANs matched.

The advantage of this scheme is that if there is a broadcast packet on a portgroup, only the VMs in the portgroup, and more importantly, the vmkernels that belong to the ESX hosts with VMs on that portgroup need to process the broadcast. 
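The orchestrated model can be sketched in a few lines of Python.  This is only an illustration, not how any particular orchestration product works: the portgroup names and the VMs placed on ESX2 and ESX3 are assumptions added to fill out the example, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory. ESX1 follows the example in the text; the VMs
# placed on ESX2 and ESX3 are invented purely for illustration.
VM_PLACEMENT = {
    "VM1": ("ESX1", "PG-A"),
    "VM2": ("ESX1", "PG-A"),
    "VM3": ("ESX1", "PG-B"),
    "VM4": ("ESX2", "PG-A"),
    "VM5": ("ESX3", "PG-B"),
}

# Portgroup -> backing VLAN (the portgroup names are assumptions).
BACKING_VLAN = {"PG-A": "VLAN-A", "PG-B": "VLAN-B"}


def orchestrated_trunks(vm_placement, backing_vlan):
    """Return, per host, only the VLANs that back a portgroup with a local VM."""
    allowed = defaultdict(set)
    for host, portgroup in vm_placement.values():
        allowed[host].add(backing_vlan[portgroup])
    return dict(allowed)


trunks = orchestrated_trunks(VM_PLACEMENT, BACKING_VLAN)
for host in sorted(trunks):
    print(host, sorted(trunks[host]))
```

Every time a VM moves, this mapping has to be recomputed and pushed to the physical switches, which is exactly the work an orchestration system exists to do.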

The Set and Forget Model
There is an alternate model of operation which suggests that it would be simpler to enable all possible VLANs on each trunk port to the virtualisation hosts and not concern ourselves with network orchestration as such.  We just allow the virtualisation management platform to use the VLANs at will.

So in this example we would simply configure the trunk ports to ESX1, ESX2 and ESX3 with all the VLANs that are intended for portgroup backings.  This is a one-step configuration, particularly so with a system like Juniper’s QFabric Switching Platform, which presents itself as a single switch.

We can call this the “set and forget network orchestration” model.  The disadvantage of this model is that there is less control over broadcast radiation, and opponents of this approach suggest that broadcasts may overwhelm the CPUs of the ESX hosts.  We now have a situation where the vmkernel on ESX3 will process a broadcast packet even though it has no VM interested in it.  This consumes some CPU on ESX3 even though the host has no use for the packet.
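The trade-off can be made concrete with a small Python sketch.  The placement of VMs on ESX2 and ESX3 is an assumption made for illustration, and the function names are hypothetical.  It builds identical trunks for every host and then lists the hosts that would receive a broadcast on a VLAN they have no VM in.

```python
HOSTS = ["ESX1", "ESX2", "ESX3"]
BACKING_VLANS = ["VLAN-A", "VLAN-B"]

# VLANs each host actually has VMs on. ESX1 follows the text; ESX2 and
# ESX3 are assumed placements used only to illustrate the point.
VM_VLANS_BY_HOST = {
    "ESX1": {"VLAN-A", "VLAN-B"},
    "ESX2": {"VLAN-A"},
    "ESX3": {"VLAN-B"},
}


def set_and_forget_trunks(hosts, backing_vlans):
    """Allow every backing VLAN on every host-facing trunk, configured once."""
    return {host: set(backing_vlans) for host in hosts}


def uninterested_receivers(vlan, trunks, vm_vlans_by_host):
    """Hosts whose trunk carries `vlan` although no local VM uses it."""
    return sorted(
        host
        for host, allowed in trunks.items()
        if vlan in allowed and vlan not in vm_vlans_by_host.get(host, set())
    )


trunks = set_and_forget_trunks(HOSTS, BACKING_VLANS)
wasted = uninterested_receivers("VLAN-A", trunks, VM_VLANS_BY_HOST)
print("Hosts processing VLAN-A broadcasts with no interested VM:", wasted)
```

Note that the trunk configuration here depends only on the list of hosts and VLANs, never on where the VMs currently live, so a VM move requires no change on the physical network.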

A Simplified Model
A key advantage of this model is that no network orchestration is required and so no added platform is required to achieve it.  Another advantage is that this is a more reliable approach as a configuration is done on the network once and never touched again.

Given that the “set and forget” model is clearly easier to manage, requires fewer management elements or “orchestration systems” and has no additional points of failure, the preference for the “orchestrated” model must be built on the danger presented by supposed unfettered VLAN propagation.

There may, of course, be other advantages to the “orchestration method”, such as the separation of management powers between the server managers and the network managers, but these are beyond the scope of this post.  The main argument against the “set and forget” model is almost always the danger of a lack of VLAN / portgroup synchronisation.

The Dangers of Broadcast Radiation
The magnitude of the danger presented by unrestrained broadcast radiation is the subject of much angst from experienced network professionals and the need to control its seemingly inarguable danger is spoken of as if it is a fundamental principle of network design.

The thinking about what may or may not be a reasonable amount of broadcast traffic for a single host to handle has been fairly consistent for many years and ingrained in our thinking.  However, I’m not so sure that the years of development of the host platforms have been taken into account in determining its current danger.

Modern Technology Advances 
The question is whether the broadcast domain guidelines of the past are still relevant for modern machines, or even directly relevant to virtualised environments at all.  It could be argued that the dangers of the “Set and Forget” model due to massive broadcast domains in virtualised environments are overstated.  This is so for the following reasons:

Unintentional broadcast traffic is well constrained by loop-free topologies and storm control mechanisms at the edge of the physical network, and by bandwidth control measures in the virtual distributed switch.

CPU capabilities today are thousands of times greater than when we devised our initial broadcast domain “rules of thumb”.

Modern CPU architectures are devised with the role of virtual network edge in mind.  Intel achieved 20G line rate routed throughput in software a few years ago with CPU capacity to spare.

Servers tend to broadcast less traffic than the typical VLAN members of the past did.

While VLANs do create a kind of “aggregated bridge domain” in a virtualised environment, the characteristics of this are not identical to a true single broadcast domain.

VMware’s limits on the maximum number of hosts that can take part in the “aggregated bridge domain” would make typical amounts of broadcast traffic negligible to modern processors.
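A back-of-envelope way to test this argument against your own environment is sketched below in Python.  The formula is simply packets per second times cycles per packet divided by clock rate.  The function name is hypothetical and every number in the usage example is a placeholder, not a measurement of any real host or network.

```python
def broadcast_cpu_fraction(broadcast_pps, cycles_per_packet, cpu_hz):
    """Fraction of one CPU core spent handling unwanted broadcast packets.

    broadcast_pps:     broadcast packets per second arriving at the host
    cycles_per_packet: CPU cycles spent receiving and discarding one packet
    cpu_hz:            clock rate of one core in cycles per second
    """
    return broadcast_pps * cycles_per_packet / cpu_hz


# Placeholder inputs only; substitute figures measured in your own network.
fraction = broadcast_cpu_fraction(
    broadcast_pps=1_000,        # assumed broadcast rate
    cycles_per_packet=10_000,   # assumed per-packet processing cost
    cpu_hz=2.0e9,               # assumed 2 GHz core
)
print(f"{fraction:.2%} of one core")
```

With measured values in place of the placeholders, the result gives a rough sense of whether broadcast load on an uninterested host is worth the cost of orchestration.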

The QFabric Simplifies the Network
The “Set and Forget” model offers a simpler platform with fewer management elements than an “orchestrated” model and, along with the QFabric “single switch” approach, could provide a compelling alternative to overly complicated networks and their associated orchestration systems.  In this model the orchestration solution is “one switch, configure it once”, a very powerful option founded on simplicity.

I hope you found this topic interesting. To continue the conversation you could meet with one of our network experts such as Russel Skinglsey, who formulated and researched this concept. For more information see the QFabric page, link.

Qfabric.png

This blog post first appeared on my Juniper blog, see link.
