Ethernet Bridging

1. Transparent Bridging

A Bridge connects distinct segments, allowing traffic to pass between them. The maximum size of the network can thereby be extended in distance, repeater count and station count. A bridge can also be used to split a large segment into two smaller ones. The benefit of this is that there are fewer chances for collisions on the smaller segments, leaving more bandwidth for real data. Some protocols, such as Local Area Transport (LAT), Maintenance Operation Protocol (MOP) and native NetBIOS, do not have any network layer addressing, so they cannot be routed across a large network; they have to be bridged. There are implementations of NetBIOS that run over IP and IPX, thereby allowing NetBIOS to be routed in that way.

A Transparent or Learning Bridge learns the MAC addresses of the stations on all the attached segments, since it receives and examines every frame transmitted on the attached networks. Receiving every frame in this way is called Promiscuous Mode, and the ports of the bridge operate in Learning Mode as they record the source address of each frame against the port on which it arrived. This source address list is called the Forwarding Table. If a frame arrives at a bridge port destined for a station connected to the same port, then the bridge does not forward that frame out of any port. If the frame's destination address is held in the forwarding table against a different port, then the bridge knows on which port the destination device is connected and forwards the frame out of that port only. If the destination is unknown, then the bridge Floods the frame, that is, it forwards the frame out of all ports except the one that the frame arrived on. This whole process is known as Transparent Bridging because the bridges are 'transparent' to the stations: they just see one large segment, the bridge's own MAC address never appears in the frames, and the frames are never altered in any way.

These forwarding tables can grow very large, so some manufacturers apply a process called Aging whereby entries that have not been refreshed within the aging time (typically 300 seconds) are removed to free up memory.
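The learning, filtering, forwarding, flooding and aging behaviour described above can be sketched in a few lines of Python. This is a minimal illustration only, not any vendor's implementation: the class name `LearningBridge`, the port labels and the injectable `clock` parameter are all assumptions made for the example.

```python
import time

AGING_SECONDS = 300  # typical aging time quoted in the text


class LearningBridge:
    """Toy transparent bridge: learns source addresses, then filters, forwards or floods."""

    def __init__(self, ports, aging=AGING_SECONDS, clock=time.monotonic):
        self.ports = list(ports)
        self.aging = aging
        self.clock = clock  # hypothetical hook so the example can control time
        self.table = {}     # MAC address -> (port, time last seen)

    def receive(self, in_port, src, dst):
        """Return the list of ports that the frame is sent out of."""
        now = self.clock()
        # Learn (or refresh) the source address against the arrival port.
        self.table[src] = (in_port, now)
        # Age out entries that have not been refreshed within the aging time.
        self.table = {mac: (port, seen)
                      for mac, (port, seen) in self.table.items()
                      if now - seen <= self.aging}
        entry = self.table.get(dst)
        if entry is None:
            # Unknown destination: flood out of every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        out_port = entry[0]
        if out_port == in_port:
            # Destination sits on the arrival segment: filter the frame.
            return []
        # Known destination on another port: forward out of that port only.
        return [out_port]
```

A broadcast destination is never learned as a source address, so in this sketch it always takes the flooding branch.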

Transparent bridges connect LANs that use the same protocols at the physical and data link layer and they do all the work with regards to tracking which station sits on which network. There is no route discovery process or route selection process with transparent bridges.

A problem with the bridge is that it adds 20 to 30% latency in a network for acknowledgement-oriented protocols or 10 to 20% for sliding window type protocols, plus it relies on redistributing broadcasts and multicasts from a particular segment to all other segments (since a broadcast or multicast address is never used as a source address, it is never learned into the forwarding table). These segments therefore see more broadcasts than they would if the segments were totally isolated. There is therefore a greater risk of broadcast storms occurring.

A Remote Bridge has a LAN interface on one side and a WAN interface on the other. Another remote bridge at the other end of the WAN completes the connection. This allows WAN connections without having to use layer 3 devices (routers).

2. Spanning Tree (802.1d)

Spanning Tree applies to both bridged networks and switched networks, so bear this in mind as a switched network is basically made up of multiple LAN segments (Collision Domains). Consider the following network:

2.1 Loop Problem

In the above setup (without Spanning Tree), there is a possibility that a bridging loop can occur as Sonny tries to talk to Jim for the first time. The following events occur in the communication process:

A frame from Sonny arrives at interface X on both Bridge A and B. They both learn Sonny's MAC address and add it to their forwarding tables.
Neither bridge yet knows about Jim, so Bridge A floods the frame out of interface Y and Bridge B floods it out of its interface Y. These bridges would also flood the frame out of other interfaces if they had any.
Bridge C receives the frame on interface Y and creates an entry for Sonny in its forwarding table to point to interface Y.
Bridge C does not yet know about Jim so it floods the frame out on interface X.
Bridge C also learns about Sonny from interface X since it learnt this from Bridge A.
Jim receives the same frame from both Bridge B and Bridge C.
When Jim sends a frame to Sonny, Bridge B knows to send anything to Sonny via its interface X and Bridge C knows to send anything to Sonny out on its interface X, oh dear! What about interface Y?
Oops! A loop has occurred! Bridge C will keep learning about Sonny on both interfaces X and Y, sometimes X, sometimes Y, depending on the latency of the LANs in between. This continual flooding results in broadcast storms which create an unnecessarily heavy load on the bridges or switches, and the network becomes extremely sluggish, if it is operating at all.

This sequence of events is based on the operation of Transparent Bridging i.e. broadcasts and unknown unicasts are forwarded out of all ports, bar the incoming port.
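To see why flooding never dies down when redundant paths exist, the sketch below counts the copies of a single flooded frame after each forwarding round. It is a toy model under stated assumptions: the topology is a hypothetical one (bridges wired in parallel between two LAN segments), not the network in the example above, and address learning is ignored because the frame is taken to be a broadcast or unknown unicast. The function name `flood_rounds` is invented for the illustration.

```python
def flood_rounds(bridge_segments, start_segment, rounds):
    """Return the number of frame copies in flight after each round.

    bridge_segments maps a bridge name to the segments it connects.
    Every bridge floods each copy it hears out of all of its other
    segments, and does not re-receive a copy it sent itself.
    """
    in_flight = [(start_segment, None)]  # (segment, bridge that sent the copy)
    history = []
    for _ in range(rounds):
        next_flight = []
        for segment, sender in in_flight:
            for bridge, segments in bridge_segments.items():
                if bridge == sender or segment not in segments:
                    continue
                for out in segments:
                    if out != segment:
                        next_flight.append((out, bridge))
        in_flight = next_flight
        history.append(len(in_flight))
    return history


single = {"A": ("LAN1", "LAN2")}
parallel = {"A": ("LAN1", "LAN2"), "B": ("LAN1", "LAN2")}
print(flood_rounds(single, "LAN1", 4))    # [1, 0, 0, 0]
print(flood_rounds(parallel, "LAN1", 4))  # [2, 2, 2, 2]
```

With one bridge the frame is delivered once and disappears. With two parallel bridges, two copies circulate indefinitely, and with three the count doubles every round, which is the broadcast storm described above.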

2.2 Loop Solution

To get around this problem, within Spanning Tree, only one bridge/switch is allowed to be the Root Bridge. All traffic for the whole Spanning Tree network of bridged/switched LANs goes through the root bridge. In addition, only one link is allowed between devices, and on each LAN segment, only one device becomes the Designated Bridge, through which ALL that particular LAN's traffic must go. The other links remain dormant. The designated bridge has a port that leads to the root bridge of the bridged network of multiple LAN segments. The designated bridge is the bridge with the smallest path cost to the root bridge. The path cost is the sum of the port costs on the path to the root bridge. A 100Mbps port has a cost of 19, for example.

There are three main steps that Spanning Tree takes when building the loop free Spanning Tree network:

Root bridge election for the whole network.
Root port election on each bridge/switch.
Designated port election on each LAN segment.

2.3 Spanning Tree Initialisation

On initial setup, all participating bridges declare themselves to be the root; they then exchange Hello (or Configuration) Bridge Protocol Data Units (BPDUs) containing specific bridge information. The BPDUs are sent to the well-known multicast address 01-80-C2-00-00-00. There are two types of BPDU: Configuration BPDUs, used for bridge and port elections, and Topology Change Notification (TCN) BPDUs, which are sent back up towards the root when there is a topology change.

2.4 Bridge Protocol Data Unit (BPDU)

These BPDUs are in the following format:

Protocol ID - indicates that the packet is a BPDU.
Version - the version of the BPDU being used.
Message Type - the stage of the negotiation.
Flags - two bits are used to indicate a change in topology and to indicate acknowledgement of the TCN BPDU.
Root ID - the root bridge priority (2 bytes) followed by the MAC address (6 bytes).
Root Path Cost - the total cost from this particular bridge to the designated root bridge.
Bridge ID - the bridge priority (2 bytes) followed by the MAC address (6 bytes), lowest value wins! The default bridge priority is 0x8000 (32768₁₀).
Port ID - the ID of the port from which the BPDU was transmitted; this is made up of the configured port priority and the port number.
Message Age - timers for aging messages (only has effect on the network if the root bridge is configured with this parameter).
Maximum Age - the maximum message age before information from a BPDU is dropped because it is too old and no more BPDUs have been received. (only has effect on the network if the root bridge is configured with this parameter). The default value for this is 20 seconds.
Hello Time - the time between BPDU configuration messages sent by the root bridge (only has effect on the network if the root bridge is configured with this parameter). The default value for this is 2 seconds.
Forward Delay - this temporarily stops a bridge from forwarding data to give a chance for information of a topology change to filter through to all parts of the network. This means that ports that need to be turned off in the new topology have a chance to be switched off before the new ports are turned on (only has effect on the network if the root bridge is configured with this parameter). The default value for this is 15 seconds.
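The field list above maps onto a fixed 35-byte layout, which can be unpacked with Python's standard `struct` module. The sketch below is illustrative rather than a complete decoder: the function and dictionary key names are invented for the example, and it relies on two details of 802.1D not stated above, namely that the four timer fields are carried in units of 1/256 of a second and that the Topology Change and Topology Change Acknowledgement flags are the lowest and highest bits of the Flags byte.

```python
import struct

# Big-endian layout: Protocol ID (2), Version (1), Message Type (1), Flags (1),
# Root ID (8), Root Path Cost (4), Bridge ID (8), Port ID (2),
# Message Age (2), Maximum Age (2), Hello Time (2), Forward Delay (2).
BPDU_FORMAT = ">HBBB8sI8sHHHHH"
BPDU_LENGTH = struct.calcsize(BPDU_FORMAT)  # 35 bytes


def _split_id(raw):
    """Split an 8-byte Root ID or Bridge ID into (priority, MAC address string)."""
    priority = int.from_bytes(raw[:2], "big")
    mac = ":".join(f"{b:02x}" for b in raw[2:])
    return priority, mac


def parse_config_bpdu(data):
    """Decode the fields of a Configuration BPDU into a dictionary."""
    (protocol_id, version, message_type, flags, root_id, root_path_cost,
     bridge_id, port_id, message_age, max_age, hello_time,
     forward_delay) = struct.unpack(BPDU_FORMAT, data[:BPDU_LENGTH])
    root_priority, root_mac = _split_id(root_id)
    bridge_priority, bridge_mac = _split_id(bridge_id)
    return {
        "protocol_id": protocol_id,
        "version": version,
        "message_type": message_type,
        "topology_change": bool(flags & 0x01),
        "topology_change_ack": bool(flags & 0x80),
        "root_priority": root_priority,
        "root_mac": root_mac,
        "root_path_cost": root_path_cost,
        "bridge_priority": bridge_priority,
        "bridge_mac": bridge_mac,
        "port_id": port_id,
        # Timers are converted from 1/256 second units to seconds.
        "message_age": message_age / 256,
        "max_age": max_age / 256,
        "hello_time": hello_time / 256,
        "forward_delay": forward_delay / 256,
    }
```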

The timers are based on the assumption that the diameter of the network is no more than 7 bridges/switches away from the root (including the root bridge).

The following port costs are the current IEEE defaults:

10Gbps : 2
1Gbps : 4
622Mbps : 6
155Mbps : 14
100Mbps : 19
45Mbps : 39
16Mbps : 62
10Mbps : 100
4Mbps : 250

2.5 Spanning Tree Operation

Whilst the BPDUs move between links on the network, the bridge ports are in Listening Mode as the bridges listen to the BPDUs and decide which is the root bridge, and which bridges are the designated bridges. When a bridge sees a BPDU from a bridge with a lower Bridge ID it stops sending its own BPDUs. Eventually, one of the bridges is determined as the Root Bridge, it is the one with the lowest ID value (the bridge priority is user-defined and is looked at first, followed by the MAC address).
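The root election rule (lowest priority first, then lowest MAC address) amounts to taking the minimum of a (priority, MAC) pair. A minimal sketch, in which the bridge names and addresses are made up for illustration:

```python
def bridge_id(priority, mac):
    """Build a comparable Bridge ID: priority first, then the MAC address bytes."""
    return (priority, bytes.fromhex(mac.replace(":", "")))


def elect_root(bridges):
    """Return the name of the bridge with the lowest Bridge ID.

    bridges maps a bridge name to a (priority, MAC address) pair.
    """
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


# Hypothetical bridges: all at the default priority of 32768 (0x8000).
bridges = {
    "A": (32768, "00:00:0c:00:00:02"),
    "B": (32768, "00:00:0c:00:00:01"),
}
print(elect_root(bridges))  # B (priorities tie, so the lower MAC address wins)

# Lowering one bridge's priority overrides the MAC address comparison.
bridges["A"] = (4096, "00:00:0c:00:00:02")
print(elect_root(bridges))  # A
```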

Once the root bridge has been determined, all the other bridges work out a least cost path to the root bridge. The root bridge sends a BPDU with an initial value for the Root Path Cost of zero. Each bridge receives the BPDU, adds the cost of the particular interface on which the BPDU was received to the Root Path Cost and sends its own generated BPDU out of all its interfaces. Fast Ethernet ports default to a port cost of 19. The cumulative Root Path Cost is used to determine on which interface a particular bridge received the lowest cost BPDU, and this interface becomes the Root Port.
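Root port selection can be sketched as follows: add the receiving port's cost to the Root Path Cost advertised in each BPDU, then pick the port with the lowest total. The function name and input shape are assumptions made for the example, and the tie-break is a simplification: real Spanning Tree breaks ties on the sender's Bridge ID and Port ID, whereas this sketch simply falls back to the port name.

```python
# Port costs by speed in Mbps, taken from the table of defaults above.
PORT_COST = {1000: 4, 100: 19, 10: 100}


def choose_root_port(received):
    """Pick the root port from the BPDUs heard on each port.

    received maps a port name to (advertised Root Path Cost, port speed in Mbps).
    Returns (root port name, cumulative Root Path Cost through that port).
    """
    totals = {port: advertised + PORT_COST[speed]
              for port, (advertised, speed) in received.items()}
    # Simplified tie-break on port name; see the caveat in the text above.
    best = min(totals, key=lambda port: (totals[port], port))
    return best, totals[best]


# Port X hears the root directly over Fast Ethernet (0 + 19 = 19).
# Port Y hears a neighbour advertising cost 4 over a 1Gbps link (4 + 4 = 8).
print(choose_root_port({"X": (0, 100), "Y": (4, 1000)}))  # ('Y', 8)
```

The example shows that the directly connected port is not necessarily the root port: two Gigabit hops cost less than one Fast Ethernet hop.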

The bridges then determine which of their remaining ports are designated ports for their LAN segments. Ports that are neither root ports nor designated ports stay blocked, whilst ports that are to forward data pass through the following states in order:

Blocking Mode - where no frames are forwarded and just Hello BPDUs are listened to, this lasts for 20 seconds, which is the Maximum Age Time.
Then ports change from Blocking Mode to Listening Mode where they remain for 15 seconds (this is called the Forward Delay Time), where no data frames are forwarded, they are just listened to, BPDUs however, are both received AND sent.
Then the ports change to Learning Mode for another 15 seconds (again using the Forward Delay time), where still no data frames are forwarded, and addresses are being learned as the bridge/switch builds the forwarding tables.
Finally, the designated ports move to Forwarding Mode, where data frames are forwarded now as well as addresses still being learned.

After this process, if a port fails, then it is placed into a Disabled State.

From this sequence, it can be seen that the worst case scenario is if a link is up but it fails to see any more BPDUs, there will be a 20 second wait before the information from the last BPDU ages out. At this point, the port goes into listening mode for 15 seconds and then learning mode for 15 seconds before it starts forwarding data frames again. This adds up to 50 seconds with the default timers.
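The 50 second figure can be reproduced by laying the port states end to end. A small sketch using the default timers from the text (the function name is invented for the example):

```python
def port_timeline(max_age=20, forward_delay=15):
    """Return (state, start, end) tuples in seconds after the last BPDU was heard."""
    stages = [
        ("blocking", max_age),         # wait for the last BPDU to age out
        ("listening", forward_delay),  # BPDUs exchanged, no data forwarded
        ("learning", forward_delay),   # addresses learned, no data forwarded
    ]
    timeline = []
    elapsed = 0
    for state, duration in stages:
        timeline.append((state, elapsed, elapsed + duration))
        elapsed += duration
    timeline.append(("forwarding", elapsed, None))
    return timeline


for state, start, end in port_timeline():
    print(state, start, end)
# blocking 0 20
# listening 20 35
# learning 35 50
# forwarding 50 None
```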

2.6 Using Spanning Tree

In the case of the above diagram, Bridge A has the lowest ID so it becomes the root bridge. The cost from Sonny to Jim is 4 via Bridge A and 5 via Bridges B and C. Bridge C is taken out of the tree, leaving Bridges A and B to be the designated bridges. If Bridge A or B failed, then Bridge C would come back on line and the Spanning Tree would be re-shaped.

A good network design should ensure that the root bridge is as close to the centre of the network as possible in order to facilitate faster convergence and also to make sure that the network is not running sub-optimally where traffic is taking less direct routes thereby doubling LAN bandwidth usage. Rather than rely on Spanning Tree using the default settings, it is often a good idea to shape the tree by setting bridge IDs and port costs.

Default timers are advertised by the root bridge. The default for the Forward Delay Timer is usually 15 seconds, which gives enough time for the transition stage to complete before traffic is forwarded. On some vendors' equipment you can come across a mechanism called Port Fast (LAN traffic) or Uplink-Fast (Dial-up traffic) that allows you to bypass the Forward Delay Timer and forward data immediately rather than wait for a BPDU and all the delays between the modes mentioned earlier. This is useful only on point-to-point links such as trunk ports acting in a redundant manner in Spanning Tree where you know that you are in no danger of creating a loop.

2.7 Steady State Operation

Configuration BPDUs are sent regularly by the root bridge along with the age of the message. These are resent by each designated bridge in the Spanning Tree. The receiving bridges store this limited age. If the information times out for a particular port, then the bridge will try to become the designated bridge for that LAN. If root bridge information times out due to a better root port being found, or a root bridge failure then the bridge tries to become a root bridge itself and the election process restarts.

If a designated bridge (a bridge with active ports pointing to the root bridge) fails, or is removed from the network, or the root port fails, a directly connected bridge in the LAN segment detects that it is not receiving Configuration BPDUs from that bridge. This is because the information from the last BPDU times out according to the Maximum Age timer (default 20 seconds). It then sends a TCN BPDU to its designated bridge/switch that is destined for the root bridge/switch. The designated bridge receiving this TCN BPDU sends back a Configuration BPDU containing an acknowledgement as well as sending another TCN BPDU on towards the root bridge. The root bridge, on receipt of the TCN BPDU, sends a modified Configuration BPDU to all bridges in the network indicating that a topology change has occurred by setting the Topology Change Flag. Any directly connected bridges on the same segment, receiving the configuration change BPDU, create their own BPDUs and send those out. They also age out their forwarding tables according to the Forward Delay timer (15 seconds) rather than use the default time of 300 seconds. This fast aging lasts until the root bridge resets the Topology Change Flag once enough time has elapsed for the configuration change notification to have propagated throughout the tree. This time is determined by the formula Max Age + Forward Delay, which equals 20 + 15 = 35 seconds by default. This speedy aging out is important as a topology change could mean that a particular device is now learned via another port, and we want to avoid sending data traffic into a black hole whilst the normal aging time of 300 seconds times out.
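The two timer rules in this paragraph are simple enough to state as code. This is a sketch with invented function names, using the default values quoted above:

```python
def forwarding_table_aging(topology_change_flag, forward_delay=15, normal_aging=300):
    """Aging time in seconds that a bridge applies to its forwarding table."""
    # While the root sets the Topology Change Flag, entries age out quickly.
    return forward_delay if topology_change_flag else normal_aging


def topology_change_period(max_age=20, forward_delay=15):
    """Seconds for which the root bridge keeps the Topology Change Flag set."""
    return max_age + forward_delay


print(forwarding_table_aging(False))  # 300
print(forwarding_table_aging(True))   # 15
print(topology_change_period())       # 35
```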

It is quite normal for TCNs to occur in a network, e.g. ports change state as users switch machines off. TCNs DO NOT in themselves cause a recalculation of Spanning Tree, although they could be a symptom of STP recalculation. Spanning Tree recalculation occurs when priorities are administratively changed, or when configuration BPDUs fail to reach a Designated Bridge, i.e. if a network segment in the spanning tree fails and a redundant path exists. So this will happen if any bridge/switch is added or removed or parameter changes occur.

2.8 Port Fast

A feature on some manufacturers' equipment called Port Fast allows certain ports, for example those connected to key devices such as servers, to still play a part in Spanning Tree without waiting to go through the Spanning Tree states described above: the port can jump straight from blocking mode to forwarding mode. This is only of use for ports connecting to end devices, since you could end up creating loops if this 'port fast' technique was applied to trunk links to other switches. It is an important feature for preventing host machines and servers from timing out their network connections when they initially connect.

2.9 Uplink Fast

There is another technique called Uplink Fast that can be used on access switches that have multiple paths to the root bridge/switch. An uplink group is created containing the root port and all alternate ports. On a failure of the root port link, frames are immediately forwarded on the alternate link and Spanning Tree converges in a second or two instead of 35 seconds. A problem with this is that the forwarding tables are temporarily incorrect, thereby causing dropped frames. Cisco use a proprietary multicast to cause relevant switches to update their forwarding tables. Uplink Fast is all very well, but it applies to all VLANs rather than on a per VLAN basis, so it is no good in environments where you want to load share traffic.

2.10 Issues with Spanning Tree

The trouble with Spanning Tree is that it requires some links to remain dormant, thus wasting network bandwidth (e.g. an expensive serial link between two remote bridges). In addition, no load sharing can take place using the standard 802.1q VLAN specification, although Cisco has a proprietary implementation of 802.1q which does allow Per VLAN Spanning Tree (PVST), plus its own Inter-Switch Link (ISL) which also allows PVST. The issues with PVST occur when connecting to non-Cisco devices running 802.1q trunking.

Spanning Tree allows for a maximum of 7 concatenated bridges (assuming 1 second for a default Hold Time that a bridge holds a frame before discarding it). A frame should be delivered no more than 7 seconds after initial transmission. This is important as the bridge timers need to be kept in sync at the extremities of the network, and there needs to be a limit to the accumulated forwarding delay between stations.

There are three types of Spanning Tree. Originally there was a version from DEC and one from IBM. The DEC version was the basis for the current standard from the IEEE, 802.1d. These different versions are not compatible with one another.

3. Translational Bridging

Where LANs using different protocols at the Physical and Data Link layers are to be connected, a Translational Bridge can be used. An example is a bridge connecting an Ethernet network to a Token Ring network. Bridges cannot support messages of different lengths when converting between frame formats, so the end devices must be capable of configuring the messages to be of the same lengths. Nowadays you are only likely to see this in an environment where FDDI is converted to Ethernet.

4. Encapsulated Bridge

It is common to see an environment where identical LAN environments are connected together via a dissimilar LAN environment. An example of this is two Ethernet LANs separated by an FDDI LAN or a Serial link. An encapsulated bridge does not change the frame headers as the Translational Bridge does; instead the central LAN protocol (e.g. FDDI) encapsulates the transparently bridged frames of the connected LANs (e.g. Ethernet), carries them across its backbone and removes the encapsulation at the other end.

5. Concurrent Routing and Bridging (CRB)

Traditionally, if a protocol is configured on a router then if network layer addressing is available, the protocol is routed rather than bridged. If no network layer addressing is available then the protocol is bridged. In the past, at no time could a packet be both routed and bridged.

With Concurrent Routing and Bridging, one router can both route a protocol through one interface and bridge that same protocol through another interface at the same time, but not through the same interface. Routed protocols have to be routed out of routing interfaces and bridged protocols have to be bridged out of bridging interfaces.

6. Integrated Routing and Bridging (IRB)

This allows you to both route and bridge a given protocol through the same interface (this is an extension of Concurrent Routing and Bridging).

An example of its use is when you are migrating from a bridged network to a routed network where you want to connect bridged segments to routed networks. With IRB you can route between routed interfaces and bridge groups, or you can route between bridge groups. In addition, you can still bridge non-routable traffic between bridge interfaces within a bridge group.

You can conserve layer 3 logical addresses by assigning one layer 3 address to a bridge group and just bridge local traffic.

The implementation of IRB is made possible with the concept of the Bridge-Group Virtual Interface (BVI). The BVI acts as a routed interface that does not perform bridging but represents the bridge group that is used to send bridged traffic to a routed network. The BVI interface number is the bridge group number of the bridge group assigned to that interface. The BVI takes the MAC address of one of the bridged interfaces in the particular bridge group. The network layer address of the BVI must be in the same network as the routed hosts.

For host A to talk to host B, host A must have its default gateway set as the IP address of the BVI. The MAC address of the BVI is that of the bridged interface and is learned by host A via ARP. At this point, the destination MAC address of the packet is the BVI's MAC address and the source MAC address is that of host A.

The bridging software on the bridged interface looks at the packet and decides whether the packet is to be routed or bridged. If the destination MAC address is one of the router's interfaces (in this case it is, the BVI) and if the layer 3 protocol is configured on that interface, the bridging software makes the packet look like it has come from the BVI rather than the bridge group (10 in this case). The packet is then sent to the routing engine and routed out of that interface.

If the packet is destined to a host without any layer 3 addressing and within the same bridge group as the BVI, then the bridging software sees that the destination MAC address is not the BVI but a host device. The packet is then bridged to the bridge group if the MAC address of the destination is known, or flooded out all bridge group interfaces if the destination MAC address is unknown.
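The route-or-bridge decision described in the last two paragraphs can be summarised as a small function. This is a conceptual sketch only: the function name, its parameters and the returned labels are invented for the illustration and do not correspond to any router's actual software.

```python
def irb_decision(dst_mac, bvi_mac, l3_configured_on_bvi, bridge_table):
    """Decide what happens to a frame arriving on a bridged interface.

    bridge_table maps known MAC addresses in the bridge group to a port.
    Returns (action, port), where port is only set for the "bridge" action.
    """
    if dst_mac == bvi_mac and l3_configured_on_bvi:
        # Addressed to the BVI with the layer 3 protocol enabled: hand to routing.
        return ("route", None)
    if dst_mac in bridge_table:
        # Known host in the bridge group: bridge out of the learned port.
        return ("bridge", bridge_table[dst_mac])
    # Unknown destination: flood out of all bridge group interfaces.
    return ("flood", None)


# Hypothetical addresses and port names for the example.
table = {"00:00:0c:00:00:0b": "Ethernet1"}
bvi = "00:00:0c:00:00:01"
print(irb_decision(bvi, bvi, True, table))                  # ('route', None)
print(irb_decision("00:00:0c:00:00:0b", bvi, True, table))  # ('bridge', 'Ethernet1')
print(irb_decision("00:00:0c:00:00:0c", bvi, True, table))  # ('flood', None)
```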
