Understanding Hitless Failover Support

The term hitless failover has slightly different meanings on a modular chassis and a SummitStack.

On a modular chassis, MSMs/MMs do not directly control customer ports; separate processors control those ports. A SummitStack node, by contrast, has customer ports that are under the control of its single central processor. When an MSM/MM failover occurs on a modular chassis, all of the ports in the chassis remain under the control of separate processors that can communicate with the backup MSM/MM, so all ports continue to function. In a SummitStack, failure of the primary node brings down all ports that require that node's processor for normal operation; the ports on the remaining SummitStack nodes continue to function normally. Aside from this difference, hitless failover is the same on a modular chassis and a SummitStack.
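
The difference in port behavior can be illustrated with a small Python model. This is only a sketch: the Port class, the controller labels, and the port names are hypothetical and are not part of the switch software. It assumes a port stays up as long as the processor that controls it has not failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    controller: str  # hypothetical label for the processor that controls this port


def ports_up_after_failover(ports, failed_node):
    """Return the names of ports whose controlling processor did not fail."""
    return [p.name for p in ports if p.controller != failed_node]


# Modular chassis: ports are controlled by separate processors, not by the MSM/MM.
chassis_ports = [Port("1:1", "io-slot-1"), Port("2:1", "io-slot-2")]

# SummitStack: each node's ports are controlled by that node's central processor.
stack_ports = [Port("1:1", "node-1"), Port("2:1", "node-2"), Port("3:1", "node-3")]

chassis_survivors = ports_up_after_failover(chassis_ports, failed_node="msm-a")
stack_survivors = ports_up_after_failover(stack_ports, failed_node="node-1")

print(chassis_survivors)  # all chassis ports keep working
print(stack_survivors)    # only ports on the surviving stack nodes keep working
```

In this model, losing the primary MSM/MM leaves every chassis port up, while losing the primary SummitStack node takes down only that node's own ports.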

As described in the section Understanding System Redundancy, if you install two MSMs/MMs (nodes) in a chassis or if you configure at least two master-capable nodes in a SummitStack, one assumes the role of primary and the other assumes the role of backup.

The primary node provides all of the switch management functions, including bringing up and programming the I/O modules or other (standby) nodes in the SummitStack, running the bridging and routing protocols, and configuring the switch. The primary node also synchronizes the backup node so that the backup node can take over the management functions if the primary node fails.

The configuration is one of the most important pieces of information checkpointed to the backup node. Each component of the system needs to checkpoint whatever runtime data is necessary to allow the backup node to take over as the primary node if a failover occurs, including the protocols and the hardware-dependent layers. For more information about checkpointing data and relaying configuration information, see Replicating Data Between Nodes.
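
As a rough illustration of the checkpointing idea, the following Python sketch copies each component's data to a backup node, which is then promoted. The Node class, the function names, and the component names are hypothetical; the real checkpointing mechanism is internal to the switch software and is not shown here.

```python
class Node:
    def __init__(self, name, role="backup"):
        self.name = name
        self.role = role
        self.state = {}  # configuration and runtime data, keyed by component


def checkpoint(primary, backup, component, data):
    """Record a component's data on the primary and copy it to the backup."""
    primary.state[component] = dict(data)
    backup.state[component] = dict(data)


def failover(backup):
    """Promote the backup; it continues from whatever was checkpointed."""
    backup.role = "primary"
    return backup


primary = Node("A", role="primary")
backup = Node("B")

# Hypothetical components: the configuration plus one protocol's runtime data.
checkpoint(primary, backup, "configuration", {"vlan": "default"})
checkpoint(primary, backup, "example-protocol", {"port_state": "forwarding"})

new_primary = failover(backup)
print(new_primary.role, sorted(new_primary.state))
```

The sketch shows why each component must checkpoint its own data: after the failover, the new primary node can rely only on the state that was copied to it beforehand.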

Not all protocols support hitless failover; see the following table for a detailed list of protocols and their support. Layer 3 forwarding tables are maintained for pre-existing flows, but subsequent behavior depends on the routing protocols used. Static Layer 3 configurations and routes are hitless. For OSPF routes to be maintained, you must configure OSPF graceful restart; for BGP routes to be maintained, you must configure BGP graceful restart. For more information, see OSPF and BGP. For routing protocols that do not support hitless failover, the new primary node removes and re-adds the routes.
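
The route-handling rules in this paragraph can be summarized as a small decision function. This Python sketch is illustrative only: the function name, its parameters, and the returned strings are hypothetical. The hitless_capable flag stands in for a lookup in the protocol support table, and treating such protocols as keeping their routes is an assumption.

```python
def routes_after_failover(source, graceful_restart=False, hitless_capable=False):
    """Describe what happens to routes from `source` when a failover occurs."""
    if source == "static":
        # Static Layer 3 configurations and routes are hitless.
        return "maintained"
    if source in ("ospf", "bgp"):
        # OSPF and BGP routes are maintained only with graceful restart configured.
        return "maintained" if graceful_restart else "not maintained"
    if hitless_capable:
        # Assumption: a protocol listed as hitless in the table keeps its routes.
        return "maintained"
    # Protocols without hitless support: the new primary removes and re-adds routes.
    return "removed and re-added"


for source, kwargs in [
    ("static", {}),
    ("ospf", {"graceful_restart": True}),
    ("bgp", {}),
    ("example-protocol", {}),
]:
    print(source, kwargs, "->", routes_after_failover(source, **kwargs))
```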