Configures a recovery option for instances where an exception occurs on the specified MSM/MM or I/O module.
all | Specifies all slots of the MSM/MM and I/O module. |
slot_number | Specifies the slot of the MSM/MM or I/O module. |
none | Configures the MSM/MM or I/O module to maintain its current state regardless of the detected hardware fault. The offending MSM/MM or I/O module is not reset. For more information about the states of an MSM/MM or I/O module see the show slot command. |
reset | Configures the offending MSM/MM or I/O module to reset upon a hardware fault detection. For more detailed information, see the Usage Guidelines. |
shutdown | Configures the switch to shut down all slots/modules configured for shutdown upon fault detection. On the modules configured for shutdown, all ports in the slot are taken offline in response to the reported errors; however, the MSMs/MMs remain operational for debugging purposes only. ExtremeXOS logs fault, error, system reset, system reboot, and system shutdown messages to the syslog. |
The default setting is reset.
Use this command to configure automatic system recovery when the switch detects a hardware problem. You can configure the MSMs/MMs or I/O modules installed in a modular switch to take no action, reset automatically, shut down, or, if dual MSMs/MMs are installed, fail over to the other MSM/MM when the switch detects a faulty MSM/MM or I/O module. This enhanced level of recovery detects faults in the ASICs as well as in the packet buses.
You must specify one of the following parameters for the system to respond to MSM/MM or I/O module failures: none, reset, or shutdown.
Depending on your configuration, the switch resets the offending MSM/MM or I/O module if fault detection occurs. An offending MSM/MM is reset any number of times, and the MSM/MM is not permanently taken offline. On the BlackDiamond 8800 series switches, an offending I/O module is reset a maximum of five times. After the maximum number of resets, the I/O module is permanently taken offline.
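The reset limit described above can be modeled as a small counter. The following Python sketch is illustrative only: the `Module` class and `handle_fault_with_reset` function are hypothetical names, not part of ExtremeXOS, and the sketch assumes that an I/O module goes offline on the first fault after its fifth reset.

```python
from dataclasses import dataclass

# Limit stated in the text for I/O modules on BlackDiamond 8800 series switches.
IO_MODULE_RESET_LIMIT = 5


@dataclass
class Module:
    """Hypothetical model of a module; not an ExtremeXOS data structure."""
    kind: str                # "msm" or "io"
    restart_count: int = 0
    offline: bool = False


def handle_fault_with_reset(module: Module) -> str:
    """Apply the 'reset' recovery setting to one detected hardware fault."""
    if module.offline:
        return "offline"
    if module.kind == "io" and module.restart_count >= IO_MODULE_RESET_LIMIT:
        # Assumption: once the limit is used up, the next fault takes the
        # I/O module offline permanently.
        module.offline = True
        return "offline"
    # MSMs/MMs have no reset limit, so they always land here.
    module.restart_count += 1
    return "reset"
```

In this model an MSM/MM can be reset any number of times, while an I/O module returns "reset" for its first five faults and "offline" afterward.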
If you configure the hardware recovery setting to either none (ignore) or shutdown, the switch prompts you to confirm this action. The following is a sample shutdown message:
Are you sure you want to shutdown on errors? (y/n)
Enter y to confirm this action and configure the hardware recovery level. Enter n or press [Enter] to cancel this action.
Beginning with ExtremeXOS 11.5, you can configure the switch to shut down one or more modules upon fault detection by specifying the shutdown option. If you configure one or more slots to shut down and the switch detects a hardware fault, all ports in all of the slots configured for shutdown are taken offline in response to the reported errors. (MSMs are available for debugging purposes only.)
The affected module remains in the shutdown state across additional reboots or power cycles until you explicitly clear the shutdown state. If a module enters the shutdown state, the module actually reboots and the show slot command displays the state of the slot as Initialized; however, the ports are shut down and taken offline. For more information about clearing the shutdown state, see the clear sys-recovery-level command.
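The shutdown behavior can be pictured as a small state machine. The Python sketch below is a simplified model under stated assumptions, not switch code: `ChassisModel` and its methods are hypothetical names, `clear_shutdown` stands in for the clear sys-recovery-level command, and the model assumes that a reboot (or module reset) is needed after clearing before the ports come back online.

```python
class ChassisModel:
    """Hypothetical model of the shutdown behavior; not an ExtremeXOS API."""

    def __init__(self, recovery_by_slot):
        # Maps slot number -> "none", "reset", or "shutdown".
        self.recovery_by_slot = dict(recovery_by_slot)
        # Every slot starts with its ports online.
        self.state = {slot: "online" for slot in self.recovery_by_slot}

    def fault(self, slot):
        """A fault on a shutdown-configured slot shuts down every such slot."""
        if self.recovery_by_slot[slot] != "shutdown":
            return
        for other, mode in self.recovery_by_slot.items():
            if mode == "shutdown":
                self.state[other] = "shutdown"

    def clear_shutdown(self):
        """Stand-in for clear sys-recovery-level: marks slots as cleared."""
        for slot, state in self.state.items():
            if state == "shutdown":
                self.state[slot] = "cleared"

    def reboot(self):
        """Cleared slots come back online; shutdown slots stay shut down."""
        for slot, state in self.state.items():
            if state == "cleared":
                self.state[slot] = "online"

    def ports_online(self, slot):
        return self.state[slot] == "online"
```

In this model, calling `reboot()` without first calling `clear_shutdown()` leaves the ports offline, which mirrors the statement that the shutdown state persists across reboots and power cycles.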
The following table describes the actions module recovery takes based on your module recovery setting. For example, if you configure a module recovery setting of reset for an I/O module, the module is reset a maximum of five times before it is taken permanently offline.
Module Recovery Setting | Hardware | Action Taken |
---|---|---|
none | Single MSM | The MSM remains powered on in its current state. This does not guarantee that the module remains operational; however, the switch does not reboot the module. |
none | Dual MSM | The MSM remains powered on in its current state. This does not guarantee that the module remains operational; however, the switch does not reboot the module. |
none | I/O Module | The I/O module remains powered on in its current state. The switch sends error messages to the log and notifies you that the errors are ignored. This does not guarantee that the module remains operational; however, the switch does not reboot the module. |
reset | Single MSM | Resets the MSM. |
reset | Dual MSM | Resets the primary MSM and fails over to the backup MSM. |
reset | I/O Module | Resets the I/O module a maximum of five times. After the fifth time, the I/O module is permanently taken offline. |
shutdown | Single MSM | The MSM is available for debugging purposes only (the I/O ports also go down); however, you must clear the shutdown state using the clear sys-recovery-level command for the MSM to become operational. After you clear the shutdown state, you must reboot the switch. For more information see the clear sys-recovery-level command. |
shutdown | Dual MSM | The MSM is available for debugging purposes only (the I/O ports also go down); however, you must clear the shutdown state using the clear sys-recovery-level command for the MSM to become operational. After you clear the shutdown state, you must reboot the switch. For more information see the clear sys-recovery-level command. |
shutdown | I/O Module | Reboots the I/O module. When the module comes up, the ports remain inactive because you must clear the shutdown state using the clear sys-recovery-level command for the I/O module to become operational. After you clear the shutdown state, you must reset each affected I/O module or reboot the switch. For more information see the clear sys-recovery-level command. |
To display the module recovery setting, use the show slot command.
Beginning with ExtremeXOS 11.5, the show slot output includes the shutdown configuration. If you configure the module recovery setting to shutdown, the output displays an "E" flag, which indicates that any errors detected on the slot disable all ports on the slot. The "E" flag appears only if you configure the module recovery setting to shutdown.
Note
If you configure one or more slots for shutdown and the switch detects a hardware fault on one of those slots, all of the configured slots enter the shutdown state and remain in that state until explicitly cleared.
If you configure the module recovery setting to none, the output displays an "e" flag that indicates no corrective actions will occur for the specified MSM/MM or I/O module. The "e" flag appears only if you configure the module recovery setting to none.
The following sample output displays the module recovery action. In this example, notice the flags identified for slot 2:
    Slots    Type              Configured        State        Ports  Flags
    -------------------------------------------------------------------------------
    Slot-1   8900-G96T-c       8900-G96T-c       Operational  96     MB
    Slot-2   8900-10G24X-c     8900-10G24X-c     Operational  24     MB E
    Slot-3   8900-40G6X-xm     8900-40G6X-xm     Operational  24     MB
    Slot-4   G48Xc             G48Xc             Operational  48     MB
    Slot-5   G8Xc              G8Xc              Operational  8      MB
    Slot-6                                       Empty        0
    Slot-7   G48Te2(PoE)       G48Te2(PoE)       Operational  48     MB
    Slot-8   G48Tc             G48Tc             Operational  48     MB
    Slot-9   10G4Xc            10G4Xc            Operational  4      MB
    Slot-10                                      Empty        0
    MSM-A    8900-MSM128                         Operational  0
    MSM-B    8900-MSM128                         Operational  0
    Flags : M - Backplane link to Master is Active
            B - Backplane link to Backup is also Active
            D - Slot Disabled
            I - Insufficient Power (refer to "show power budget")
            e - Errors on slot will be ignored (no corrective action initiated)
            E - Errors on slot will disable all ports on slot
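If you collect show slot output in a script, the flags can be mapped back to the module recovery setting. The following Python sketch is one possible approach and rests on assumptions: the function name `recovery_setting_by_slot` is hypothetical, the column layout in `SAMPLE` is a reconstruction of the sample output above, and the flag meanings are those of ExtremeXOS 11.5 and later ("E" for shutdown, "e" for none, no flag for reset).

```python
# Shortened copy of the sample output; the column spacing is an assumption.
SAMPLE = """\
Slots    Type            Configured      State        Ports  Flags
-------------------------------------------------------------------
Slot-1   8900-G96T-c     8900-G96T-c     Operational  96     MB
Slot-2   8900-10G24X-c   8900-10G24X-c   Operational  24     MB E
Slot-6                                   Empty        0
MSM-A    8900-MSM128                     Operational  0
"""


def recovery_setting_by_slot(output: str) -> dict:
    """Map each slot name to the recovery setting implied by its flags."""
    settings = {}
    for line in output.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].startswith(("Slot-", "MSM-")):
            continue  # header, separator, or flag legend line
        # The Ports column is the last purely numeric token; flags follow it.
        port_idx = max(
            (i for i, tok in enumerate(tokens) if tok.isdigit()), default=None
        )
        if port_idx is None:
            continue
        flags = "".join(tokens[port_idx + 1:])
        if "E" in flags:
            settings[tokens[0]] = "shutdown"
        elif "e" in flags:
            settings[tokens[0]] = "none"
        else:
            settings[tokens[0]] = "reset"
    return settings
```

Calling `recovery_setting_by_slot(SAMPLE)` reports slot 2 as shutdown and the remaining slots as reset, which matches the "E" flag shown for slot 2 in the sample.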
Note
In ExtremeXOS 11.4 and earlier, if you configure the module recovery setting to none, the output displays an "E" flag that indicates no corrective actions will occur for the specified MSM or I/O module. The "E" flag appears only if you configure the module recovery setting to none.
To display the module recovery setting for a specific port on a module, including the current recovery mode, use the following command:
show slot slot
In addition to the information displayed with show slot, this command displays the module recovery setting configured on the slot. The following truncated output displays the module recovery setting (displayed as Recovery Mode) for the specified slot:
    Slot-2 information:
        State:             Operational
        Download %:        100
        Flags:             MB E
        Restart count:     0 (limit 5)
        Serial number:     800264-00-01 0907G-00166
        Hw Module Type:    8900-10G24X-c
        SW Version:        15.2.0.26
        SW Build:          v1520b26
        Configured Type:   8900-10G24X-c
        Ports available:   24
        Recovery Mode:     Shutdown
        Debug Data:        Peer=Operational
    Flags : M - Backplane link to Master is Active
            B - Backplane link to Backup is also Active
            D - Slot Disabled
            I - Insufficient Power (refer to "show power budget")
            e - Errors on slot will be ignored (no corrective action initiated)
            E - Errors on slot will disable all ports on slot
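The per-slot output is a list of "name: value" fields, so the Recovery Mode and the restart count can be extracted with a few lines of script. The Python sketch below is an illustration under assumptions: `parse_slot_detail` and `resets_remaining` are hypothetical helper names, and `DETAIL` assumes the switch prints one field per line, as in the truncated output above.

```python
import re

# Shortened copy of the truncated output; one field per line is an assumption.
DETAIL = """\
Slot-2 information:
    State:             Operational
    Flags:             MB E
    Restart count:     0 (limit 5)
    Recovery Mode:     Shutdown
Flags : M - Backplane link to Master is Active
"""


def parse_slot_detail(output: str) -> dict:
    """Collect 'name: value' fields; the first occurrence of a name wins."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            # setdefault keeps the slot's own Flags line and ignores the
            # legend line that also starts with "Flags".
            info.setdefault(key.strip(), value.strip())
    return info


def resets_remaining(info: dict) -> int:
    """Return how many resets are left before the limit is reached."""
    match = re.fullmatch(r"(\d+)\s*\(limit\s+(\d+)\)", info["Restart count"])
    if match is None:
        raise ValueError("unexpected Restart count format")
    count, limit = int(match.group(1)), int(match.group(2))
    return limit - count
```

For the sample, `parse_slot_detail(DETAIL)["Recovery Mode"]` is "Shutdown" and `resets_remaining` returns 5, because the restart count is 0 with a limit of 5.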
If you experience an MSM/MM failure, please contact Extreme Networks Technical Support.
The following example configures a switch to not take an action if a hardware fault occurs:
configure sys-recovery-level slot none
This command was first available in ExtremeXOS 11.3.
The shutdown parameter was added in ExtremeXOS 11.5.
This command is available on all platforms.