Upgrade Device Firmware in a High Availability Deployment

Use this procedure to upgrade the firmware of the SLX devices whose TPVMs host an EFA high-availability deployment.

This is the recommended method for upgrading the firmware of devices in a high-availability deployment. You upgrade the device that hosts the standby node, force a failover so that the active node becomes the standby node, and then upgrade the device that hosts the new standby node.

To upgrade firmware in all devices in the fabric, see Fabric-wide Firmware Download.

  1. On the TPVM, determine which EFA node is the standby node.
    1. Run the EFA efactl status script (or the efa status command, as an alternative).
      $ efactl status
      
      NAME    STATUS   ROLES    AGE   VERSION        LABELS
      tpvm    Ready    primary   21h   v1.18.6+k3s1   beta.kubernetes.io/arch=amd64,
      beta.kubernetes.io/os=linux,keepalived=active,kubernetes.io/arch=amd64,
      kubernetes.io/hostname=tpvm,kubernetes.io/os=linux,node-role.kubernetes.io/primary=true
      
      tpvm2   Ready    primary   21h   v1.18.6+k3s1   beta.kubernetes.io/arch=amd64,
      beta.kubernetes.io/os=linux,keepalived=standby,kubernetes.io/arch=amd64,
      kubernetes.io/hostname=tpvm2,kubernetes.io/os=linux,node-role.kubernetes.io/primary=true
    The node that is labeled "keepalived=standby" is the standby node. The command also returns a list of EFA pods (not shown in the example). The node that runs the pods is the active node.
  2. Prepare and run the firmware download on the device that hosts the standby node.
    1. Prepare the firmware download.
      $ efa inventory device firmware-download prepare add --ip <device IP> \
        --firmware-host <IP of firmware download host> \
        --firmware-directory <path to target firmware build>
      
      The command returns the following information in a table: IP address, host name, model, chassis name, ASN, role, current firmware, firmware host, firmware directory, target firmware, and last update time.
    2. Download the firmware.
      $ efa inventory device firmware-download execute --fabric <fabric name>
      Firmware Download  [success]
    3. Monitor the progress of the firmware download.
      $ efa inventory device firmware-download show --fabric <fabric name>
      
      Don't run other commands on these devices until firmware download is in progress 
      --- Time Elapsed: 299.843244ms --
    4. Repeat substep 3 until the firmware download is complete.
      Each time you repeat substep 3, the command returns a table that details the progress of the firmware download. The download is complete when the Update State column shows Completed and the Status column shows Firmware Committed.
  3. Perform a high availability failover.
    1. On the device that hosts the active node, stop and start the TPVM to initiate a failover.
      device# tpvm stop
      stop succeeds
      
      device# tpvm start
      start succeeds
      After the failover, the previously active node becomes the standby node, and the node on the upgraded device becomes the active node.
    2. On the TPVM, validate that the failover is complete.
      $ efactl status
      or
      $ efa status
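      The command returns a list of all pods and their metadata, including status, number of restarts, age, IP address, and node name. The failover is complete when the pods run on the former standby node, which is now the active node.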
  4. Repeat substeps 1 through 4 of step 2 to upgrade the firmware of the device that hosts the new standby node.
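The checks in steps 1 through 3 can also be scripted on the TPVM. The following POSIX shell sketch is hypothetical and is not part of EFA. It calls only the efactl status and efa inventory device firmware-download show commands used in this procedure. It assumes their output matches the examples above:

- A node row has STATUS in the second column and a keepalived=<state> label that may wrap onto following lines.
- A progress row contains Completed and Firmware Committed when the download finishes.

The function names, the 60-second poll interval, and the 60-poll limit are also assumptions.

```shell
#!/bin/sh
# Hypothetical helpers for the HA firmware upgrade procedure (not part of EFA).
# They wrap only the efactl/efa commands shown in this procedure and assume the
# output formats shown in its examples.

# Print the node whose labels include keepalived=<state> (active or standby).
# Reads `efactl status` output on stdin. A row that starts a node entry has
# STATUS (Ready/NotReady) in column 2; its LABELS may wrap onto later lines,
# so remember the most recent node name and print it when the label appears.
find_node_by_keepalived() {
  awk -v want="keepalived=$1" '
    $2 == "Ready" || $2 == "NotReady" { node = $1 }
    index($0, want) { print node; exit }
  '
}

# Succeed (status 0) if some row of `efa inventory device firmware-download show`
# output contains both "Completed" (Update State) and "Firmware Committed" (Status).
firmware_download_complete() {
  awk '/Completed/ && /Firmware Committed/ { found = 1 } END { exit !found }'
}

# Poll the firmware download for a fabric until it completes (substeps 3 and 4
# of step 2). Arguments: fabric name, seconds between polls, maximum polls.
wait_for_firmware_download() {
  fabric=$1
  interval=${2:-60}   # assumed default poll interval, in seconds
  tries=${3:-60}      # assumed default limit, so a stalled download is reported
  while [ "$tries" -gt 0 ]; do
    if efa inventory device firmware-download show --fabric "$fabric" |
        firmware_download_complete; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "firmware download for fabric $fabric not complete" >&2
  return 1
}

# Succeed if the node labeled keepalived=active is no longer the node that was
# active before the TPVM stop/start in step 3.
failover_complete() {
  previous_active=$1
  current_active=$(efactl status | find_node_by_keepalived active)
  [ -n "$current_active" ] && [ "$current_active" != "$previous_active" ]
}
```

For example, after sourcing the file on the TPVM:
      $ efactl status | find_node_by_keepalived standby
      $ active_before=$(efactl status | find_node_by_keepalived active)
      $ wait_for_firmware_download <fabric name>
      $ failover_complete "$active_before" && echo "failover complete"
The poll limit makes a stalled download fail visibly instead of looping forever.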
Note

If you do not follow the recommended steps and instead upgrade the firmware of the device that hosts the active node, the EFA inventory becomes out of sync with the SLX device. The device remains in maintenance mode, and the inventory indicates that the firmware download is still in progress, even if it completed successfully. To resynchronize the inventory and the device, run the following commands. The first command corrects the device's firmware state in EFA, and the second takes the device out of maintenance mode.
$ efa inventory device update --ip <device IP>

$ efa inventory device setting update --maint-mode-enable No --ip <device IP>
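The two recovery commands can be wrapped in a small helper so that maintenance mode is disabled only if the inventory update succeeds. This is a hypothetical POSIX shell sketch. The function name resync_device is an assumption. It calls only the two efa commands shown above, in the same order.

```shell
#!/bin/sh
# Hypothetical wrapper around the two recovery commands in the note above
# (not part of EFA). Argument: the IP address of the out-of-sync SLX device.
resync_device() {
  ip=$1
  if [ -z "$ip" ]; then
    echo "usage: resync_device <device IP>" >&2
    return 2
  fi
  # First correct the device's firmware state in the EFA inventory...
  efa inventory device update --ip "$ip" || return 1
  # ...then take the device out of maintenance mode.
  efa inventory device setting update --maint-mode-enable No --ip "$ip"
}
```

For example, run resync_device <device IP> on the TPVM. Stopping after a failed inventory update avoids leaving maintenance mode while the inventory still reports the wrong firmware state.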