Upgrade Device Firmware in a High Availability Deployment

Use this procedure to upgrade the firmware of SLX devices that host the high availability deployment on TPVMs.

This is the recommended method for upgrading the firmware of high availability devices. It describes how to upgrade the standby node, force a failover to change the active node to standby, and then upgrade the new standby node.

  1. On the TPVM, determine which EFA node is the standby node.
    $ efactl status
    
    NAME    STATUS   ROLES    AGE   VERSION        LABELS
    tpvm    Ready    master   21h   v1.18.6+k3s1   beta.kubernetes.io/arch=amd64,
    beta.kubernetes.io/os=linux,keepalived=active,kubernetes.io/arch=amd64,
    kubernetes.io/hostname=tpvm,kubernetes.io/os=linux,node-role.kubernetes.io/master=true
    
    tpvm2   Ready    master   21h   v1.18.6+k3s1   beta.kubernetes.io/arch=amd64,
    beta.kubernetes.io/os=linux,keepalived=standby,kubernetes.io/arch=amd64,
    kubernetes.io/hostname=tpvm2,kubernetes.io/os=linux,node-role.kubernetes.io/master=true
    The node that is labeled "keepalived=standby" is the standby node. The command also returns a list of EFA pods (not shown in the example). The node that runs the pods is the active node.
  2. Prepare and run the firmware download on the device that hosts the standby node.
    1. Prepare the firmware download.
      $ efa inventory device firmware-download prepare add --ip <device IP>
      --firmware-host <IP of firmware download host>
      --firmware-directory <path to target firmware build>
      
      The command returns the following information in a table: IP address, host name, model, chassis name, ASN, role, current firmware, firmware host, firmware directory, target firmware, and last update time.
    2. Download the firmware.
      $ efa inventory device firmware-download execute --fabric <fabric name>
      Firmware Download Execute [success]
    3. Monitor the progress of the firmware download.
      $ efa inventory device firmware-download show --fabric <fabric name>
      
      Don't execute other commands on these devices until firmware download is in progress 
      --- Time Elapsed: 299.843244ms ---
    4. Repeat the previous substep (monitoring the progress) until the firmware download is complete.
      Each time you run the show command, it returns a table that details the progress of the firmware download. The download is complete when the Update State column shows Completed and the Status column shows Firmware Committed.
  3. Perform a high availability failover.
    1. On the device that hosts the active node, stop and start the TPVM to initiate a failover.
      device# tpvm stop
      stop succeeds
      
      device# tpvm start
      start succeeds
      After the failover, the previously active node becomes the standby node.
    2. On the TPVM, validate that the failover is complete.
      $ efactl status
      The command returns a list of all pods and their metadata, including status, number of restarts, age, IP address, and node name.
  4. Repeat step 2 (all four substeps) to upgrade the firmware of the device that hosts the new standby node.
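The standby check in step 1 can be scripted. The following is a minimal sketch, not part of EFA: the function name standby_node is hypothetical, and it assumes the node rows look like the example in step 1 (a NAME column followed by a STATUS column of Ready or NotReady, with the labels either on the same line or wrapped onto the following lines).

```shell
#!/bin/sh
# Hypothetical helper: print the name of the node labeled keepalived=standby.
# Reads `efactl status` output on standard input.
# Assumption: node rows have NAME in column 1 and Ready/NotReady in column 2,
# as in the example output of step 1.
standby_node() {
    awk '
        # Remember the most recent node name so that labels wrapped onto
        # later lines can still be attributed to the right node.
        $2 == "Ready" || $2 == "NotReady" { node = $1 }
        # The keepalived label may be on the node row or on a wrapped line.
        /keepalived=standby/ && node != "" { print node; exit }
    '
}
```

On the TPVM, the function would be used as `efactl status | standby_node`. With the example output from step 1, it prints tpvm2. It prints nothing if no node carries the standby label, which is worth treating as a reason to stop and inspect the full output manually.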
Note

If you do not follow the recommended steps and instead upgrade the firmware on the device that hosts the active node, the EFA inventory becomes out of sync with the SLX device. The device remains in maintenance mode and the inventory indicates that the firmware download is in progress, even if it completed successfully. To resynchronize the inventory and the device, run the following commands. The first command corrects the device's firmware state in EFA, and the second takes the device out of maintenance mode.
$ efa inventory device update --ip <device IP>

$ efa inventory device setting update --maint-mode-enable No --ip <device IP>
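If several devices need this recovery, the two commands can be generated per device. The sketch below is an illustration only: the function name resync_commands is hypothetical, and it prints the commands from this note rather than running them, so that they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print, in order, the two resynchronization commands
# from the note above for one device. Nothing is executed.
resync_commands() {
    device_ip=${1:-}
    if [ -z "$device_ip" ]; then
        echo "usage: resync_commands <device IP>" >&2
        return 1
    fi
    # 1. Correct the device's firmware state in the EFA inventory.
    printf 'efa inventory device update --ip %s\n' "$device_ip"
    # 2. Take the device out of maintenance mode.
    printf 'efa inventory device setting update --maint-mode-enable No --ip %s\n' "$device_ip"
}
```

For example, `resync_commands <device IP>` prints both commands with the address filled in. Keeping the order fixed matters because the note describes correcting the firmware state before leaving maintenance mode.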