Defects Closed with Code Changes

The following defects were closed in ExtremeCloud Orchestrator 3.4.0 and 3.4.1.

Defects Closed with Code Changes in ExtremeCloud Orchestrator 3.4.1

Parent Defect ID: XCO-9270 Issue ID: XCO-9270
Product: XCO Reported in Release: EFA 3.3.1
Symptom: Standby node showing 'down' after continuous node reboot
Condition: Manually trigger a reload of the SLX device on which the active XCO node is running, then check the node status after the reboot (when the node moves to standby mode).
Workaround: N/A
Recovery: Reboot both the active and standby nodes to recover from the situation.

Defects Closed with Code Changes in ExtremeCloud Orchestrator 3.4.0

Parent Defect ID: XCO-4127 Issue ID: XCO-4127
Product: XCO Reported in Release: EFA 3.0.0
Symptom: Ports are not listed in the port-channel creation for SLX NPB devices
Condition: The ports are not listed in the port-channel creation even though they are not used in any other configuration. For these ports, speed is set to auto-negotiation and no cable is connected.
Workaround: For breakout ports, make sure that cables are connected so that port speed will be updated.
Recovery: N/A
Parent Defect ID: XCO-4146 Issue ID: XCO-4146
Product: XCO Reported in Release: EFA 2.7.2
Symptom: The fabric devices remain in the cfg-refresh-err state after the TPVM failover.
Condition:
  1. Fabric devices are already in the cfg-refresh-err state due to an LLDP Link Down (LD) event.
  2. Bring up the LLDP links responsible for the fabric devices being in the cfg-refresh-err state.
  3. Execute the TPVM failover with the 'tpvm stop' and 'tpvm start' commands during the handling of the LLDP Link Up (LA) event caused by step 2.

Recovery:
  1. Trigger LD/LA events by flapping the interface links on the devices that remain in the cfg-refreshed state, where DRC does not recover the device to the cfg-sync state and the pending reason is "LA/LD".
    1.1. "shutdown" the interface on the physical link on the devices, followed by "efa inventory device update --ip <device-ip>", which generates LD events.
    1.2. "no shutdown" the interface on the physical link on the devices, followed by "efa inventory device update --ip <device-ip>", which generates LA events.
    1.3. If the pending config contains "LA": execute "efa inventory drift-reconcile execute --ip <device-ip> --reconcile" on the devices that are in the cfg-refresh-err/cfg-refreshed state. If the pending config contains "LD,LA": execute "efa fabric configure --name <fabric-name>" to clean up the configuration on the devices that are in the cfg-refresh-err/cfg-refreshed state.
  [OR]
  2. Reboot, without maintenance mode, the devices that remain in the cfg-refreshed state, where DRC does not recover the device to the cfg-sync state.
    2.1. "reload" the switches without maintenance mode enabled.
    2.2. Execute "efa inventory drift-reconcile execute --ip <device-ip> --reconcile" on the devices that are in the cfg-refresh-err/cfg-refreshed state.
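
The choice between the two commands in recovery step 1.3 can be sketched as a small helper. This is only an illustration: the function name and the assumption that the pending config arrives as a comma-separated string such as "LD,LA" are hypothetical, and only the two command strings come from the note above.

```python
def recovery_command(pending_config: str, device_ip: str, fabric_name: str) -> str:
    """Pick the step 1.3 recovery command for a device (hypothetical helper)."""
    # Assumption: pending reasons are given as a comma-separated string, e.g. "LD,LA".
    reasons = {r.strip().upper() for r in pending_config.split(",") if r.strip()}
    if reasons == {"LA"}:
        # Only LA pending: run drift reconcile on the device.
        return f"efa inventory drift-reconcile execute --ip {device_ip} --reconcile"
    if reasons == {"LA", "LD"}:
        # Both LD and LA pending: reconfigure the fabric instead.
        return f"efa fabric configure --name {fabric_name}"
    raise ValueError(f"no recovery rule for pending config: {pending_config!r}")


# Example with placeholder values for the device IP and fabric name.
print(recovery_command("LD,LA", "10.0.0.1", "fab1"))
```

The helper only builds the command string; running it against a live system is left to the operator.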

Parent Defect ID: XCO-7183 Issue ID: XCO-7183
Product: XCO Reported in Release: XCO 3.2.2
Symptom:

After changing the DNS nameservers in /etc/netplan and running update-dns.sh --dns-action allow, the following error is seen:

(efa:ubuntu)ubuntu@efa:/opt/efa$ sudo /opt/efa/update-dns.sh
/opt/efa/update-dns.sh Usage:
--help - Show this message
--dns-action <'allow'|'disallow'> - Allow host DNS entries to be forwarded to the pods
(efa:ubuntu)ubuntu@efa:/opt/efa$ sudo /opt/efa/update-dns.sh --dns-action allow
Unexpected nameserver entry of 127.0.0.53 found in /etc/resolve.conf
(efa:ubuntu)ubuntu@efa:/opt/efa$

Condition:

In Ubuntu 18.04.6 and 20.04, /etc/resolv.conf is a symlink to the stub file /run/systemd/resolve/stub-resolv.conf. Another file in the same directory, resolv.conf, contains the DNS information from netplan.

Additionally, systemd-resolved provides a local DNS stub listener on IP address 127.0.0.53 on the local loopback interface. Programs issuing DNS requests directly, bypassing any local API, may be directed to this stub in order to connect them to systemd-resolved.

Note: The best practice is for local programs to use the glibc NSS or bus APIs instead, as various network resolution concepts (such as link-local addressing, or LLMNR Unicode domains) cannot be mapped to the unicast DNS protocol.

The update-dns.sh script does not recognize the 127.0.0.53 address as valid.

Workaround:

If updating DNS to allow host entries to be forwarded to the pods using the update-dns.sh script in XCO 3.3.0 on Ubuntu 18.04.6, 20.04, or above, follow these steps after netplan is applied and before running update-dns.sh:

1. Check whether the symlink exists. If it does not, edit /etc/resolv.conf directly to use the netplan nameserver IP address.

$ ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 39 Feb 20 2021 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf <<<symlink exists
sbr@sbr-virtual-machine
~ $

2. Check whether the following files contain the 127.0.0.53 IP address:

~ $ cat /etc/resolv.conf | grep nameserver
nameserver 127.0.0.53
sbr@sbr-virtual-machine
~ $ cat /run/systemd/resolve/stub-resolv.conf | grep nameserver
nameserver 127.0.0.53
sbr@sbr-virtual-machine
~ $

3. Edit the following file to add the netplan IP address for the nameserver and remove 127.0.0.53:

sudo vi /run/systemd/resolve/stub-resolv.conf

4. Check whether both files are updated:

~ $ cat /run/systemd/resolve/stub-resolv.conf | grep nameserver
nameserver 10.10.10.0
sbr@sbr-virtual-machine
~ $ cat /etc/resolv.conf | grep nameserver
nameserver 10.10.10.0
sbr@sbr-virtual-machine
~ $

5. Run update-dns.sh --dns-action allow.

6. Run sudo netplan apply to restore /etc/resolv.conf and /run/systemd/resolve/stub-resolv.conf to their default value of 127.0.0.53.
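
The checks in steps 2 and 4 can also be scripted. The following is a minimal sketch, not part of XCO or update-dns.sh: the function names are hypothetical, and it works on the text of a resolv.conf-style file so that it can be tried without touching the system.

```python
from typing import List

STUB_RESOLVER = "127.0.0.53"  # systemd-resolved stub listener address


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';' and never match 'nameserver'.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def has_stub_nameserver(resolv_conf_text: str) -> bool:
    """True if the stub address that update-dns.sh rejects is still listed."""
    return STUB_RESOLVER in nameservers(resolv_conf_text)


# Sample contents mirroring the outputs shown in steps 2 and 4.
before_edit = "nameserver 127.0.0.53\noptions edns0\n"
after_edit = "nameserver 10.10.10.0\n"
print(has_stub_nameserver(before_edit), has_stub_nameserver(after_edit))
```

To check a real host, pass the contents of /etc/resolv.conf and /run/systemd/resolve/stub-resolv.conf to has_stub_nameserver before running step 5.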

Parent Defect ID: XCO-7955 Issue ID: XCO-7955
Product: XCO Reported in Release: XCO 3.3.0
Symptom: Triggering the "Firmware Activate" process can lead to either parallel or serial execution, irrespective of how devices are grouped to avoid traffic loss. When auto-commit is enabled, the activation can result in a "Firmware Commit Failed" status on the EFA side, even though the firmware commit completed successfully on the device.
Condition:

The "Firmware Activate" process is initiated from the user interface, either through the Inventory Page or the Fabric-wide Page, even in the midst of an incomplete operation on a subset of devices.

For instance:

Device 1 and Device 2 trigger a download with auto-commit enabled from either the Inventory or Fabric-wide Page.

Device 3 triggers a download from the Fabric or Inventory Page.

Subsequently, Device 1 and Device 2 attempt to continue with the "Activate Download" operation from the inventory or fabric page, resulting in a "Firmware Commit Failed" failure.

Workaround: Do not initiate firmware upgrades on other devices until the device completes both the Activate operation and the commit operation.
Recovery: Based on the error in the flow sequence, use the following commands to continue with the download operation: "efa inventory debug unblock-from-fwdl" and "efa inventory device firmware-download".
Parent Defect ID: XCO-8070 Issue ID: XCO-8070
Product: XCO Reported in Release: XCO 3.2.1
Symptom: 'efa system backup' fails with Error : Failed to execute service lock API due to error Role ServiceAdmin does not have permissions to path: /v1/inventory/lockservice, method: POST.
Condition:

A database error occurs in RBAC during initialization. Previously, RBAC did not return when the database was unavailable or down, so it failed to load policies and could not resolve the permissions needed to run the command. RBAC now returns gracefully on a database error, which allows it to reinitialize.

Parent Defect ID: XCO-8128 Issue ID: XCO-8128
Product: XCO Reported in Release: XCO 3.2.1
Symptom: Unable to Create EPG when out of band static-routes with next-hop interface exist
Condition: Root cause:

Multiple issues are observed when out-of-band (OOB) static routes are added: (a) when a null route is added, the inventory table is populated with the wrong interface name, tengigabitethernet 0; (b) the published vrfupdate event does not contain the interface name, and the nexthoptype value is set to the invalid value 0; (c) when Tenant reads the event, it writes incomplete information to its database; and (d) Tenant cannot read the incomplete information back properly.

Workaround: Not applicable.
Recovery:
  1. Remove the static-routes from the vrf config in the device.
  2. Update the tenant db vrf_static_route to delete the corresponding stale records (oob_created=1 and nh_type = 0).
  3. Create the previously failing EPGs and confirm that they succeed.
  4. Re-add the static-routes in the vrf config in the device.
Parent Defect ID: XCO-8200 Issue ID: XCO-8200
Product: XCO Reported in Release: XCO 3.3.0
Symptom: SLX devices whose simultaneous upgrade could result in traffic loss are not allowed in the same firmware download execution flow. For example, two leaf devices from the same MCT pair cannot be selected together.
Condition: From the user interface, go to the Fabric page and select a few devices.

Go to the table action and select the Firmware Upgrade option.

Workaround: The user selects the left-side leaf of the MCT pair and triggers firmware download and activation. Similarly, the user selects the right-side leaf of the MCT pair and triggers firmware download and activation.
Recovery: Choose another set of devices that will not result in traffic loss and proceed with the firmware download operation.
Parent Defect ID: XCO-8232 Issue ID: XCO-8232
Product: XCO Reported in Release: XCO 3.3.0
Symptom:

The following error is observed while updating the EFA system settings through the CLI:

Error : error creating directory on remote: Could not chdir to home directory /users/home21/<username>: No such file or directory

Condition: While using the CLI command "efa system settings update --remote-server-ip <ip> --remote-transfer-protocol scp --remote-server-username <username> --remote-server-password <password> --remote-server-directory <remote-server-directory>"
Workaround: Use a remote server that has bash support installed.
Recovery: Add bash support on the remote server and retry the CLI command.
Parent Defect ID: XCO-8234 Issue ID: XCO-8234
Product: XCO Reported in Release: XCO 3.3.0
Symptom: The fabric alarm and the alarm status update notifications can briefly show the fabric alarm as cleared while the fabric is actually unhealthy.
Condition: This can occur during fabric formation or during any operation where fabric health is degraded for multiple reasons (for example, a spine-to-leaf link going down or BGP neighborship going down between spine and leaf). Once a specific device and its links are repaired and deemed healthy, the overall fabric alarm may temporarily be cleared although other devices remain unhealthy. The fabric alarm is then corrected and returned to an unhealthy state because of the remaining unhealthy devices.
Workaround: N/A
Recovery: The fabric alarm automatically recovers to the proper state. The only effect is that the fabric alarm may temporarily show as cleared while the fabric is still unhealthy.
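
The expected behavior, where the fabric alarm clears only once every device is healthy, can be illustrated with a short sketch. This is not XCO's implementation; the function name, the per-device health map, and the device names are assumptions made for the example.

```python
from typing import Mapping


def fabric_alarm_cleared(device_health: Mapping[str, bool]) -> bool:
    """The fabric alarm may clear only when every device reports healthy."""
    # One repaired device must not clear the alarm while others are unhealthy.
    return all(device_health.values())


# Hypothetical fabric: leaf1 was just repaired, spine1 is still unhealthy.
health = {"leaf1": True, "leaf2": True, "spine1": False}
print(fabric_alarm_cleared(health))  # the alarm stays raised

health["spine1"] = True
print(fabric_alarm_cleared(health))  # the alarm can now clear
```
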
Parent Defect ID: XCO-8267 Issue ID: XCO-8267
Product: XCO Reported in Release: XCO 3.2.1
Symptom: Devices finished their backups successfully in 4 minutes, but the completion timeout was set to 3 minutes. As a result, the "Config Backup timed out for Device" error message was observed.
Condition:

The monitor process makes REST calls to Inventory every 2 seconds to check whether the backup is done, and after 3 minutes the monitor reports a failure.

The fix sets the completion timeout to a value greater than the NETCONF timeout of 4 minutes and 50 seconds, so that the monitor no longer reports false failures.
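
The timing can be checked with a small simulation. This is a sketch under stated assumptions, not the monitor's code: the function and constant names are hypothetical, and the new timeout value (NETCONF timeout plus 10 seconds) is a placeholder, since the note only says the value is greater than 4 minutes and 50 seconds.

```python
POLL_INTERVAL_S = 2               # monitor polls Inventory every 2 seconds
OLD_TIMEOUT_S = 3 * 60            # original completion timeout (180 s)
NETCONF_TIMEOUT_S = 4 * 60 + 50   # NETCONF timeout from the note (290 s)
NEW_TIMEOUT_S = NETCONF_TIMEOUT_S + 10  # placeholder: any value above 290 s


def backup_outcome(backup_duration_s: int, completion_timeout_s: int,
                   poll_interval_s: int = POLL_INTERVAL_S) -> str:
    """Simulate polling until the backup is done or the timeout passes."""
    elapsed = 0
    while elapsed <= completion_timeout_s:
        if elapsed >= backup_duration_s:
            return "completed"
        elapsed += poll_interval_s
    return "timed out"


# A 4-minute backup against the old and the placeholder new timeout.
print(backup_outcome(4 * 60, OLD_TIMEOUT_S))
print(backup_outcome(4 * 60, NEW_TIMEOUT_S))
```

With a 2-second interval, the old timeout allows roughly 90 polls, which is not enough for a backup that needs 240 seconds.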

Parent Defect ID: XCO-8289 Issue ID: XCO-8289
Product: XCO Reported in Release: XCO 3.2.1
Symptom: After a breakout port is set on the 9740, the subsequent ports are still shown in 'show-running-config'.
Condition: After a port breakout on the 9740, invalid lines appear in the efa show-running-config output (for example, with port 21 broken out, port 22 cannot have an admin state).
Parent Defect ID: XCO-8366 Issue ID: XCO-8366
Product: XCO Reported in Release: XCO 3.3.1
Symptom: IPv6-Prefix over IPv4-Peer device setting under Inventory service becomes refreshed and gets removed from the device when device is removed from fabric or entire fabric gets deleted. This setting doesn't get applied automatically to the device when it is added back to the fabric or fabric is reconfigured.
Condition:
  1. Configure fabric.
  2. Enable IPv6-Prefix over IPv4-Peer device setting from inventory CLI.
  3. Remove device from fabric or delete entire fabric.
  4. Add device back in fabric or re-configure fabric.

    Performing Step 4 doesn't configure the IPv6-Prefix over IPv4-Peer setting on the device, and the Inventory service keeps identifying drift for it.

Recovery: Run DRC from the Inventory service before or after adding the device to the fabric and reconfiguring the fabric.
Parent Defect ID: XCO-8574 Issue ID: XCO-8574
Product: XCO Reported in Release: XCO 3.3.0
Symptom: Deleting or removing a route-map succeeded even when it had bindings associated with a BGP neighbor. The operation should have been denied.
Condition:
  1. Create route-map stanza
  2. Configure it on the device
  3. Create BGP peer and peer-group with route-map binding
  4. Delete the route-map with --seq all
Workaround: Remove BGP peer/peer-group association first and then delete/remove the route-map from Device.
Recovery: Re-add the route-map to device again and then follow the workaround above for proper removal.
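
The expected deny behavior can be sketched as a guard that looks for BGP bindings before deleting. This is an illustration only, not XCO's Policy service code: the function names, the bindings map, and the peer-group and route-map names are all hypothetical.

```python
from typing import Dict, List


def route_map_bindings(route_map: str, bgp_bindings: Dict[str, List[str]]) -> List[str]:
    """Return the BGP peers or peer-groups that reference the route-map."""
    return sorted(peer for peer, maps in bgp_bindings.items() if route_map in maps)


def delete_route_map(route_map: str, bgp_bindings: Dict[str, List[str]]) -> str:
    """Deny the delete while any BGP peer or peer-group is still bound."""
    bound = route_map_bindings(route_map, bgp_bindings)
    if bound:
        raise ValueError(f"route-map {route_map} is bound to: {', '.join(bound)}")
    return f"route-map {route_map} deleted"


# Hypothetical bindings: peer-group pg1 uses route-map rm1.
bindings = {"pg1": ["rm1"], "10.1.1.2": []}
print(route_map_bindings("rm1", bindings))
print(delete_route_map("rm2", bindings))
```

This mirrors the workaround order: remove the BGP association first, then delete the route-map.
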
Parent Defect ID: XCO-8698 Issue ID: XCO-8698
Product: XCO Reported in Release: EFA 2.7.2
Symptom: Some of the anycast IP configs of existing ports-ctags of an EPG were found to be marked as 'deleted' in XCO's database, possibly due to a sync issue that existed in 2.7 or earlier images. When a port-group-add is performed on this EPG with ports from new devices, the command fails with a 503 - Service unavailable error because the tenant service panics. Re-executing the same command succeeds without provisioning the anycast configs on the device.
Condition:

Data corruption:

There is no mapping of anycast-ip with VE port in endpoint_group_network_properties_ip table for select VE ports (3 out of 20 to be exact). Instead, a NULL value was seen in device_id and device_port_ip_id column. This is normally done only when port-group-delete is executed, not otherwise.

Root cause:

The anycast configs of other associated EPG ports are marked as deleted in XCO's database. This causes XCO to attempt to prepare configuration for these ports as well, as part of the port-group-add use case. This use case has a bug in the Tenant software that causes a non-fatal panic, which cascaded to the issue the customer faced.

Workaround: Not applicable
Recovery: Not applicable
Parent Defect ID: XCO-8700 Issue ID: XCO-8700
Product: XCO Reported in Release: XCO 3.3.0
Symptom: The GUI upgrade was successful, but the device is stuck in the GUI, so the user cannot configure or delete the device.
Condition: The GUI upgrade was successful, but the device is stuck in the GUI, so the user cannot configure or delete the device.
Parent Defect ID: XCO-8827 Issue ID: XCO-8827
Product: XCO Reported in Release: XCO 3.2.1
Symptom: "efa system backup --remote" fails when the password on the remote server is longer than 16 characters.
Condition:

System backup failed because of an error in decrypting the password, so scp to the remote host was not possible.

Parent Defect ID: XCO-8831 Issue ID: XCO-8831
Product: XCO Reported in Release: XCO 3.3.0
Symptom: XCO Visibility shows non-existent ports on the 9920.
Condition: When viewing ports on the 9920, non-existent ports sometimes appear.
Parent Defect ID: XCO-8935 Issue ID: XCO-8935
Product: XCO Reported in Release: XCO 3.3.0
Symptom: Deleting or removing a route-map from the device succeeded even when it had bindings associated with a BGP neighbor. The operation should have been denied.
Condition:
  1. Create route-map stanza.
  2. Configure it on the device.
  3. Create BGP peer and peer-group with route-map binding.
  4. Update route-map with operation remove-device.
Workaround: Remove BGP peer/peer-group association first and then delete/remove the route-map from Device
Recovery: Re-add the route-map to device again and then follow the workaround above for proper removal.
Parent Defect ID: XCO-8936 Issue ID: XCO-8936
Product: XCO Reported in Release: XCO 3.3.0
Symptom: Deleting or removing a prefix-list from the device succeeded even when it had bindings associated with a BGP neighbor. The operation should have been denied.
Condition:
  1. Create prefix-list.
  2. Create route-map.
  3. Create route-map-match and associate prefix created in Step 1 with route-map created in Step 2.
  4. Associate/advertise prefix-list/route-map in bgp peer-group.
  5. Delete or Remove Prefix-list from Device created in Step 1.
Workaround: Remove BGP peer/peer-group association first and then delete/remove the prefix-list from Device.
Recovery: Re-add the prefix-list to device again and then follow the workaround above for proper removal.
Parent Defect ID: XCO-8975 Issue ID: XCO-8975
Product: XCO Reported in Release: XCO 3.3.1
Symptom: ICL Expansion - Fabric Event 'Link Add (LA)' Not Shown for All Switches.
Condition: The LA event is expected on both devices (d1 and d2) of the MCT pair. Where this is not feasible, the fix adds checks to the fabric LA handling that are specific to MCT devices.
Parent Defect ID: XCO-9323 Issue ID: XCO-9323
Product: XCO Reported in Release: XCO 3.3.0
Symptom: In the GUI, the Device Ports status does not match when grouped by "Name".
Condition: In the GUI, the Device Ports status does not match when grouped by "Name".