Open Defects

The following defects are open in Extreme Fabric Automation 2.5.5.

Parent Defect ID: EFA-5592 Issue ID: EFA-5592
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.2.0
Symptom: Certificates must be manually imported on replaced equipment in order to perform an RMA.
Condition: RMA/replaced equipment does not have the SSH key and auth certificate. To replay the configuration on the new switch, the user must import the certificates manually.
Workaround:

Import the certificate manually:

efa certificates device install --ips x,y --certType

Parent Defect ID: EFA-8535 Issue ID: EFA-8535
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.4.0
Symptom: On a single-node TPVM installation, EFA is not operational after an IP change.
Condition: After an IP change of the host system, if the 'efa-change-ip' script is run by a user other than the installation user, EFA is not operational.
Workaround: Restart the k3s service using the command 'sudo systemctl restart k3s'.
Parent Defect ID: EFA-8904 Issue ID: EFA-8904
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.4.2
Symptom: Single-node deployment fails with 'DNS resolution failed.'
Condition: After a multi-node deployment and subsequent un-deployment on a server, if a single-node deployment is attempted on the same server, the installer exits with the error 'DNS resolution failed.'
Workaround: After un-deployment of the multi-node installation, reboot the server/TPVM.
Parent Defect ID: EFA-9010 Issue ID: EFA-9010
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.4.2
Symptom:

Creation of 100 OpenStack VMs/stacks fails at a rate of 10 stacks/min.

Each stack has 1 VM, 2 networks, and 3 ports (2 DHCP ports and 1 Nova port).

Condition:

100 OpenStack stacks created at a rate of 10 stacks/min are sent to EFA.

EFA processing requests at such a high rate overwhelms the CPU. Because EFA cannot handle requests at such high rates, a backlog of requests builds up. This eventually results in VM reschedules and failure to complete some stacks with errors.

Workaround: Create the 100 OpenStack stacks at a consistently lower rate, for example 3 stacks/min.
Recovery: Delete the failed (or all) OpenStack stacks and recreate them at a lower rate, for example 3 stacks/min (see the loop sketched below).
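
For illustration, the lower creation rate can be scripted with a simple shell loop (a sketch only; the Heat template name stack.yaml and the stack naming scheme are placeholders, not from the defect report):

for i in $(seq 1 100); do
  openstack stack create -t stack.yaml "stack-$i"   # create one stack from the template
  sleep 20                                          # ~20s pause gives roughly 3 stacks/min
done
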
Parent Defect ID: EFA-9065 Issue ID: EFA-9065
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.4.3
Symptom: An EFA port channel remains in the cfg-refreshed state when the port channel is created and immediately followed by an EPG create that uses that port channel.
Condition:

Below are the steps to reproduce the issue:

1. Create port-channel po1 under the ownership of tenant1

2. Create an endpoint group with po1 under the ownership of tenant1

3. After step 2 begins and before it completes, the RASLog event for step 1 (port-channel creation) is received. This RASLog event is processed after step 2 completes.

Recovery:

1. Introduce switchport or switchport-mode drift on the SLX device for the port channel that is in the cfg-refreshed state

2. Perform a manual DRC to bring the cfg-refreshed port channel back to cfg-in-sync, as sketched below
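
The manual DRC in step 2 can typically be triggered with the inventory drift-reconcile command (a sketch; the device IP is a placeholder):

efa inventory drift-reconcile execute --ip <device-ip> --reconcile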

Parent Defect ID: EFA-9439 Issue ID: EFA-9439
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: The Dev-State and App-State of EPG networks are not-provisioned and cfg-ready, respectively.
Condition:

Below are the steps to reproduce the issue:

1) Create VRF with local-asn

2) Create EPG using the VRF created in step 1

3) Take one of the SLX devices to the administratively down state

4) Perform a VRF update "local-asn-add" to a different local-asn than the one configured in step 1

5) Perform a VRF update "local-asn-add" to the same local-asn that was configured in step 1

6) Admin up the SLX device which was made administratively down in step 3 and wait for DRC to complete

Workaround: None.
Recovery:

Following are the steps to recover:

1) Log in to the SLX device which was made admin down and then up

2) Introduce local-asn configuration drift under "router bgp address-family ipv4 unicast" for the VRF

3) Execute DRC for the device (see the sketch below)
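
A minimal sketch of steps 2 and 3, assuming SLX-style BGP VRF configuration (the VRF name, ASN value, and device IP are placeholders; the drift ASN only needs to differ from the EFA-intended value):

configure terminal
router bgp
 address-family ipv4 unicast vrf <vrf-name>
  local-as <drift-asn>   ! any value that differs from the EFA-intended local-asn
end

efa inventory drift-reconcile execute --ip <device-ip> --reconcile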

Parent Defect ID: EFA-9456 Issue ID: EFA-9456
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.4.3
Symptom: The issue is seen when devices being added to the fabric already have IP addresses configured on interfaces that conflict with the addresses EFA assigns.
Condition: The issue is observed if devices being added to the fabric have IP addresses assigned on interfaces and those IP addresses are already reserved by EFA for other devices.
Workaround:

Delete the IP addresses on the interfaces of devices having conflicting configuration so that new IP addresses can be reserved for these devices. One way to clear the device configuration is to use the commands below:

1. Register the device with inventory

efa inventory device register --ip <ip1, ip2> --username admin --password password

2. Issue the debug clear command: "efa fabric debug clear-config --device <ip1, ip2>"

Recovery:

Delete the IP addresses on the interfaces of devices having conflicting configuration so that new IP addresses can be reserved for these devices. One way to clear the device configuration is to use the commands below:

1. Register the device with inventory

efa inventory device register --ip <ip1, ip2> --username admin --password password

2. Issue the debug clear command: "efa fabric debug clear-config --device <ip1, ip2>"

3. Add the devices to the fabric

Parent Defect ID: EFA-9570 Issue ID: EFA-9570
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: Adding a device fails because the ASN used on a border leaf shows a conflict.
Condition: If there is more than one pair of leaf/border leaf devices, the devices that are added first get the first available ASNs in ascending order. During subsequent addition of devices, if a device tries to allocate the same ASN because of a brownfield scenario, EFA throws a conflicting-ASN error.
Workaround:

Add the devices to the fabric in the following sequence:

1) First add the brownfield devices, which have preconfigured configs

2) Then add the remaining devices, which do not have any configs stored

Recovery:

Remove the devices and add them to the fabric again in the following sequence:

1) First add the brownfield devices, which have preconfigured configs

2) Then add the remaining devices, which do not have any configs stored

Parent Defect ID: EFA-9576 Issue ID: EFA-9576
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: Deletion of the tenant by force followed by the recreation of the tenant and POs can result in the error "Po number <id> not available on the devices".
Condition:

Below are the steps to reproduce the issue:

1. Create tenant and PO.

2. Delete the tenant using the "force" option.

3. Recreate the tenant and the PO within a short time window.

Workaround: Avoid performing a tenant/PO create, followed by a tenant delete, followed by a tenant and PO recreate within a short time window.
Recovery: Execute an inventory device update prior to the PO creation, as shown below.
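
For example (the device IP is a placeholder):

efa inventory device update --ip <device-ip>
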
Parent Defect ID: EFA-9591 Issue ID: EFA-9591
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: This issue is seen when certain BGP sessions are not in the ESTABLISHED state after the BGP sessions are cleared as part of fabric configure.
Condition: This condition was seen when "efa fabric configure --name <fabric name>" was issued after modifying the MD5 password.
Workaround: Wait for BGP sessions to be ready. Check the status of BGP sessions using "efa fabric topology show underlay --name <fabric name>"
Recovery: Wait for BGP sessions to be ready. Check the status of BGP sessions using "efa fabric topology show underlay --name <fabric name>"
Parent Defect ID: EFA-9674 Issue ID: EFA-9674
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.4.2
Symptom: Creation and deletion of stacks can result in failure. Network create fails because the previous network with the same VLAN has not yet been deleted.
Condition: A network is deleted and created in quick succession. Because EFA takes time to delete the network, another call to create a network with the same VLAN ID is also processed. This network create call ends in failure.
Workaround: Add a delay between stack delete and create operations to allow more time for EFA processing.
Recovery: Clean up and recreate the failed network/stack in OpenStack, for example as shown below.
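
For example, with the standard OpenStack CLI (the stack name and template are placeholders):

openstack stack delete <failed-stack>
openstack stack create -t <template.yaml> <failed-stack>
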
Parent Defect ID: EFA-9758 Issue ID: EFA-9758
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: When the user modifies the remote ASN of a BGP peer out of band, drift and reconcile does not reconcile the intended remote ASN of the BGP peer configuration.
Condition: The issue is seen if the user modifies the remote ASN of a BGP peer through out-of-band means; DRC does not reconcile the remote ASN.
Workaround: Log in to the device where the remote ASN was modified and revert it to what EFA configured.
Recovery: Revert the remote ASN of the BGP peer on the device through the SLX CLI to what EFA previously configured, as sketched below.
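
A minimal SLX CLI sketch, assuming the peer sits in the default BGP instance (the peer IP and ASN are placeholders; a peer under a tenant VRF is reverted under the corresponding "address-family ipv4 unicast vrf" context instead):

configure terminal
router bgp
 neighbor <peer-ip> remote-as <efa-configured-asn>
end
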
Parent Defect ID: EFA-9799 Issue ID: EFA-9799
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: The 'efa status' response shows the standby node status as 'UP' when the node is still booting up.
Condition: If the SLX device where the EFA standby node resides is reloaded, the 'efa status' command still shows the status of the standby as UP.
Workaround: Retry the same command after some time.
Parent Defect ID: EFA-9907 Issue ID: EFA-9907
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: When concurrent EFA tenant EPG update port-add or port-delete operations are requested and the commands involve a large number of VLANs and/or ports, one of them can fail with the error "vni in use error".
Condition: The failure is reported when the Tenant service gets stale information about a network that existed previously but no longer exists. This happens only when the port-add and port-delete are done in quick succession.
Workaround: Avoid executing port-add and port-delete of the same ports in quick succession and concurrently.
Recovery: None
Parent Defect ID: EFA-9930 Issue ID: EFA-9930
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: Periodic backup happens according to the system timezone.
Condition: If the nodes in an HA deployment are not configured with the same timezone, the periodic backup is scheduled according to the timezone of the active node. When a failover happens, the schedule changes to the timezone of the new active node.
Workaround: Configure the same timezone on both nodes in a multi-node installation.
Parent Defect ID: EFA-10026 Issue ID: EFA-10026
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: The 'efa inventory device interface unset-fec' command sets the FEC mode to 'auto-negotiation' instead of removing the FEC configuration.
Condition: Once the FEC mode is set on an interface, the configuration cannot be removed. The 'efa inventory device interface unset-fec' command sets the FEC mode to 'auto-negotiation' instead of removing the FEC configuration, because the 'no fec mode' command is no longer supported on SLX.
Workaround: The default value for fec-mode is 'auto-negotiation' and shows up as-is in the output of 'show running-config'. Users can set a different value using 'efa inventory device interface set-fec', if required.
Recovery: The default value for fec-mode is 'auto-negotiation' and shows up as-is in the output of 'show running-config'. Users can set a different value using 'efa inventory device interface set-fec', if required.
Parent Defect ID: EFA-10048 Issue ID: EFA-10048
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom:

EPG: epgev10 Save for devices failed

When concurrent EFA tenant EPG create or update operations are requested and the commands involve a large number of VLANs and/or ports, one of them can fail with the error "EPG: <epg-name> Save for devices Failed".

Condition: The failure is reported when concurrent DB write operations are performed by the EFA Tenant service as part of the command execution.
Workaround: This is a transient error and there is no workaround. The failing command can be executed once again and it will succeed.
Recovery: The failing command can be rerun separately and it will succeed.
Parent Defect ID: EFA-10062 Issue ID: EFA-10062
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: Removing a device from Inventory does not clean up breakout configuration on interfaces that are part of port channels.
Condition: This condition occurs when breakout configuration is present on a device being deleted from EFA Inventory and that breakout configuration is on interfaces that are part of port channels.
Workaround: Manually remove the breakout configuration, if required.
Recovery: Manually remove the breakout configuration, if required, for example as sketched below.
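
A minimal SLX CLI sketch, assuming the usual breakout syntax under the hardware connector context (the connector number is a placeholder):

configure terminal
hardware
 connector <slot>/<port>
  no breakout mode
end
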
Parent Defect ID: EFA-10063 Issue ID: EFA-10063
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: Deleting a device from EFA Inventory does not bring the interface back to admin state 'up' after unconfiguring the breakout configuration.
Condition: This condition occurs when breakout configuration is present on a device that is being deleted from EFA Inventory.
Workaround: Manually bring the admin-state up on the interface, if required.
Recovery: Manually bring the admin-state up on the interface, if required, for example as sketched below.
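
A minimal SLX CLI sketch (the interface name is a placeholder):

configure terminal
interface Ethernet <slot>/<port>
 no shutdown
end
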
Parent Defect ID: EFA-10093 Issue ID: EFA-10093
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: Deletion of VLAN/BD-based L3 EPGs in the epg-delete-pending state results in creation and then deletion of the VLAN/BD on the admin-up device, where the VLAN/BD was already removed.
Condition:

Issue occurs with the below steps:

1. Create L3 EPG with VLAN/BD X on an MCT pair

2. Admin down one of the devices of the MCT pair

3. Delete the L3 EPG. This removes the L3 configuration (corresponding to the deleted L3 EPG) from the admin-up device; no config changes happen on the admin-down device, and the EPG transitions to the epg-delete-pending state

4. Admin up the device which was made admin down in step 2

5. Delete the L3 EPG which transitioned to the epg-delete-pending state in step 3

Recovery: Not needed
Parent Defect ID: EFA-10252 Issue ID: EFA-10252
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom: When concurrent EFA tenant EPG update port-group-add operations are requested where the tenant is bridge-domain enabled, one of them may fail with the error "EPG network-property delete failed"
Condition: The failure is reported when concurrent resource allocations are performed by the EFA Tenant service as part of the command execution.
Workaround: This is a transient error and there is no workaround. The failing command can be executed once again and it will succeed.
Recovery: The failing command can be rerun separately and it will succeed
Parent Defect ID: EFA-10268 Issue ID: EFA-10268
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom: When concurrent EPG deletes on a bd-enabled tenant are requested and the EPGs involve a large number of VLANs, local-ip, and anycast-ip addresses, one of them may fail with the error "EPG: <epg-name> Save for Vlan Records save Failed".
Condition: The failure is reported when concurrent DB write operations are performed by the EFA Tenant service as part of the command execution.
Workaround: This is a transient error and there is no workaround. The failing command can be executed once again and it will succeed.
Recovery: The failing command can be rerun separately and it will succeed.
Parent Defect ID: EFA-10288 Issue ID: EFA-10288
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom:

The BGP peer gets deleted from the SLX device but not from EFA. This issue is seen when the following sequence is performed:

1. Create a static BGP peer

2. Admin down one of the devices

3. Update the existing static BGP peer by adding a new peer

4. Update the existing static BGP peer by deleting the peers that were created in step 1. Delete them from both devices

5. Admin up the device

6. efa tenant service bgp peer configure --name "bgp-name" --tenant "tenant-name"

Once the BGP peer is configured, the config is deleted from the switch for the device in the admin-up state, whereas EFA still has this information and displays it during BGP peer show.

Condition: When a BGP peer is created and update operations are performed while one of the devices is in the admin-down state, the configuration for the admin-up device is deleted from the SLX switch but remains in EFA when "efa tenant service bgp peer configure --name <name> --tenant <tenant>" is performed.
Workaround: Delete the peer for the admin-up device first, and then delete the peer from the admin-down device as a separate CLI command.
Recovery: Perform a drift and reconcile operation for the admin-up device so that the configuration is reconciled on the switch, for example as shown below.
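
A hedged example of the drift and reconcile operation (the device IP is a placeholder):

efa inventory drift-reconcile execute --ip <admin-up-device-ip> --reconcile
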
Parent Defect ID: EFA-10387 Issue ID: EFA-10387
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom: EFA OVA services do not start if no IP address is obtained on bootup.
Condition:

When the EFA OVA is deployed and does not obtain a DHCP IP address, not all EFA services will start.

Workaround:

Configure a static IP, or obtain an IP address from DHCP, and then redeploy EFA:

1. cd /opt/godcapp/efa

2. Type: source deployment.sh

3. When the EFA installer appears, select Upgrade/Re-deploy and select OK.

4. Select single node, and select OK.

5. Select the default of No for additional management networks.

6. Select Yes when prompted to redeploy EFA.

Once EFA has redeployed, all services should start.

Parent Defect ID: EFA-10389 Issue ID: EFA-10389
Severity: S3 - Medium
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom: When the upgrade process is quit at any stage, the older EFA stack does not get identified from the same node on which the process was initiated.
Condition:

If the user selects "No" when EFA asks for final confirmation before the upgrade process starts, the process is terminated, but the older stack can no longer be identified on SLX. Checking "show efa status" reports "EFA application is not installed. Exiting..."

However, there is no functional impact; the EFA setup continues to work properly on the TPVMs with the existing version.

Workaround: The upgrade process can be initiated again from the peer node.
Parent Defect ID: EFA-10398 Issue ID: EFA-10398
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom: EFA Tenant REST Request fails with an error "service is not available or internal server error has occurred, please try again later"
Condition: Execution of EFA Tenant REST requests that take a long time (more than 15 minutes) to complete.
Workaround:

Execute "show" commands to verify if the failed REST request was indeed completed successfully.

Re-execute the failed REST request as applicable.

Recovery: No recovery
Parent Defect ID: EFA-10445 Issue ID: EFA-10445
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.0
Symptom: When continuous EPG updates with repeated local-ip-add and local-ip-delete operations are done on the same EPG without much gap in between, the Tenant service may occasionally reject a subsequent local-ip-add command incorrectly.
Condition: When continuous EPG updates with repeated local-ip-add and local-ip-delete operations are done on the same EPG without much gap in between, the Tenant service may occasionally retain stale information about the previously created IP configuration and reject a subsequent local-ip-add command incorrectly.
Workaround: There is no workaround to avoid this. Once the issue is hit, the user may use a new local IP address from another subnet.
Recovery:

Follow the steps below to remove the stale IP address from the Tenant service's knowledge base:

1. Find the management IP for the impacted devices. This is displayed in the EFA error message.

2. Find the interface VE number. This is the same as the CTAG number that the user was trying to associate the local-ip with.

3. Telnet/SSH to the device management IP and log in with admin privileges.

4. Set the local IP address in the device

configure t

interface ve <number>

ip address <local-ip>

5. Do an EFA device update by executing 'efa inventory device update --ip <IP>' and wait a minute for the information to be synchronized with the Tenant service database

6. Reset the local IP address in the device

configure t

interface ve <number>

no ip address

7. Do an EFA device update again and wait a minute for the information to be synchronized with the Tenant service database

These steps remove the stale entries and allow future local-ip-add operations to succeed.

Parent Defect ID: EFA-10455 Issue ID: EFA-10455
Severity: S3 - Moderate
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom: "efa status" takes several minutes longer than expected to report a healthy EFA status.
Condition: This problem happens when Kubernetes is slow to update the standby node's Ready status. This may be a bug in the shipped version of Kubernetes.
Recovery: EFA will recover after a period of several minutes.
Parent Defect ID: EFA-10548 Issue ID: EFA-10548
Severity: S2 - High
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.2
Symptom: When EPG delete operations are done concurrently for EPGs on a bridge-domain based tenant where the EPGs were created with a large number of bridge domains, one of the commands may fail with the error "EPG: <epg name> Update for pw-profile Record save Failed".
Condition: The failure is reported when concurrent DB write operations are performed by the EFA Tenant service as part of the command execution, causing the underlying database to report an error for one of the operations.
Workaround: This is a rare, transient error and there is no workaround. The failing command can be executed once again and it will succeed.
Recovery: The failing command can be rerun separately and it will succeed.
Parent Defect ID: EFA-10606 Issue ID: EFA-10606
Severity: S3 - Moderate
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.2
Symptom: "efa status" takes several minutes longer than expected to report a healthy EFA status.
Condition: This problem happens when Kubernetes is slow to update the standby node's Ready status. This may be a bug in the shipped version of Kubernetes.
Recovery: EFA will recover after a period of several minutes.
Parent Defect ID: EFA-10684 Issue ID: EFA-10684
Severity: S2 - Major
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.1
Symptom: EFA cannot start - Init:ErrImageNeverPull
Parent Defect ID: EFA-10754 Issue ID: EFA-10754
Severity: S3 - Moderate
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.2
Symptom: EFA - Backup create fails (timeout)
Recovery: Running "efa inventory debug devices-unlock --ip 21.150.150.201" (with the affected device IP) will resolve the issue, and the backup can be done after an EFA login.
Parent Defect ID: EFA-10759 Issue ID: EFA-10759
Severity: S3 - Moderate
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.2
Symptom: Upgrade failed due to DRC timeout.
Workaround: Perform the upgrade with 5 devices to avoid a sequential DRC timeout.

Parent Defect ID: EFA-10982 Issue ID: EFA-10982
Severity: S3 - Moderate
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.3
Symptom: 'efa inventory drift-reconcile history' failed after reloading L01/L02.
Parent Defect ID: EFA-11063 Issue ID: EFA-11063
Severity: S2 - Major
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.4
Symptom: The standby status of the EFA node shows as down although the node is actually ready to allow failover.
Condition: The issue happens because one of the pods (rabbitmq) is in CrashLoopBackOff instead of init mode. This is not a functional issue, since it is only a status issue.
Workaround: Reboot the standby node, which does not cause any downtime. Alternatively, restart k3s using the 'systemctl restart k3s' command.
Recovery: Rebooting the node will fix the pods, or restarting k3s will fix the issue.
Parent Defect ID: EFA-11248 Issue ID: EFA-11248
Severity: S2 - Major
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.5
Symptom:

Observation 1: Long delay; a few nodes moved to cfg-refresh/cfg-refresh-error.

After 30 minutes, an automatic device update moves the border leaf states to "cfg-in-sync".

Again after 30 minutes, an automatic device update moves the leaf states to "cfg-in-sync".

Again after 30 minutes, an automatic device update moves the spine states to "cfg-in-sync".

Observation 2: No change in the spine config, but it is shown as cfg-refresh.

The spine node validates its LLDP peer leaf/border leaf nodes; if the MCT link fails, the spine node does not get a chance to move to the 4th stage (as part of the firmware download/LLDP case).

Observation 3: B2: a border leaf node in the non-selection group went to cfg-refresh.

If the LLDP update is missed on peer node Border Leaf 1 and the fabric got LLDP on B2, fabric operation fails. The B2 node never gets an update event from inventory, so there is no chance to compute the fabric app/state update.

Workaround:

Step 1: efa fabric error show --name stage3

Step 2: Execute drift-only on the error node (border MCT leaf)

Step 3: Execute drift-only on the leaf node

Step 4: Execute drift-only on the spine node

[or]

If the state does not move from cfg-refresh, force a DRC on the node (see the sketch below).
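
A hedged sketch of the drift-only and full DRC invocations (device IPs are placeholders; the flag usage is assumed from the EFA inventory CLI, where omitting --reconcile reports drift only):

Drift only:

efa inventory drift-reconcile execute --ip <device-ip>

Drift and reconcile (forced DRC):

efa inventory drift-reconcile execute --ip <device-ip> --reconcile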

Parent Defect ID: EFA-11335 Issue ID: EFA-11335
Severity: S2 - Major
Product: Extreme Fabric Automation Reported in Release: EFA 2.5.5
Symptom: On a scaled setup, the CLI "efa tenant service bgp peer operational show" fails with the error "service is not available".
Condition:

Below are the steps to reproduce the issue:

1. A tenant is configured with member ports spanning 8 devices of the fabric.

2. All 8 devices are configured with 100 VRFs, and each VRF has 2 static peers and 1 dynamic peer.

3. Execute "efa tenant service bgp peer operational show"

Workaround: Execute "efa tenant service bgp peer operational show --tenant <tenant-name> --vrf <tenant-vrf-name>" instead of "efa tenant service bgp peer operational show --tenant <tenant-name>" or "efa tenant service bgp peer operational show"