ACI | APIC unreachable after PCIe NIC card replacement
Following a hardware issue on a Cisco APIC, we had to replace the PCIe NIC card of the server (based on Cisco UCS). As you may also have discovered if you are reading this, it wasn’t straightforward :)
The initial problem was that the Eth2-1 and Eth2-2 ports went down a few hours after each reboot, which is a problem in an active APIC cluster. Before replacing the card, we decided to replace this APIC with the standby one in order to maintain a stable cluster of 3 APICs.
How to replace an active APIC with a standby one is an article to be created soon :)
Replacing the card was a ~20-minute operation (shut down the server, open the top panel, extract the PCIe card, insert the new one, close the top panel, rack the server back, power it on, cable it back). The details can be found here, and for any other component replacement, it’s here.
After the replacement, the symptoms were the following:
- The card was now seen correctly from the CIMC (Networking > Adapter Card 1)
- Ports “Up” on the APIC side:
admin@APIC# bash
admin@APIC:~> ip a
admin@APIC:~> ethtool eth2-1
admin@APIC:~> ethtool eth2-2
- Wrong LLDP information (no real hostname or port detected, only MAC addresses were displayed) and ports “Out of service” on the leaf side:
LEAF101# show lldp neighbors
Device ID        Local Intf   Hold-time   Capability   Port ID
c4f7.d592.b85e   Eth1/47      120                      c4f7.d592.b862
LEAF101# show int eth1/47 status
Port      Name   Status       Vlan    Duplex   Speed   Type
Eth1/47   --     out-of-ser   trunk   full     10G     10Gbase-SR
The cause was the following: after the card replacement, LLDP was enabled on the new adapter (UCS side), so the card intercepted the LLDP packets sent by the APIC and sent its own towards the leaf switch.
Based on the LLDP packets they received, the leaf switches could not determine that those ports were connected to the APIC, and put the interfaces in the Out of Service state.
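The symptom described above (a bare MAC address where the APIC hostname should appear) can be spotted programmatically. Below is a minimal Python sketch, assuming the leaf output has been saved as text in the column layout shown in this post. The function name `suspicious_lldp_neighbors` is hypothetical, and the parsing is a simple whitespace split that would not handle device IDs containing spaces.

```python
import re

# Matches Cisco-style dotted MAC addresses such as c4f7.d592.b85e.
MAC_RE = re.compile(r"^[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}$", re.IGNORECASE)


def suspicious_lldp_neighbors(show_output):
    """Return (local_intf, device_id) pairs whose Device ID is a bare MAC.

    Hypothetical helper: assumes whitespace-separated columns as in the
    'show lldp neighbors' output quoted in this post.
    """
    suspects = []
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows carry at least Device ID, Local Intf, Hold-time, Port ID,
        # and the second column is an Ethernet interface name.
        if len(fields) < 4 or not fields[1].startswith("Eth"):
            continue
        device_id, local_intf = fields[0], fields[1]
        if MAC_RE.match(device_id):
            suspects.append((local_intf, device_id))
    return suspects


before = """\
Device ID        Local Intf   Hold-time   Capability   Port ID
c4f7.d592.b85e   Eth1/47      120                      c4f7.d592.b862
"""
after = """\
Device ID        Local Intf   Hold-time   Capability   Port ID
APIC             Eth1/47      120                      eth2-1
"""
print(suspicious_lldp_neighbors(before))  # [('Eth1/47', 'c4f7.d592.b85e')]
print(suspicious_lldp_neighbors(after))   # []
```

A non-empty result on a port that should face an APIC suggests that something other than the APIC itself is answering LLDP on that link, which is what happened here with the adapter card.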
We had to disable LLDP from the CIMC interface of the APIC (Networking > Adapter Card 1 > General > Uncheck “Enable LLDP”).
Click Save; a reboot of the APIC is then mandatory for the change to be applied (from the CIMC: Host Power > Power Cycle).
After that, LLDP was functioning properly on the APIC and connectivity with the cluster was re-established:
LEAF101# show lldp neighbors
Device ID        Local Intf   Hold-time   Capability   Port ID
APIC             Eth1/47      120                      eth2-1
The standby APIC was able to reach all the active APICs (it wasn’t before):
admin@APIC:> ping 10.30.0.1
PING 10.30.0.1 (10.30.0.1) 56(84) bytes of data.
64 bytes from 10.30.0.1: icmp_seq=1 ttl=58 time=0.282 ms
64 bytes from 10.30.0.1: icmp_seq=2 ttl=58 time=0.294 ms
64 bytes from 10.30.0.1: icmp_seq=3 ttl=58 time=0.237 ms
admin@APIC:> ping 10.30.0.2
PING 10.30.0.2 (10.30.0.2) 56(84) bytes of data.
64 bytes from 10.30.0.2: icmp_seq=1 ttl=58 time=0.282 ms
64 bytes from 10.30.0.2: icmp_seq=2 ttl=58 time=0.294 ms
64 bytes from 10.30.0.2: icmp_seq=3 ttl=58 time=0.237 ms
admin@APIC:> ping 10.30.0.3
PING 10.30.0.3 (10.30.0.3) 56(84) bytes of data.
64 bytes from 10.30.0.3: icmp_seq=1 ttl=58 time=0.282 ms
64 bytes from 10.30.0.3: icmp_seq=2 ttl=58 time=0.294 ms
64 bytes from 10.30.0.3: icmp_seq=3 ttl=58 time=0.237 ms
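If you collect these ping results in a text file, the reachability check can be automated. Here is a rough Python sketch, assuming the Linux ping reply format shown above. The function name `silent_hosts` and the sample capture are hypothetical and for illustration only.

```python
import re

# One reply line looks like:
# "64 bytes from 10.30.0.1: icmp_seq=1 ttl=58 time=0.282 ms"
REPLY_RE = re.compile(
    r"^\d+ bytes from (\d{1,3}(?:\.\d{1,3}){3}): icmp_seq=\d+",
    re.MULTILINE,
)


def silent_hosts(expected_ips, ping_output):
    """Return the expected IPs that produced no echo reply in ping_output."""
    replied = set(REPLY_RE.findall(ping_output))
    return [ip for ip in expected_ips if ip not in replied]


# Hypothetical capture: replies from two controllers, none from the third.
capture = """\
PING 10.30.0.1 (10.30.0.1) 56(84) bytes of data.
64 bytes from 10.30.0.1: icmp_seq=1 ttl=58 time=0.282 ms
PING 10.30.0.2 (10.30.0.2) 56(84) bytes of data.
64 bytes from 10.30.0.2: icmp_seq=1 ttl=58 time=0.282 ms
PING 10.30.0.3 (10.30.0.3) 56(84) bytes of data.
"""
print(silent_hosts(["10.30.0.1", "10.30.0.2", "10.30.0.3"], capture))
# ['10.30.0.3']
```

An empty list means every expected controller answered at least once. It says nothing about cluster health, which is what the acidiag check is for.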
admin@APIC:> acidiag avread
(to get the cluster info seen from current APIC, the standby one)
admin@APIC1:> acidiag avread
(to get the cluster info seen from APIC1, one of the active ones, to compare it with the view from the other APIC and check that they can exchange information)
The standby APIC was now visible from the GUI (it wasn’t before).
To summarize:
- The newly delivered VIC had the LLDP setting enabled.
- As a result, the VIC card intercepted the LLDP packets sent from the APIC to the switch and sent its own LLDP packets towards the leaf switches.
- Based on the LLDP packets they received, the leaf switches were not able to determine that those ports were connected to the APIC, and put the interfaces in the Out of Service state.
- Once we disabled LLDP on the VIC card in CIMC and reloaded the APIC to apply the change, the issue was resolved.
RTFM they said… It is stated on page 19 of the Cisco APIC M3/L3 Server Installation and Service Guide that LLDP should be disabled: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/server/M3-L3-server/APIC-M3-L3-Server.pdf
The document says the replacement part is APIC-PCIE-C25Q-04, but in the image you attached it is UCSC-PCIE-C25Q-04.
Is it exactly the same?
I think they are the same; the “APIC” is installed on a UCS M5.
In my case the card was ordered directly by the TAC after investigation.
Here is an extract from the replacement mail:
Product delivered: UCSC-PCIE-C25Q-04 Cisco UCS VIC 1455 Quad Port 10/25G SFP28 CNA PCIE
Product to be returned: APIC-PCIE-C25Q-04 Cisco UCS VIC 1455 Quad Port 10/25G SFP28 CNA PCIE
Thanks for that. Do you think it’s the same with the APIC-M1 models, that is, APIC-PCIE-C10T-02 vs UCSC-PCIE-C10T-02?
I think so, but please confirm with TAC if you can!