ACI | APIC unreachable after PCIe NIC card replacement

Following a hardware issue on a Cisco APIC, we had to replace the PCIe NIC card of the server (which is based on Cisco UCS). And as you may have found out if you are reading this, it wasn’t straightforward :)

The initial problem was that the Eth2-1 and Eth2-2 ports went down a few hours after each reboot, which is a problem in an active APIC cluster. Before replacing the card, we decided to replace this APIC with the standby one in order to maintain a stable cluster of 3 APICs.

How to replace an active APIC with a standby one will be covered in a future article :)

Replacing the card was a ~20-minute operation (shut down the server, open the top panel, extract the PCIe card, insert the new one, close the top panel, rack the server back in, power it on, cable it back). The details can be found here, and for any other component replacement, it’s here.

Symptoms

After the replacement, the symptoms were the following:

  • The card was now seen correctly from the CIMC (Networking > Adapter Card 1).
  • The ports were “Up” on the APIC side.
Useful commands for troubleshooting:
admin@APIC# bash
admin@APIC:~> ip a
admin@APIC:~> ethtool eth2-1
admin@APIC:~> ethtool eth2-2
  • Wrong LLDP information (no real hostname or port detected, only MAC addresses were displayed) and ports “Out of service” on the leaf side:
LEAF101# show lldp neighbors
Device ID            Local Intf      Hold-time  Capability  Port ID
c4f7.d592.b85e        Eth1/47         120               c4f7.d592.b862
LEAF101# show int eth1/47 status
Port           Name                Status     Vlan       Duplex   Speed    Type
Eth1/47        --                  out-of-ser trunk      full     10G      10Gbase-SR
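The telltale sign in the output above is a Device ID that is a bare MAC address instead of the APIC hostname. As a minimal sketch, assuming you have captured the text of `show lldp neighbors` from a leaf (the function name `suspicious_lldp_neighbors` is hypothetical, not a Cisco tool), this check could be scripted like so:

```python
import re
from typing import List, Tuple

# Cisco-style dotted MAC address, e.g. c4f7.d592.b85e
MAC_RE = re.compile(r"^[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}$", re.IGNORECASE)


def suspicious_lldp_neighbors(show_output: str) -> List[Tuple[str, str]]:
    """Return (local_interface, device_id) pairs whose Device ID is a bare MAC.

    Assumes the column layout shown in this article: Device ID first,
    local interface second.
    """
    flagged = []
    for line in show_output.splitlines():
        fields = line.split()
        # Skip blank lines, the CLI prompt line and the header line.
        if len(fields) < 2 or "#" in fields[0] or line.startswith("Device ID"):
            continue
        device_id, local_intf = fields[0], fields[1]
        if MAC_RE.match(device_id):
            flagged.append((local_intf, device_id))
    return flagged
```

Fed the faulty output above, it would flag Eth1/47; an entry that shows a real hostname as Device ID is not flagged.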

Resolution

The cause was the following: on the replacement card, LLDP was enabled on the UCS (VIC) side, so the card was sending its own LLDP packets toward the leaf switch in place of the APIC’s.

Based on the LLDP packets they received, the leaf switches could not determine that those ports were connected to the APIC, and they put the interfaces in the Out of Service state.

We had to disable LLDP from the CIMC interface of the APIC (Networking > Adapter Card 1 > General > Uncheck “Enable LLDP”).

Click Save, then reboot the APIC: a reboot is mandatory for the change to be applied (from the CIMC > Host Power > Power Cycle).

After that, LLDP was functioning properly on the APIC and connectivity with the cluster was re-established:

LEAF101# show lldp neighbors
Device ID            Local Intf      Hold-time  Capability  Port ID
APIC                 Eth1/47         120                    eth2-1

The standby APIC was able to reach all the active APICs (it wasn’t before):

admin@APIC:> ping 10.30.0.1
PING 10.30.0.1 (10.30.0.1) 56(84) bytes of data.
64 bytes from 10.30.0.1: icmp_seq=1 ttl=58 time=0.282 ms
64 bytes from 10.30.0.1: icmp_seq=2 ttl=58 time=0.294 ms
64 bytes from 10.30.0.1: icmp_seq=3 ttl=58 time=0.237 ms
admin@APIC:> ping 10.30.0.2
PING 10.30.0.2 (10.30.0.2) 56(84) bytes of data.
64 bytes from 10.30.0.2: icmp_seq=1 ttl=58 time=0.282 ms
64 bytes from 10.30.0.2: icmp_seq=2 ttl=58 time=0.294 ms
64 bytes from 10.30.0.2: icmp_seq=3 ttl=58 time=0.237 ms
admin@APIC:> ping 10.30.0.3
PING 10.30.0.3 (10.30.0.3) 56(84) bytes of data.
64 bytes from 10.30.0.3: icmp_seq=1 ttl=58 time=0.282 ms
64 bytes from 10.30.0.3: icmp_seq=2 ttl=58 time=0.294 ms
64 bytes from 10.30.0.3: icmp_seq=3 ttl=58 time=0.237 ms
admin@APIC:> acidiag avread

(to get the cluster info as seen from the current APIC, the standby one)

admin@APIC1:> acidiag avread

(to get the cluster info as seen from APIC1, one of the active ones, to compare it with the view from the other APIC and check whether they can exchange information)

The standby APIC was now visible from the GUI (it wasn’t before).
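If you want to run the reachability check repeatedly, the ping transcript can be summarized by a small script. This is only a sketch under assumptions: the helper names `count_ping_replies` and `unreachable` are hypothetical, and the parsing expects numeric reply lines formatted like the ones in this article (`64 bytes from 10.30.0.1: icmp_seq=1 ...`).

```python
import re
from collections import Counter
from typing import Iterable, List

# Matches reply lines such as:
#   64 bytes from 10.30.0.1: icmp_seq=1 ttl=58 time=0.282 ms
REPLY_RE = re.compile(r"^\d+ bytes from (\S+?): icmp_seq=\d+")


def count_ping_replies(transcript: str) -> Counter:
    """Count echo replies per source address in a pasted ping transcript."""
    replies = Counter()
    for line in transcript.splitlines():
        match = REPLY_RE.match(line.strip())
        if match:
            replies[match.group(1)] += 1
    return replies


def unreachable(transcript: str, expected_peers: Iterable[str]) -> List[str]:
    """Return the expected peers that never answered in the transcript."""
    replies = count_ping_replies(transcript)
    return [peer for peer in expected_peers if replies[peer] == 0]
```

Calling `unreachable(transcript, ["10.30.0.1", "10.30.0.2", "10.30.0.3"])` on the session shown earlier would return an empty list, since each address replied.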

Summary

  • The newly arrived VIC had the LLDP setting enabled.
  • As a result, the VIC card intercepted LLDP packets from the APIC to the switch and sent its own LLDP packets toward the leaf switches.
  • Based on the LLDP packets they received, the leaf switches were not able to determine that those ports were connected to the APIC, and they put the interfaces in the Out of Service state.
  • Once we disabled LLDP on the VIC card in the CIMC and reloaded the APIC to apply the change, the issue was resolved.

RTFM, they said… It is stated on p. 19 of the Cisco APIC M3/L3 Server Installation and Service Guide that LLDP should be disabled: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/server/M3-L3-server/APIC-M3-L3-Server.pdf

Benoit

Network engineer at CNS Communications. CCIE #47705, focused on R&S, Data Center, SD-WAN & Automation.


5 Comments

  1. sam 5 November 2020

    The document says the replacement part is APIC-PCIE-C25Q-04, but from the image you have attached, it is UCSC-PCIE-C25Q-04.

  2. sam 5 November 2020

    Is it exactly the same?

  3. Benoit 5 November 2020

    Hi Sam,

    I think they are the same; the “APIC” is installed on a UCS M5.
    In my case the card was ordered directly by the TAC after investigation.

    Here is an extract from the replacement mail:
    Product delivered: UCSC-PCIE-C25Q-04 Cisco UCS VIC 1455 Quad Port 10/25G SFP28 CNA PCIE
    Product to be returned: APIC-PCIE-C25Q-04 Cisco UCS VIC 1455 Quad Port 10/25G SFP28 CNA PCIE

  4. Sam 5 November 2020

    Thanks for that. Do you think it’s the same with the APIC-M1 models, that is APIC-PCIE-C10T-02 vs UCSC-PCIE-C10T-02?

  5. Benoit 6 November 2020

    I think so, but please confirm with TAC if you can!
