Cisco VSS: a bug that has not been fixed
Today I will continue the story about the non-obvious nuances of the Cisco Catalyst 6509 core switch in VSS mode. Since many people use this platform in their infrastructure, I believe this story may be useful to someone.
The series of fascinating VSS stories began a year ago and is described in this post.
So, exactly one year later, during this year's January quarterly preventive maintenance, the item “vacuum the core” was, as usual, included in the work plan. As a reminder, the core of our network is a VSS pair of Cisco Catalyst 6509 switches. Here is a brief summary:
Each switch has one SUP Engine 720 10GE on board.
We decided to start the dust removal with the standby chassis. We powered it off, vacuumed it, and powered it back on. And what a picture: the standby chassis went into a reboot loop because of a configuration synchronization error:
This time we decided not to show heroism or initiative and simply powered off the standby chassis. We were left flying on one wing. Network performance was not affected while the standby chassis was stuck in its reboot loop. In the morning, all the necessary information was sent to the integrator's technical support, which in turn opened a case with Cisco TAC, and we began to wait. A response from CTAC came quickly. We were asked to reproduce the reboot loop and capture the output of the following debugs with the standby chassis powered on:
“debug redundancy config-sync bulk”
“debug redundancy progression”
That night, the debug output was captured and sent to CTAC. I won't publish it here: there is a lot of text, and little of it is understandable.
CTAC reported that this behavior is described in DDTS:
CSCtx12231
Config Sync: Bulk-sync failure due to PRC mismatch in ACL
tools.cisco.com/bugsearch/bug/CSCtx12231/?reffering_site=dumpcr
Since you need a cisco.com account to view it, I'll post a screenshot here:
However, our release, 12.2(33)SXJ6, is listed there under “Known Fixed Releases”. What is going on is unclear. We were asked to remove the duplicate lines (ACEs) from the ACLs listed in the output of “show redundancy config-sync failures prc” and then try booting the standby chassis. We immediately had questions; CTAC's answers to them are in the screenshot below:
1. Is it possible to confirm right away, from the “show redundancy config-sync failures prc” output, that the duplicate ACEs were removed correctly, or will we have to boot the standby to check?
2. Would this bug prevent a switchover to the standby if the active chassis were rebooted?
3. We have had situations where IOS refused to add a duplicate ACE. We would like to clearly understand the scenarios in which this check is performed and in which it is not (presumably this is related to object-groups). We need to know where to be especially careful and what to double-check.
As a result, we removed the duplicate ACEs from the configuration of the active chassis while the standby was powered off. The output of “show redundancy config-sync failures prc” did not change afterwards, which indicated that the check is performed only when the standby chassis tries to boot. We scheduled the next maintenance window and started the standby chassis during it. Bottom line: everything came up, and the messages about duplicate ACEs disappeared from the output of “show redundancy config-sync failures prc”.
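For anyone who would rather look for such duplicates in a saved configuration before a maintenance window, here is a minimal sketch in Python. It is not how IOS performs its PRC check and it is not something CTAC gave us; it simply assumes the usual running-config layout (an “ip access-list standard|extended NAME” header followed by indented ACE lines) and reports ACE lines that are textually identical within one named ACL. The function name, the sample ACL names, and the addresses are made up for illustration. It does not expand object-groups, so duplicates hidden behind them will not be found.

```python
import re
from collections import defaultdict

# Named ACL header as it appears in a saved IOS configuration (assumed layout).
ACL_HEADER = re.compile(r"^ip access-list (?:standard|extended) (\S+)\s*$")
# Optional sequence number in front of an ACE, e.g. "10 permit ...".
SEQ_PREFIX = re.compile(r"^\d+\s+")


def find_duplicate_aces(config_text):
    """Return {acl_name: {ace_text: [line numbers]}} for ACEs that occur
    more than once inside the same named ACL (pure text comparison)."""
    seen = defaultdict(lambda: defaultdict(list))
    current_acl = None
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        header = ACL_HEADER.match(raw)
        if header:
            current_acl = header.group(1)
            continue
        if current_acl is None:
            continue
        if not raw.startswith(" "):
            # Any non-indented line ("!", another command) ends the ACL body.
            current_acl = None
            continue
        # Collapse whitespace and drop the sequence number before comparing.
        ace = SEQ_PREFIX.sub("", " ".join(raw.split()))
        if not ace or ace.startswith("remark"):
            continue
        seen[current_acl][ace].append(lineno)
    return {
        acl: {ace: lines for ace, lines in aces.items() if len(lines) > 1}
        for acl, aces in seen.items()
        if any(len(lines) > 1 for lines in aces.values())
    }


# Hypothetical sample configuration, for illustration only.
SAMPLE = """\
ip access-list extended EXAMPLE-IN
 permit tcp any host 10.0.0.1 eq 22
 permit ip 10.1.0.0 0.0.255.255 any
 permit tcp any  host 10.0.0.1 eq 22
 deny ip any any
!
ip access-list extended EXAMPLE-OUT
 permit ip any any
"""

if __name__ == "__main__":
    for acl, aces in find_duplicate_aces(SAMPLE).items():
        for ace, lines in aces.items():
            print(f"{acl}: '{ace}' repeated on config lines {lines}")
```

A text comparison like this only catches literal repeats. Two ACEs that mean the same thing but are written differently would slip through, so the switch's own output remains the final word.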
Now everything works, and we take special care when editing ACLs to avoid a repeat of the situation. We are still waiting for answers from Cisco TAC on how our IOS release came to be listed as fixed for this bug and why IOS allowed the duplicate ACEs to be entered in the first place.
When new information from CTAC appears, I will update the post or reply in the comments.
Good luck to everyone on the battlefield!