"Memory Component Issue", or a large-scale defect in network equipment
A problem that many had long suspected has now been confirmed.
Cisco announced that an unnamed memory manufacturer supplied it with defective components for five years (from 2005 to 2010). The nature of the defect: equipment with this memory can accumulate years of uptime without any complaints, but as soon as it is rebooted (by a power cycle or even a simple reload), the memory stops working correctly, and the device either fails to boot or boots and then crashes periodically. The cause is degradation of the memory chips. According to the vendor, most problems begin after two years of operation.
Before the rotten tomatoes fly at Cisco, a warning: the memory is a standard part that many vendors bought, so a great deal of equipment may be affected. There is confirmation of similar problems at Juniper. But only Cisco has admitted it, despite the inevitable damage to its reputation. Its financial losses from this disaster amount to about $655 million.
Sit down, get your validol ready, and look at the list of affected equipment.
Specific part numbers and a detailed description of the symptoms can be found in the Field Notices or directly via the links.
- ACE10, ACE20, and ACE30 Modules
- FWSM
- ADM & AGM
- SAMI
- Miscellaneous HWIC, EHWIC, EVM, NME, SM modules
- Various SPAs
- Some phones
- Some video conferencing codecs
- ONS 15310 and 15454
- ASR 1000
- 7200/7300 Routers
- CRS
- Many modules for the 7600 and 6500
- 800, 1800, 2800, and 3800 Series Routers
- ESR10k
- IPS-4240 and 4255 sensors
- ASA 5505 and 5510-5550 / ASA-SSM (5500-X models not mentioned)
- MDS 9000
- Catalyst Express 500
- Nexus 7000
- Catalyst 4500/4900
- Catalyst 3k
- Catalyst 2k
- ME3400 and ME2400
- MGX
- IE3000
- AS5400XM and AS5350
- UC520 / 540
I repeat: at risk is equipment that was manufactured 5 to 10 years ago and has worked perfectly until now, and the failure occurs precisely on a reboot of any kind, not during regular operation.
Replacement follows the standard RMA process: either the whole device or the memory module, once it breaks. Apparently, far from 100% of the equipment listed above contains defective memory, and even if your device does, it may die not on today's reboot but ten years from now.
It is impossible to check by serial number which units are at risk. No way. I've tried.
Colleagues, I think that by now everyone understands that the approach I have seen, “I once bought a Cisco router for a lot of money, it has worked for years and will last for many more, so no spare is needed”, is criminal. And even a hot standby may no longer help. Imagine that the lights flicker in the data center and that's it: your network equipment is broken and needs replacing merely because of a brief power loss and reboot. Even a simple scheduled overnight reload of a non-redundant device can end in a frantic search for a replacement and a long downtime. Assess the risks, take out service contracts with fast delivery, find or buy replacement memory in advance, or replace the hardware itself with something newer. Assume that after the next reboot any device from the list above (and not only those) may fail to come back up, and plan your escape route.
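As a starting point for that risk assessment, here is a minimal Python sketch that sorts an inventory by exposure to a reboot. Everything about the data model is an assumption made for illustration: the `Device` fields, the hand-entered `family` label, the use of the year of manufacture as a proxy for the 2005 to 2010 supply window, and the three risk labels are all hypothetical, and the family set is only a sample copied from the list above. Since units cannot be checked by serial number, this can flag candidates only at the family level; it cannot tell you whether a particular box actually contains bad memory.

```python
from dataclasses import dataclass

# A sample of platform families copied from the list above; extend by hand.
AFFECTED_FAMILIES = {"2800", "3800", "ASA 5505", "Nexus 7000", "ASR 1000"}

# Years in which the defective memory was supplied, per the article.
DEFECT_WINDOW = (2005, 2010)


@dataclass
class Device:
    """One inventory record (all fields are hypothetical, filled in by hand)."""
    name: str
    family: str          # platform family label, e.g. "2800"
    built_year: int      # year of manufacture
    has_standby: bool    # a redundant peer exists
    spare_on_site: bool  # replacement memory or hardware is on the shelf


def reboot_risk(dev: Device) -> str:
    """Return a coarse, arbitrary risk label for the next reboot."""
    in_family = dev.family in AFFECTED_FAMILIES
    in_window = DEFECT_WINDOW[0] <= dev.built_year <= DEFECT_WINDOW[1]
    if not (in_family and in_window):
        return "low"
    # A standby or a spare shortens the outage but does not remove the risk.
    if dev.has_standby or dev.spare_on_site:
        return "medium"
    return "high"


if __name__ == "__main__":
    inventory = [
        Device("edge-r1", "2800", 2008, False, False),
        Device("core-sw1", "Nexus 7000", 2009, True, False),
        Device("branch-r7", "2800", 2012, False, False),
    ]
    for dev in inventory:
        print(f"{dev.name}: {reboot_risk(dev)}")
```

Keep in mind that "medium" is optimistic here: a standby fed from the same power source can fail in the same outage, so treat the labels as a way to order the work, not as a guarantee.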
Finally, let us honor with a minute of silence one of the many memory modules that died before their time after serving faithfully in 2811 routers.
![](https://habrastorage.org/getpro/habr/post_images/2dc/ee8/547/2dcee85474d8ca35e48df82a7726540c.jpg)