Firmware Recovery for LSI RAID Controllers

Published on January 16, 2014


Good day, Habr readers!

I want to tell you how I restored the firmware of an LSI MegaRAID controller after a failed upgrade.
When this misfortune happened to me, I found practically no information on the subject, although I admit I may not have searched very well.

Anamnesis


I have been using Supermicro servers in my work for a long time: they offer a wide choice of platforms, fairly affordable prices and decent reliability.

Often, especially with 1U servers, I order them with the integrated LSI MegaRAID controller.

The problem is that Supermicro itself is not very eager to publish firmware for its built-in controllers, so I usually flash them with the latest firmware for the equivalent stand-alone LSI controller. Until now there had been no problems.

Recently we received several servers with on-board LSI 2208 controllers running fairly old firmware.
Since I also use plenty of discrete controllers based on this chip, I did not think twice: I booted Linux from a USB flash drive and ran the usual:
./MegaCli64 -AdpFwFlash -f mr2208.rom -a0
and went off to do other things.

The next time I looked at the server console, the picture had not changed: "Flashing firmware ..." and no result. Trouble, thought Stirlitz.

I could not log in to the server via SSH. The VGA console showed messages that the root filesystem had switched to read-only mode; in general everything was very bad and about to get even worse.

I pressed Reset and saw this:

image

Trouble indeed. Searching the Internet gave nothing; apparently the problem is quite rare.

Treatment


I tried booting from the flash drive and flashing the controller again, but MegaCli did not detect it at all, under either DOS or Linux, so flashing failed as well.

So I turned to LSI support, where a kind person with an Indian name pointed me to the MegaRAID documentation, specifically page 305, which contains a rather inconspicuous subsection that does not really explain what it is for:

image

"Aha," thought the partisans, "this must be flashing in recovery mode," and got down to business.

On Windows, the easiest way to make a FreeDOS flash drive is the Rufus utility: literally one click.
On Linux you can do the same with standard tools (syslinux or GRUB); there are plenty of articles on the topic.

We copy MegaCli.exe and the firmware found in the depths of ftp.supermicro.com onto it, boot, and run:


MegaCli.exe -AdpM0Flash -f smc2208.rom

Note that you do not need to specify an adapter (the -a option); apparently it flashes every controller it finds, or the first one on the PCI bus.

Things went well:

image

Flashing in this mode takes quite a long time, about 15 minutes, so be patient.

When it finishes, power the server off completely, turn it back on and wait for a miracle.
But instead of a miracle, we get this bleak picture:

image

Googling the error leads to a single result, a blog post by a fellow countryman, who plainly advises disconnecting the BBU from the controller, removing the controller from the server and then putting it back.

In my case, the card could only be removed from the server with a jigsaw, and I have no BBU, so that is not an option.
I try flashing the standard way: MegaCli detects the controller but says the same thing, "F/W is in fault state", and refuses to do anything.

We turn to support again; they shrug and suggest trying the LSI Pre-Boot USB and CD tool, and if that does not help, sending the hardware back.

OK, we download the ISO, attach it to the server over IPMI and boot.
We select recovmr from the boot menu; we are then told to type recover at the command line and happiness will follow. It did not.
The BAT file cannot find drive D:; apparently the CD-ROM driver in the FreeDOS on this LSI image does not get along with the IPMI virtual drive.

Well, let's look inside the BAT file and see what it was going to do:
MegaCli.exe -AdpFwFlash -f D:\FW\RECOVER\TB_16MB.ROM -aALL

We open the ISO, find this mysterious file and see that it is 16 megabytes in size (as the name already hinted), twice the size of the standard firmware. Apparently, this ROM image rewrites the controller's entire flash chip.

We try to flash it the same way the BAT file intended, but get the familiar "F/W is in fault state".
Well, thanks for the recovery image, LSI.
OK, drawing on our earlier experience, we try to flash this file in Mode 0.

This time flashing took about 30 minutes, since the file is twice as large as usual. After flashing, we power the server off, turn it back on and see the cherished screen:

image

Fireworks, champagne: the server is saved!

But this life-saving image does not contain the latest firmware version, so with a light heart I booted from the FreeDOS flash drive again to flash the latest Supermicro firmware ... and got stuck at exactly the same stage as at the very beginning:

image

The circle closed. To be thorough, I left it like that overnight, but nothing changed.
After a reboot, the firmware was broken again.

By trial and error I found that after flashing the recovery image you need to reset the controller to factory defaults:
MegaCli.exe -AdpFacDefSet -a0

and then power-cycle the server.

After that, the new firmware flashes without hanging, and we see the fresh version:
image

That's it: this time, a 100% victory over the rebellious hardware!
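For reference, the sequence that finally worked can be collected into a short checklist. The sketch below is a hypothetical Python helper that only prints the steps in order; it does not run MegaCli (which here ran under FreeDOS anyway), and the power cycles remain manual. The file names are assumptions: TB_16MB.ROM stands for the recovery image copied from the LSI ISO onto the flash drive, and smc2208.rom for whatever newest firmware file you downloaded. The durations are the rough timings observed above, not guarantees.

```python
# Hypothetical checklist helper: it only PRINTS the commands that worked in
# this article, in order. Nothing here executes MegaCli.
from dataclasses import dataclass


@dataclass
class Step:
    kind: str         # "run" = type at the DOS prompt, "manual" = do by hand
    text: str
    minutes: int = 0  # rough duration observed in the article (0 = not noted)


def recovery_plan(recovery_rom: str, latest_rom: str) -> list[Step]:
    """Return the sequence that revived the 2208 controller in the article."""
    return [
        # Mode 0 flash takes no -a option; the 16 MB image took about 30 minutes.
        Step("run", f"MegaCli.exe -AdpM0Flash -f {recovery_rom}", 30),
        Step("manual", "Power the server off completely, then back on"),
        # Without this reset, flashing the regular firmware hung again.
        Step("run", "MegaCli.exe -AdpFacDefSet -a0"),
        Step("manual", "Power the server off completely, then back on"),
        # Regular flash of the newest firmware image.
        Step("run", f"MegaCli.exe -AdpFwFlash -f {latest_rom} -a0"),
    ]


def print_plan(steps: list[Step]) -> None:
    """Print numbered steps, marking manual actions and known durations."""
    for number, step in enumerate(steps, start=1):
        prefix = "C:\\> " if step.kind == "run" else "[by hand] "
        note = f"  (about {step.minutes} min)" if step.minutes else ""
        print(f"{number}. {prefix}{step.text}{note}")


if __name__ == "__main__":
    # Placeholder file names; substitute your own images.
    print_plan(recovery_plan("TB_16MB.ROM", "smc2208.rom"))
```

The factory-defaults reset sits between the two flashes on purpose: in the author's case, skipping it made the regular flash hang exactly as it did at the start.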

Discharge


The moral of this fable: if you do not want to spend a couple of days on recovery, or even longer on returning the hardware, flash only firmware released by the hardware manufacturer, or do not touch anything and live with what you already have. (Even when the manufacturer does publish firmware, it can be hard to find: I only found Supermicro's by digging through the wilds of its FTP server; there are no links on the server or motherboard pages.)
That said, I am not sure the problem was caused by the "foreign" firmware rather than by some random glitch, and I do not want to check again.

Sometimes firmware simply gets corrupted for some reason (power loss during flashing, or some gamma-ray burst in nearby space), and then you will have to resort to emergency recovery.

I hope this article helps anyone who runs into a similar problem in the future.