Three favorite bugs

There are bugs, and then there are BUGS. Ordinary bugs get fixed and forgotten, but the big ones stay with us forever. I want to share three of these monsters with you.

The first incident happened in 2005, when I worked at FriendScout24. We had a monitoring tool that showed a table with one server per row. If a server responded normally, its row was green; if not, red. Usually everything was peacefully green. Then, one fine August day, the servers started going down in a staircase pattern. Bam, bam, bam: 4 servers in 3 minutes. Five minutes later everything was green again, as if nothing had happened.



This repeated the next day, the day after that, and so on for the whole week. After the usual suspects (load balancer, JavaScript) had been ruled out, Oliver (one of the front-end developers) hypothesized that a particular user was causing it. With about 2 million users, some 25,000 of them logged in at any one time, that user was hard to find. But FriendScout24 had already seen one user bring down the whole system, so we decided not to give up.

In the end, the root of all evil was a photograph. But not an ordinary one. One girl decided to enrich her profile with a photo, which is in itself commendable and welcome. However, she only had her photo as a PDF. Like all normal portals of that time, we did not accept PDFs, only JPEGs and GIFs of various kinds. The girl, no fool, renamed foto.pdf to photo.jpg. That got her past the MIME type check, and her photo sailed off into the depths of the system. In those depths sat ImageMagick, then the state-of-the-art library for image processing. ImageMagick, also no fool, did not complain that the file was not a JPEG and send it back; it recognized the PDF from the content and called its sidekick Ghostscript to process it. And since nobody had ever planned to process PDFs on these machines, Ghostscript was not installed, which caused a tidy segfault in the native library and took the JVM down with it. Oops.

Undaunted, the girl tried again, landed on the next server, and so killed the servers one by one. We are very grateful that she did not have the patience to try 12 times, because that is how many web servers we had back then.
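
A check that looks at the file's content rather than its name would have caught the renamed PDF before it reached the image library. The following is a minimal sketch, not what FriendScout24 actually ran: the class and method names are hypothetical, and it only inspects the well-known leading bytes of JPEG, GIF and PDF files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ImageSniffer {
    // Leading "magic" bytes of the three formats that appear in the story.
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] GIF = "GIF8".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns "jpeg", "gif", "pdf" or "unknown", based on content only. */
    static String sniff(byte[] head) {
        if (startsWith(head, JPEG)) return "jpeg";
        if (startsWith(head, GIF)) return "gif";
        if (startsWith(head, PDF)) return "pdf";
        return "unknown";
    }

    /** Accepts only what the portal wanted: JPEG and GIF, whatever the file is called. */
    static boolean isAcceptedImage(byte[] head) {
        String type = sniff(head);
        return type.equals("jpeg") || type.equals("gif");
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Passing the first few bytes of the upload to `isAcceptedImage` rejects a file that starts with `%PDF-`, even if it arrives as photo.jpg.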

The second bug dates back to prehistoric times, when I built one of the first versions of this site. The site lists all kinds of dry-cleaning and laundry machines, and all these machines were entered in the content management system (CMS) from which the site was rendered. At first everything was fine: a happy customer and all that. A week later the customer called and complained that adding new machines was taking suspiciously long. I checked: the logs were empty, the server was idle, and I found nothing. The customer called again and said he had added 100 machines and that each new one now took a minute to add. I looked, I measured; he was telling the truth. Long story short, I put timing measurements on almost every line and found the culprit. I could not believe my eyes for a long time: log.debug(cache).

Debug logging itself was turned off, so nothing showed up in any log, but the cache's toString method still rendered the entire contents in full detail. And it took longer and longer: three minutes for a single operation. Ever since, I always use log.isDebugEnabled(). It was an expensive lesson in lost time, though.
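
The guard can be sketched as follows. This is an illustration under assumptions: it uses `java.util.logging` from the JDK (where `isLoggable(Level.FINE)` plays the role of `isDebugEnabled()`) rather than the logging library from the story, the `Cache` class is a hypothetical stand-in, and the unguarded call builds the message by string concatenation, which is one common way to pay for toString even when nothing is logged.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLoggingDemo {
    /** Hypothetical stand-in for the cache whose toString() dumps everything. */
    static final class Cache {
        int toStringCalls = 0;

        @Override
        public String toString() {
            toStringCalls++; // in the story, this walked the whole cache
            return "Cache{...}";
        }
    }

    static final Logger LOG = Logger.getLogger("demo.cms");

    /** Unguarded: the message string is built before the level is checked. */
    static void unguarded(Cache cache) {
        LOG.fine("cache state: " + cache);
    }

    /** Guarded: toString() only runs when FINE (debug) output is enabled. */
    static void guarded(Cache cache) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("cache state: " + cache);
        }
    }

    /** Alternative: a Supplier is evaluated only if the record is loggable. */
    static void lazy(Cache cache) {
        LOG.fine(() -> "cache state: " + cache);
    }
}
```

With the logger at its default INFO level, `unguarded` still calls `toString()` once per invocation, while `guarded` and `lazy` never touch the cache.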

And finally, my favorite. The greatest bug of all time. It happened in 2003 at the same FriendScout24, before they hired me (maybe that is why they hired me). The platform was very unstable back then; it crashed often and was maintained by people who had little understanding of what they were doing. And when people will not or cannot understand the cause of a system's bad behavior, they have exactly one repair method: ctrl-alt-del. After all, what is good on Windows should be good everywhere, right?

In our case, one of the admins wrote a super-smart script that read the system logs and, if it found the keyword FATAL there, restarted the entire application. On all 25 servers, lock, stock and barrel. When the restarts became too frequent, they had to reconsider that policy. It happened like this:
A woman calls the support service and says:

Woman: “Why when I log into your system does it immediately turn off?”
Support agent: “What is your username?”
Woman: “femme-fatale” (French for “fatal woman”)

Curtain.
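
For the record, the difference between "the word FATAL appears somewhere in the line" and "this line was logged at level FATAL" can be sketched like this. Everything here is assumed for illustration: the admin's real script is not shown in the story, the log layout (`<date> <time> <LEVEL> <message>`) is made up, and the naive check is taken to be a case-insensitive substring search.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class FatalDetector {
    // Assumed line layout: "<date> <time> <LEVEL> <message>" (purely illustrative).
    private static final Pattern FATAL_LEVEL =
            Pattern.compile("^\\S+ \\S+ FATAL\\b");

    /** Naive check in the spirit of the story: the keyword anywhere in the line. */
    static boolean naive(String line) {
        return line.toUpperCase(Locale.ROOT).contains("FATAL");
    }

    /** Stricter check: only the level field counts, not the message text. */
    static boolean byLevel(String line) {
        return FATAL_LEVEL.matcher(line).find();
    }
}
```

A line such as `2003-05-01 12:00:00 INFO user femme-fatale logged in` trips `naive` but not `byLevel`, so a login alone would no longer restart anything.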
