tyomitch September 9, 2010 at 16:29

Why wasn't AARD removed from the release?

Beta versions of Windows 3.1 contained hidden, encrypted code that, when run on DR-DOS, displayed a cryptic error message.

For the release, they decided not to pull such tricks, but neither the check nor the message was removed: both remained inside WIN.COM, and changing a single byte is enough to make the AARD code run again on every startup.

Why did they leave it in? Was Microsoft hoping to re-enable these dubious checks one day in the future?
Of course not. Even the message in the release remained unchanged: “Please contact Windows 3.1 beta support.” If the message had really been meant to be shown, it would have been updated after beta testing.

So why leave meaningless code in the release that never runs?

Larry Osterman explains:

Bypassing the code by inserting a JMP instruction is pretty safe; but if you delete the code, the function offsets in what remains will change, i.e. it becomes new, untested code. Beta testing was already complete, so the developers tried not to replace tested code with untested code.
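The difference between the two options can be sketched with a toy layout calculation. This is only an illustration: the function names and sizes below are made up and have nothing to do with the real contents of WIN.COM.

```python
def layout(functions):
    """Return {name: offset} for functions placed back to back."""
    offsets = {}
    position = 0
    for name, size in functions:
        offsets[name] = position
        position += size
    return offsets


# Hypothetical functions and sizes in bytes, for illustration only.
tested = [
    ("startup", 120),
    ("aard_check", 300),
    ("load_kernel", 200),
    ("show_logo", 80),
]

# Option 1: keep every byte in place and bypass the check with a jump.
# Sizes are unchanged, so nothing moves.
patched = list(tested)

# Option 2: delete the check. Everything placed after it slides down.
removed = [f for f in tested if f[0] != "aard_check"]

before = layout(tested)
after_patch = layout(patched)
after_removal = layout(removed)

moved = [name for name in after_removal if after_removal[name] != before[name]]
print(moved)  # ['load_kernel', 'show_logo']
```

With the jump, every function stays at the address it had throughout beta testing; with the deletion, everything after the removed code ends up somewhere it has never been tested.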

But how on earth could such a slight change in the code affect anything? Chris Pratley gives an example:
Even relinking the code, let alone recompiling it, can introduce unexpected bugs. A few years ago, when we were working on the Asian release of Word 97 and already considered the code complete, we started its final optimization. We have a tool that collects statistics on how individual functions perform and reorders them within the file in an optimal way; the code of the functions themselves does not change. After optimization we handed the build over for final testing and, surprise, there was a bug. When testers used a certain feature of the program, the optimized code crashed on some machines. On the same machines, the same feature had worked fine before optimization.

We tried to debug it; but once we added debugging information to the optimized build, it no longer crashed. We tried debugging without the extra information; but even when we merely ran it under the debugger, it no longer crashed. However we tried to find the cause of the crashes, the program refused to crash; yet it crashed every single time we left it alone.

We were about to bring out an ICE (in-circuit emulator, a hardware debugger) when we noticed a pattern: the program crashed only on Pentium processors running at 150 MHz or lower, though not on all of them. That was a lead. We went to the Intel website and looked at the list of “errata” (as they call their bugs). Bingo! Pentium processors had an erratum that caused a malfunction under certain conditions. Very specific conditions: a conditional jump located 33 bytes after a JMP, with that JMP sitting on a memory page boundary. The erratum was fixed in Pentiums released after the 150 MHz model.

In fact, chips have bugs quite often, although only a few of them become well known. After all, chip microcode is written by people, not gods. Usually, as soon as a chip maker finds a bug, it tells the compiler vendors, so that compilers generate code that avoids the known bugs and ordinary programmers never run into them. It turned out that we had a slightly outdated compiler that did not yet know about this particular bug.

When an Intel representative confirmed that the mysterious crashes could be caused by this very bug, we went through the optimized code, found the offending instruction sequence, and manually rearranged three bytes so that the distance between the two jumps became 34 bytes. The crashes stopped.

Now, when someone assures me that his correction is “absolutely safe”, I always tell this story. No code change can be absolutely safe.
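The condition Chris describes is narrow enough to write down as a small check. The sketch below encodes only the wording of the story, not Intel's actual erratum text, and rests on three assumptions: pages are 4 KB, “on the border of the page” means the JMP ends exactly at the end of a page, and the 33 bytes are counted from the start of the JMP to the start of the conditional jump. The function name and addresses are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KB x86 pages


def matches_described_condition(jmp_addr, jmp_len, cond_jump_addr):
    """True if the layout matches the condition as the story words it.

    Assumptions (not taken from Intel documentation):
    - the JMP "sits on a page boundary" when its last byte is the
      last byte of a page;
    - the distance is measured between the two instruction starts.
    """
    ends_on_boundary = (jmp_addr + jmp_len) % PAGE_SIZE == 0
    distance = cond_jump_addr - jmp_addr
    return ends_on_boundary and distance == 33


# Hypothetical addresses: a 2-byte JMP that ends exactly at 0x2000.
jmp_addr = 0x1FFE
print(matches_described_condition(jmp_addr, 2, jmp_addr + 33))  # True
print(matches_described_condition(jmp_addr, 2, jmp_addr + 34))  # False
```

Under these assumptions, moving the conditional jump by a single byte, from 33 to 34, is enough to leave the dangerous layout, which is what the manual fix achieved.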

I ran into a similar bug myself: when we were working on Exchange 5.5, a new build crashed on one test machine, always the same one. For several days we tried to find the cause, to no avail: the bug disappeared at the slightest code change, yet it showed up every time we stopped debugging. In the end we, like Chris, turned to the list of errata, and indeed our code was hitting one of them. Not content with fixing that specific build, we found a set of compiler options under which the bug could not occur.

So it is not surprising that the Windows developers left the AARD code in the release: they had been testing Windows with this code for many weeks, if not months, and they knew for sure that with this code in place, Windows works. Whether Windows would work without it, they did not dare to find out right before the release.

AARD was not the only interesting feature of the Windows 3.1 beta.
For the first time, Windows intercepted Ctrl-Alt-Del and showed its own screen: "Let me help you close a frozen program." Those who used Windows 3.x will remember that this screen was blue; in the beta, it was dull black.

If the user confirmed closing the hung application but Windows could not terminate it (for example, because there simply were no hung applications in the system), the only thing Windows could offer was a reboot.

In the Windows 3.1 release, this oddity was fixed: if there are no hung applications, Windows now suggests leaving everything as it is.

I read another similar story, I think from Raymond Chen; I could not find the source just now. Chen wrote that in one of the ancient builds of Windows they found an unused variable. They deleted it, and a function in a completely different part of the code stopped working. They put the variable back, and the bug was gone.

This time the chips were fine: the fault really was the programmer's. The broken function had a variable that was not initialized on every path before use. The value of an uninitialized variable is whatever garbage happens to be in the stack slot allocated for it, and it so happened that this slot always contained zero. Zero was a suitable value for that variable, so the program kept running.

When the unused variable was deleted, all the other variables “shifted” along the stack, and the slot reserved for the uninitialized variable now held some other value. Now the program crashed.
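The mechanism can be imitated with a toy model in which the stack is a plain list and a function reads its local from a fixed slot. This is a simulation under made-up assumptions (slot numbers, leftover values, direction of the shift), not a reconstruction of the actual Windows code.

```python
def run_function(stack, slot, initialize):
    """Simulate a function whose local variable lives in stack[slot].

    The local is written only when `initialize` is true; otherwise the
    function reads whatever an earlier call left in that slot.
    """
    if initialize:
        stack[slot] = 0
    value = stack[slot]  # leftover garbage when not initialized
    return "ok" if value == 0 else "crash"


# Made-up leftovers from earlier calls.
stack = [0x5A5A, 0x1234, 0, 0xBEEF]

# With the unused variable present, the buggy local lands on slot 2,
# which happens to hold zero.
with_unused = run_function(list(stack), slot=2, initialize=False)

# Deleting the unused variable shifts the buggy local by one slot
# (to slot 1 in this model), where stale data is waiting.
without_unused = run_function(list(stack), slot=1, initialize=False)

print(with_unused, without_unused)  # ok crash
```

Calling `run_function(list(stack), slot=1, initialize=True)` returns `"ok"` in either layout, which is the real fix: initialize the variable on every path instead of relying on what the stack happens to contain.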

To the surprise of Chen's colleagues, the function had been written many months earlier and had even shipped in a previous Windows release! By a lucky coincidence it always worked correctly despite the bug, which is why the bug went unnoticed for so long.

Chen told this story to explain why Windows XP contains pieces of code that no one has touched since Windows 3.0. No one knows how many invisible bugs they contain; but everyone knows for sure: this code works.
