An emotional history of the processors of the first computers, from the 1970s to the early 1990s
As illustrative material, I attach my own small Rosetta stone: a program that calculates the number π on different processors and systems using a spigot algorithm, and which I believe to be the fastest of its implementations.
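The π program mentioned above is based on a spigot algorithm of the Rabinowitz-Wagon kind, which emits digits one group at a time using only small integer arithmetic, a good fit for 8-bit processors. The sketch below is not the author's assembly code. It is a Python transcription of a well-known compact C formulation that works in base 10000 and spends 14 series terms per group of 4 digits; the function name `pi_digits` is hypothetical, and the 800-digit default is simply the classic size for this formulation.

```python
def pi_digits(ndigits=800):
    """Spigot for pi that emits groups of 4 decimal digits (base 10000).

    Sketch of a Rabinowitz-Wagon style spigot; ndigits should be a
    multiple of 4 (14 series terms are used per 4-digit group).
    """
    c = ndigits // 4 * 14            # number of mixed-radix positions
    f = [2000] * c + [0]             # initial remainders: 2000 = 10000 // 5
    e = 0                            # carry-over from the previous group
    groups = []
    while c > 0:
        d = 0
        g = c * 2
        b = c
        while True:
            d += f[b] * 10000        # scale the stored remainder up
            g -= 1                   # g is now the odd number 2b - 1
            f[b] = d % g             # keep the new remainder in place
            d //= g                  # divide by 2b - 1
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b                   # multiply by the (new) position index b
        groups.append("%04d" % (e + d // 10000))
        e = d % 10000
        c -= 14                      # each later group needs 14 fewer terms
    return "".join(groups)
```

For example, `pi_digits()[:10]` gives the first ten digits, `3141592653`. An assembly version keeps the same structure; the inner loop is dominated by one division per position, which is why fast division matters so much when comparing processors.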
Intel 8080 and 8085
The first real single-chip processor, released in the first half of 1974, is still manufactured and in use today. It was cloned many times around the world; in the USSR it was designated KR580VM80A. Its kinship with this, in a sense, relic product is still easy to detect in modern Intel PC processors. I never wrote code for this processor myself, but, being well acquainted with the z80 architecture, I venture to offer some comments.
The 8080 instruction set, like that of other Intel PC processors, can hardly be called ideal, but it is universal, quite flexible, and has some very attractive features. Compared with its competitors, the Motorola 6800 and MOS Technology 6502, the 8080 stood out favorably with a large number of registers, even if somewhat clumsy ones: one 8-bit accumulator A; the 16-bit HL register, which doubled as a semi-accumulator and a fast index register; the 16-bit stack pointer SP; and two more 16-bit registers, BC and DE. The BC, DE, and HL registers could also be used as six 8-bit registers. In addition, the 8080 supported an almost complete set of status flags: carry, sign, zero, and even parity and half-carry. Some 8080 instructions remained speed champions for a long time. For example, XCHG exchanges the contents of the 16-bit registers DE and HL in just 4 clock cycles, which was extremely fast! A number of other instructions, although they did not set such striking records, were also among the best for a long time:
- XTHL: exchanges the contents of HL with the data on top of the stack in 18 clock cycles. That may seem like a lot, but even on the genuinely 16-bit 8086 such an instruction takes 22 cycles, and on the 6800 or 6502 such an instruction is hard even to imagine;
- DAD: adds another 16-bit register (BC, DE, or even SP) to the semi-accumulator HL in 10 cycles. This is a true 16-bit addition that sets the carry flag. Adding HL to itself gives a fast 16-bit left shift, i.e. multiplication by 2, a key operation for implementing both full multiplication and division;
- PUSH and POP: push a 16-bit value from a register onto the stack and pop a 16-bit value from the stack into a register, in 11 and 10 cycles respectively. These are the fastest 8080 memory operations, and SP is automatically decremented or incremented as they execute. PUSH can be used, for example, to quickly fill memory with a pattern of values from three registers (BC, DE, HL). There are no instructions for working with 8-bit values on the stack at all;
- LXI: loads a 16-bit constant into a register pair (HL, DE, BC, or SP) in 10 cycles;
- RNZ, RZ, RNC, RC, RPO, RPE, RP, RM: conditional returns from a subroutine, which make code cleaner by eliminating the need for extra conditional jumps. These instructions were dropped in the x86 architecture, perhaps unwisely: code that uses them looks nicer.
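To make the register-pair instructions listed above more concrete, here is a toy Python model of XCHG, DAD, PUSH, POP, and XTHL, plus a 16x8 multiplication built only from DAD H (shift left) and DAD D (add). It is a minimal sketch, not an emulator: only the carry flag is tracked, timing is ignored, and the class and function names (`I8080Regs`, `mul16x8`) are hypothetical.

```python
class I8080Regs:
    """Toy model of a few 8080 register-pair operations (not a full emulator)."""

    def __init__(self):
        self.hl = self.de = self.bc = 0
        self.sp = 0x0100
        self.carry = False
        self.mem = bytearray(0x10000)

    def xchg(self):
        # XCHG: swap DE and HL
        self.de, self.hl = self.hl, self.de

    def dad(self, value):
        # DAD rp: HL <- HL + rp; only the carry flag is affected
        total = self.hl + value
        self.carry = total > 0xFFFF
        self.hl = total & 0xFFFF

    def push(self, value):
        # PUSH: SP drops by 2; low byte at the new SP, high byte at SP + 1
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp] = value & 0xFF
        self.mem[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF

    def pop(self):
        # POP: read the 16-bit word at SP, then SP rises by 2
        value = self.mem[self.sp] | (self.mem[(self.sp + 1) & 0xFFFF] << 8)
        self.sp = (self.sp + 2) & 0xFFFF
        return value

    def xthl(self):
        # XTHL: exchange HL with the 16-bit word on top of the stack
        top = self.mem[self.sp] | (self.mem[(self.sp + 1) & 0xFFFF] << 8)
        self.mem[self.sp] = self.hl & 0xFF
        self.mem[(self.sp + 1) & 0xFFFF] = (self.hl >> 8) & 0xFF
        self.hl = top


def mul16x8(cpu, multiplicand, multiplier):
    """HL <- DE * multiplier (mod 2**16) using only DAD H and DAD D steps."""
    cpu.de = multiplicand & 0xFFFF
    cpu.hl = 0
    for bit in range(7, -1, -1):      # scan multiplier bits from MSB to LSB
        cpu.dad(cpu.hl)               # DAD H: shift HL left by one bit
        if (multiplier >> bit) & 1:
            cpu.dad(cpu.de)           # DAD D: add the multiplicand
    return cpu.hl
```

For instance, `mul16x8(I8080Regs(), 300, 200)` returns 60000. On the real chip each loop step is one DAD H plus an optional DAD D, which is why the author calls the HL-to-itself addition a key operation.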
This processor was used in the first personal computer, the Altair 8800, which became very popular after a magazine publication in early 1975. By the way, in the USSR a similar publication appeared only in 1980, and a PC of comparable significance only in 1986. The Intel 8080 also became the basis for the development of CP/M, the first-ever mass-market professional operating system, which dominated microcomputers for professional use until the mid-1980s.
Now about the shortcomings. The 8080 required three supply voltages: -5, +5, and +12 volts. Interrupt handling is clumsy and slow. And in general, the 8080 is rather sluggish compared with the competitors that appeared soon after it: the 6502 could be up to 3 times faster at the same clock frequency.
But, as it turned out, the 8080 architecture embodied the correct vision of the future, namely the fact, unknown in the 1970s, that processors would become faster than memory. The 8080's DE and BC registers are more a prototype of modern manually controlled caches than general-purpose registers. The 8080 started at 2 MHz, while its competitors started at only 1 MHz, which smoothed over the performance difference.
It is difficult to call the 8080 a 100% 8-bit processor. Of course, its ALU is 8-bit, but there are many 16-bit instructions that work faster than their 8-bit counterparts would, and for some instructions there are no 8-bit analogs at all. The XCHG instruction is 100% 16-bit both in essence and in timing. There are real 16-bit registers. Therefore, I venture to call the 8080 partly 16-bit. It would be interesting to calculate a processor "bitness index" from the totality of such features, but, as far as the author knows, nobody has done this yet.
The author does not know why Intel declined to directly support the development of 8-bit personal computers built on its processors. Intel's policy has always been complex and ambiguous. Its connection with politics is illustrated, in particular, by the fact that Intel had plants in Israel for a long time and kept this secret until the end of the 1990s. Intel hardly tried to improve the 8080; it only raised the clock frequency to 3 MHz. In effect, the 8-bit market was handed over to Zilog with its 8080-related z80 processor, which was able to withstand the main competitor, the “terminator” 6502, quite successfully.
In the USSR and Russia, the domestic 8080 clone became the basis of many computers that remained popular until the early 1990s: the Radio-86RK, Mikrosha, the multicolor Orion-128, Vector, and Korvet. However, the clone wars were won by the cheap and improved Z80-based ZX Spectrum clones.
In early 1976, Intel introduced the 8085 processor, compatible with the 8080 but significantly superior to its predecessor. It no longer needed the -5 V and +12 V supplies, its connection scheme was simplified, interrupt handling was improved, and the clock frequency was raised from 3 MHz to a very respectable 6 MHz. The instruction set was extended with several useful instructions: 16-bit subtraction, a 16-bit right shift in just 7 cycles (which is very fast), a 16-bit rotation to the left through the carry flag, loading DE with HL plus an 8-bit offset (a variant works with the SP stack pointer), and storing HL at the address held in DE, with the corresponding load of HL via DE. All of these instructions except the right shift execute in 10 cycles, which is sometimes significantly faster than their counterparts or their emulation on the z80. A few more instructions and even two new flags were added. Among the new flags, the overflow flag is worth noting, although the instruction set hardly supported working with it. In addition, many instructions working with byte data were accelerated. This was very significant, since many 8080 or z80 systems inserted wait states, which, combined with the extra cycles of the 8080, could stretch execution time almost twofold. For example, in the domestic Vector computer, register-to-register instructions took 8 cycles, whereas with an 8085 or z80 the same instructions would take only 4 cycles. The XTHL instruction even became two cycles faster. With the new instructions, you can write memory block copy code that runs faster than the Z80 LDI/LDD instructions! However, some instructions, for example,
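Intel never officially published the new register-pair instructions listed above; third-party documentation usually calls them DSUB, ARHL, RDEL, LDHI/LDSI, SHLX, and LHLX, and those mnemonics are used below only as labels. The following is a minimal Python sketch of their data effects, assuming DSUB subtracts without an incoming borrow; flags other than carry and all timing are left out, and the function signatures are hypothetical.

```python
def dsub(hl, bc):
    """DSUB: HL <- HL - BC. Returns (new HL, borrow); other flags not modeled."""
    diff = hl - bc
    return diff & 0xFFFF, diff < 0


def arhl(hl):
    """ARHL: arithmetic shift right of HL; bit 15 is kept, bit 0 goes to carry."""
    result = (hl >> 1) | (hl & 0x8000)
    return result, bool(hl & 1)


def rdel(de, carry):
    """RDEL: rotate DE left through carry; old bit 15 becomes the new carry."""
    result = ((de << 1) | int(carry)) & 0xFFFF
    return result, bool(de & 0x8000)


def ldhi(hl, offset):
    """LDHI d8: DE <- HL + unsigned 8-bit offset (LDSI does the same with SP)."""
    return (hl + (offset & 0xFF)) & 0xFFFF


def shlx(mem, de, hl):
    """SHLX: store HL at the address in DE, low byte first (little-endian)."""
    mem[de] = hl & 0xFF
    mem[(de + 1) & 0xFFFF] = (hl >> 8) & 0xFF


def lhlx(mem, de):
    """LHLX: load HL from the address in DE."""
    return mem[de] | (mem[(de + 1) & 0xFFFF] << 8)
```

Because ARHL keeps bit 15, it halves signed values correctly: `arhl(0xFFFC)` turns -4 into `0xFFFE`, i.e. -2. The SHLX/LHLX pair is what makes the fast block copies mentioned above possible, since DE can act as a second 16-bit memory pointer.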
The 8085 has built-in interrupt support, which in many cases eliminates the need for a separate interrupt controller, and a serial I/O port. As already noted, the 8085 did not get full support for the overflow flag, so signed arithmetic remained somewhat incomplete.
However, I must once again repeat the formula “for reasons unknown to the author”: Intel declined to promote the 8085 as a main processor. Only in the 1980s did some fairly successful 8085-based systems appear. The first, in 1981, was the predecessor and almost competitor of the IBM PC, the IBM System/23 Datamaster. Then in 1982 came the very fast Zenith Z-100 with excellent graphics, in which the 8085 ran at 5 MHz. In 1983, the Japanese company Kyotronic created the very successful KC-85 laptop, versions of which were also produced by other companies: Tandy made the TRS-80 Model 100, NEC the PC-8201A, and Olivetti the M-10. In total, more than 10 million such computers were made! In the USSR/Russia in the early 1990s, there were attempts to upgrade some systems, for example the Vector computer, with the domestic clone IM1821VM85A.
In effect, Intel gave the z80 the “green light”. A few years later, in the battle for the 16-bit market, Intel behaved completely differently, filing a lawsuit to ban sales of the V20 and V30 processors in the United States. Interestingly, these processors from the Japanese company NEC could switch to full binary compatibility with the 8080, which made them the fastest processors of the 8080 architecture.
Another Intel mystery is its refusal to publish the extended instruction set, including support for the new flags. However, one of the official manufacturers of these processors did publish the complete instruction set. What were the reasons for such a strange decision? One can only guess. Could Zilog have played a role similar to the one AMD may have played, creating the appearance of competition, while the 8085 could have brought Zilog down? Or perhaps Intel wanted to keep the instruction set closer to that of the 8086, then under design? The latter seems doubtful. The 8086 was released more than 2 years after the 8085, and it is hard to believe that its instruction set was already known in 1975. In any case, 8086 compatibility with both the 8080 and the 8085 is achievable only with a macro processor, sometimes replacing one 8080/8085 instruction with several 8086 ones. And the two new 8085 instructions that were published cannot be implemented on the 8086 at all. It is especially hard to explain why Intel did not publish information about the new instructions even after the 8086 was released. One can only assume that marketing was the most likely reason: by artificially worsening the 8085 specifications, Intel made the 8086 look more effective against that background.
Motorola 6800 and close relatives
Motorola processors have always been distinguished by several very attractive “highlights” combined with some architectural solutions that are absurd in their abstractness and impracticality. The main “highlight” of all the processors considered here is a second, full-fledged and very fast accumulator.
The 6800 was the first processor in the world to require only a single power supply (5 volts), which was a very useful innovation. But because its only index register was a cumbersome 16-bit one, unusual for an 8-bit architecture, the 6800 turned out to be inconvenient to program and use. It was released back in 1974, not much later than the 8080, but it never became the basis of any well-known computer system. Interestingly, the 6502 developers Chuck Peddle and Bill Mensch considered the 6800 misguided, “too big”. However, it and its variants were widely used as microcontrollers. It is perhaps worth noting that Intel had been making processors since 1971, which put Motorola, for which the 6800 was the very first processor, in the position of catching up. And if you compare the 6800 not with the 8080 but with its predecessor, the 8008, the 6800 would be much preferable. Motorola almost caught up with Intel with the 68000/20/30/40. Note also that in the 1970s Motorola was a significantly larger company than Intel.
There were also numerous 6800 variants: 6801, 6802, 6803, 6805, ... Most of them are microcontrollers with built-in memory and I/O ports. The 6803 is a simplified 6801 and was used in the Tandy TRS-80 MC-10, a very late computer for its class (1983), and its French clone, the Matra Alice, which were comparable to the Commodore VIC-20 (1980) or Sinclair ZX81 (1981). The 6801/6803 instruction set was significantly improved: 16-bit instructions, multiplication, ... were added. There was an unusual branch instruction, BRN (branch never), which never branches! Some instructions are a little faster.
The 680x processors fully support signed integers; the z80 and 6502 support them less well, while the 8080 and 8085 have almost no such support. However, such support was very rarely needed in 8-bit software.
The 6809 was released in 1978, when the 16-bit era began with the 8086, and has a highly developed instruction set, including multiplication of the two byte accumulators producing a 16-bit result in 11 cycles (for comparison, the 8086 needs 70 cycles for such an operation). In a number of cases the two accumulators can be combined into one 16-bit accumulator, which gives fast 16-bit instructions. The 6809 has two index registers and a record number of addressing modes among 8-bit processors: 12. Among them are modes unique for 8-bit chips, such as indexed with auto-increment or auto-decrement, relative to the program counter, and indexed with offset. The 6809 can use two types of interrupts: fast interrupts that automatically save only some of the registers, and interrupts that save all of them; the 6809 has three interrupt inputs: FIRQ (fast, maskable), IRQ (maskable), and NMI (non-maskable). Fast instructions for reading and setting all flags at once are also convenient.
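A minimal Python sketch of how the 6809 pairs its two accumulators: D is A:B with A as the high byte, MUL leaves the 16-bit product of A and B in D, and STD/LDD move D to and from memory high byte first, the big-endian order criticized further on. The model assumes MUL copies bit 7 of the result into the carry flag (as documented for the 6809) and ignores the zero flag and timing; the function names are hypothetical.

```python
def join_d(a, b):
    """D is the 16-bit pair A:B, with A as the high byte."""
    return ((a & 0xFF) << 8) | (b & 0xFF)


def split_d(d):
    """Return (A, B) for a 16-bit value of D."""
    return (d >> 8) & 0xFF, d & 0xFF


def mul_6809(a, b):
    """MUL: D <- A * B (unsigned 8x8 -> 16); carry is bit 7 of the result."""
    d = (a & 0xFF) * (b & 0xFF)
    return d, bool(d & 0x80)


def std(mem, addr, d):
    """STD: store D big-endian, high byte (A) at the lower address."""
    mem[addr] = (d >> 8) & 0xFF
    mem[(addr + 1) & 0xFFFF] = d & 0xFF


def ldd(mem, addr):
    """LDD: load D from two bytes, high byte first."""
    return (mem[addr] << 8) | mem[(addr + 1) & 0xFFFF]
```

So `mul_6809(200, 200)` yields D = 40000 in a single instruction, while on the 8080 or 6502 the same product needs a loop or a table.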
However, memory operations take more clock cycles than on the 6502. The index registers remained clumsy 16-bit dinosaurs in an 8-bit world, and some operations are simply shocking in their slowness: for example, transferring one byte accumulator to the other takes 6 clock cycles, and exchanging their contents takes 8 (compare with the 8080, where a 16-bit exchange takes 4 cycles)! For some reason, two stack pointers are offered at once, perhaps under the influence of the dead-end VAX-11 architecture; in an 8-bit architecture with 64 KB of memory this looks very awkward. And even an instruction with the intriguing name SEX cannot eliminate all the problems of the 6809. Overall, the 6809 is still somewhat faster than the 6502 at the same frequency, but it requires memory of the same speed. I managed to implement division of a 32-bit dividend by a 16-bit divisor (32/16 = 32,16, i.e. a 32-bit quotient and a 16-bit remainder) on the 6809 in a little over 520 cycles, while on the 6502 I could not get below 650 cycles. The second accumulator is a great advantage, but other features of the 6502, in particular its inverted carry, reduce this advantage to the 25% indicated. Multiplication by a 16-bit constant, however, turned out slower than table-based multiplication on the 6502 with a 768-byte table. The 6809 lets you write fairly compact and fast code using direct page addressing, but this addressing makes code rather confusing. Its essence is that the high byte of the data address is set in a special register and only the low byte of the address is specified in instructions. The same scheme, with the high byte fixed at zero, is used in the 6502, where it is called zero page addressing. Direct page addressing is a direct analogue of the x86 DS segment register, but for segments of only 256 bytes instead of 64 KB.
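The author's 6809 and 6502 division routines are not reproduced here. As a rough illustration of what "32/16 = 32,16" means, here is a classic shift-and-subtract (restoring) division in Python, one common way such routines are structured; whether the author's code follows exactly this scheme is an assumption. The second helper shows the direct page effective address described above, with the 6502 zero page as the special case of a high byte fixed at 0. Both function names are hypothetical.

```python
def div32by16(dividend, divisor):
    """Shift-and-subtract division: 32-bit / 16-bit -> (32-bit quotient, 16-bit remainder)."""
    if not 0 <= dividend <= 0xFFFFFFFF:
        raise ValueError("dividend must be a 32-bit value")
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("divisor must be a nonzero 16-bit value")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit; the remainder briefly needs a
        # 17th bit here, which an 8-bit CPU keeps in the carry flag.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


def direct_page_ea(dp, offset):
    """6809 direct page: high byte from the DP register, low byte from the instruction."""
    return ((dp & 0xFF) << 8) | (offset & 0xFF)
```

Each of the 32 iterations is a multi-byte shift plus a trial 16-bit subtraction, so the cost of 16-bit subtraction and of carry handling dominates. This is where the 6809's second accumulator helps and where the 6502's inverted carry (set when no borrow occurs) partly compensates.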
Another questionable choice in the 6800 architecture is big-endian byte order (high byte first), which slows down 16-bit addition and subtraction. The 6809 is not fully compatible with 6800 instruction codes. The 6809 was Motorola's last 8-bit processor; for later designs the company chose to use the 68008 instead.
It can be assumed that Motorola spent a lot of money promoting the 6809; this is still evident whenever the processor is mentioned. There are many favorable reviews of the 6809, which are distinguished by a certain haziness, generalization, and vagueness. The 6809 was positioned as an 8-bit superprocessor for micro-mainframes. Unix-like operating systems, OS-9 and UniFLEX, were even made for it. It was chosen as the main processor for the Apple Macintosh, and, as follows from the films about Steve Jobs, only his emotional intervention brought about the switch to the more promising 68000. Of course, the 6809 is a good processor, but overall it is only marginally better than its competitors that appeared much earlier: the 6502 (three years earlier) and the z80 (two years earlier). One wonders what would happen
The 6809 was used in several fairly well-known computer systems. The most famous among them is the American Tandy Color Computer, or Tandy CoCo, together with its British, or more precisely Welsh, clone, the Dragon 32/64. The computer markets of the 1980s were rather opaque: the Tandy CoCo was sold mainly in the USA, and the Dragons, besides the UK itself, gained some popularity in Spain. In France, for some reason, the 6809 became the basis of the mass-market Thomson computers of the 1980s, which remained virtually unknown outside France. The 6809 was also used as a second processor in at least two systems: the Commodore SuperPET 9000 series and a now almost forgotten add-on, produced in small numbers, for the Tube interface of BBC Micro computers. This processor was used in other systems less known to the author, in particular Japanese ones. It also found some use in gaming consoles. One of them is worth mentioning: the Vectrex, which used a unique technology, a vector display.
Tandy Color Computer 3 (CoCo 3)
The 680x processors have an interesting undocumented instruction with the intriguing name Halt and Catch Fire (HCF), which is used for testing at the electronics level, for example with an oscilloscope. Executing it freezes the processor, which can then be recovered only by a reset. These processors have other undocumented instructions as well. The 6800, for example, has instructions symmetrical to immediate loading of a register with a constant: they store the register immediately into the address following the instruction!
Like the 8080, 8085, or z80, the 6809 is very difficult to call purely 8-bit. The 6309 is hard to call 8-bit even formally. It was produced by the Japanese company Hitachi (I could not find the exact year of its release, but some data point to 1982) as a processor fully compatible with the 6809. However, this processor could be switched into a new mode which, while keeping almost complete compatibility with the 6809, provided almost an order of magnitude greater capabilities. These features were hidden in the official documentation but were published on Usenet in 1988. Two more accumulators were added, but instructions using them were significantly slower than those using the first two. The execution time of most instructions was greatly reduced. A number of instructions were added, among them signed division of a 32-bit dividend by a 16-bit divisor (32/16 = 16,16) in 34 cycles, fantastic for a processor of this class, with the divisor taken from memory. There is also a 16-bit multiplication with a 32-bit result in 28 clock cycles. Very useful instructions were added for fast copying of memory blocks, taking 6 + 3n cycles, where n is the number of bytes copied; copying can proceed with either decreasing or increasing addresses. The same instructions can be used to quickly fill memory with a given byte. Interrupts may occur while they are executing. There were also new bit operations, a zero register, and more. Interrupts were also added for executing an unknown instruction and for division by zero. In a sense,
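A rough Python model of the 6309 block-transfer instructions described above. As I understand the 6309 documentation, the mnemonic is TFM and it has four pointer modes: both pointers incrementing, both decrementing, or only one of them moving; the fixed-source mode is what turns a copy into a fill. The cycle estimate simply reuses the 6 + 3n figure quoted in the text, and the function name and parameters are hypothetical.

```python
def tfm(mem, src, dst, count, src_step, dst_step):
    """Copy `count` bytes, moving each pointer by its step (+1, -1 or 0).

    Returns the cycle estimate 6 + 3n quoted in the text.
    """
    if src_step not in (-1, 0, 1) or dst_step not in (-1, 0, 1):
        raise ValueError("steps must be -1, 0 or +1")
    for _ in range(count):
        mem[dst & 0xFFFF] = mem[src & 0xFFFF]
        src += src_step
        dst += dst_step
    return 6 + 3 * count
```

Filling 10 bytes with `0xAA`, for example, means storing the byte once at the first location and then running `tfm(mem, first, first + 1, 9, 0, 1)`, so the fixed source byte is replicated into the following nine locations.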
The 6309 is fully pin-compatible with the 6809, which made it a popular upgrade for the Tandy Color Computers and Dragons. There are special OS versions that use the new 6309 features.
MOS Technology 6502 and WDC 65816
This is a processor with a very dramatic fate; no other processor can compare with it. Its appearance and introduction were accompanied by events of very large scale and consequence. I will list some of them:
- the weakening of the giant Motorola, whose capabilities for some time exceeded Intel's;
- the destruction of MOS Technology;
- the end of 6502 development and its stagnant production with little or no modernization.
It all started when, for unknown reasons, Motorola refused to support the initiative of young engineers who proposed improving the generally rather mediocre 6800 processor. They had to leave the company and continue their work at the small but promising MOS Technology, where they soon prepared two processors, the 6501 and 6502, made in NMOS technology. The 6501 was pin-compatible with the 6800, but otherwise the two were identical. The 6501/6502 team successfully introduced a new chip production technology that radically reduced the price of the new processors. In 1975, MOS Technology could offer the 6502 for $25, while the launch price of the Intel 8080 and Motorola 6800 in 1974 was $360. In 1975, Motorola and Intel lowered their prices, but they were still close to $100. MOS Technology claimed that its processor was up to 4 times faster than the 6800. This seems doubtful to me: the 6502 works with memory much faster, but the second accumulator of the 6800 greatly accelerated many calculations. By my rough estimate, the 6502 was on average no more than 2 times faster. Motorola sued its former employees, alleging that they had used many of the company's technological secrets. During the trial it was established that one of the engineers who had left Motorola had taken some confidential documents on the 6800, acting against the wishes of his colleagues. Whether he acted on his own or some guiding forces stood behind him is still unknown. For this and other unclear reasons, Motorola forced MOS Technology, whose financial resources were very limited, to pay the substantial sum of $200,000 and to abandon production of the 6501. Intel acted quite differently in a similar situation with Zilog. Although it must be admitted that MOS Technology sometimes took excessive risks in trying to exploit, for its own purposes, the big money that Motorola spent on promoting the 6800.
Next, the legendary Commodore company and its equally legendary founder Jack Tramiel enter the story; in his shadow stood the company's chief financier, who actually determined its policy, a man named Irving Gould. Jack got a loan from Irving and with this money, using several, to say the least, unfair tactics, forced MOS Technology to become part of Commodore. After that, perhaps against the wishes of Tramiel, who was forced to yield to Gould, development of the 6502 almost stopped, even though as early as 1976 it had been possible to produce 6502 prototypes running at up to 10 MHz. This became known only many years later from Bill Mensch (who had been on the team that left Motorola), who often made loud but, by and large, empty statements and played a rather ambiguous role in the fate of the 6502. The main 6502 developer, Chuck Peddle, was permanently removed from processor development. The 6502 continued to be produced not only at Commodore but also at the company founded by Bill Mensch, the Western Design Center (WDC). Curiously, none of the former 6502 team ever worked with him afterwards.
The drama around the 6502 did not end there. In 1980, a short anonymous article appeared in Rockwell's AIM 65 Interactive magazine claiming that every 6502 carries a dangerous bug called JMP (xxFF). The tone of the article suggested something completely out of the ordinary. Later, Apple adopted this position, and it became a kind of mainstream view. Strictly speaking, though, there was no “bug”. Of course, a specialist used to the comfortable processors of the large systems of those years could find one of the features that are quite appropriate and even useful in microprocessors annoying, a bug. But in fact this behavior, which hurt someone's feelings, was described in the official documentation from 1976 and in programming textbooks published before the article appeared. The “bug” was eliminated by Bill Mensch in the 65C02 (CMOS 6502), supposedly by 1983, that is, after the release of the 65816. While Intel, Motorola, and others had already made new generations of 16-bit processors, the 6502 was only microscopically improved and artificially made partially incompatible with itself. Besides eliminating the “bug”, a number of changes were made that, in particular, altered the execution of several instructions: they became a cycle slower, but more correct in some far-fetched academic sense. It must be admitted, however, that several new instructions were long awaited and useful. On the other hand, the vast majority of the new instructions merely occupied code space, adding almost nothing to the capabilities of the 6502 and leaving fewer free opcodes for possible future upgrades. Commodore and the Japanese Ricoh (maker of the processor for the hugely popular NES game console) did not adopt these changes. The author of this material ran into the problem of this “bug” several times himself. Knowing nothing about it, he wrote programs for Commodore computers.
Later, one of them was ported to systems that used the 65C02 instruction set. An incompatibility appeared, and he had to change the code and use conditional compilation. The code for the 65C02 turned out to be more cumbersome and slower. He then raised the question on the 6502.org forum, where most participants come from the Apple world, and asked whether anyone could give an example of this “bug” crashing a program. He received only emotional and general comments; no specific example was ever offered.
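The JMP (xxFF) behavior discussed above is easy to state precisely. On the NMOS 6502, an indirect JMP increments only the low byte of the pointer when fetching the target's high byte, so a pointer at $xxFF reads its high byte from $xx00 of the same page; the 65C02 reads it from the next page at the cost of an extra cycle. A minimal Python sketch, with a hypothetical function name:

```python
def jmp_indirect_target(mem, pointer, cmos=False):
    """Target address of JMP (pointer) on an NMOS 6502 or a 65C02."""
    lo = mem[pointer & 0xFFFF]
    if cmos or (pointer & 0xFF) != 0xFF:
        hi_addr = (pointer + 1) & 0xFFFF
    else:
        # NMOS 6502: only the low byte of the pointer is incremented, so the
        # high byte is read from the start of the same page (xx00).
        hi_addr = pointer & 0xFF00
    return lo | (mem[hi_addr] << 8)
```

As long as a jump vector does not straddle a page boundary, both chips agree, which is why code that simply avoids placing vectors at $xxFF never notices the difference.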
The 65C02 was licensed to many companies, in particular NCR, GTE, Rockwell, Synertek, and Sanyo. It was used in the Apple II starting with the IIe models, although many IIe machines used the NMOS 6502. The 6512 variant of the 65C02 was also used in later BBC Micro models. Atari used the NMOS 6502. Besides the CMOS 6502, Synertek and Rockwell also produced the NMOS 6502. By the way, the NMOS 6502 has its own set of undocumented instructions, whose nature is completely different from that of the “secret” 8085 instructions. In the 6502 these instructions arose as a side effect of the technology used, so most of them are rather useless, but a few, for example loading or storing two registers with one instruction, and some others, can make code faster and more compact.
There were other attempts to modernize the 6502. In 1979, an article appeared saying that the 6509 processor had been prepared for production for Atari computers (not to be confused with the later Commodore processor of the same name), with instruction execution expected to be 25% faster and many new instructions. But for unknown reasons this processor never went into production. Commodore made only microscopic upgrades. In particular, it switched to HMOS technology and to static cores, which made it possible to run the processors at lower clock rates. From the programming point of view, the most interesting is the 6509, which, albeit in a very primitive form, can address up to 1 MB of memory through only two instructions specially allocated for this purpose. The super-popular Commodore 64 and 128 had the 6510/8510 processors, and the less fortunate 264 series had the 7501/8501. These processors had only 6-bit and 7-bit built-in I/O ports, respectively, and the 7501/8501 did not support non-maskable interrupts. Rockwell produced a version of the 65C02 whose instruction set was extended with 32 bit-manipulation instructions (similar to the z80 bit instructions); however, as far as I know, such processors were not used in computers, and these bit instructions were most likely used only in embedded systems. This extension, by the way, was developed by Bill Mensch.
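A simplified Python model of the Commodore 6509 banking idea mentioned above: a 4-bit bank number extends a 16-bit address to 20 bits (16 banks of 64 KB, i.e. 1 MB), and the two special instructions, commonly described as LDA (zp),Y and STA (zp),Y, access data in a separate "indirection" bank while everything else runs in the "execution" bank. On the real chip the bank registers live at memory locations $0000 and $0001; here they are plain attributes, and the assumption that the zero-page pointer itself is read from the execution bank is mine. Class and function names are hypothetical.

```python
def bank_address(bank, addr16):
    """Combine a 4-bit bank number and a 16-bit address into a 20-bit address."""
    return ((bank & 0x0F) << 16) | (addr16 & 0xFFFF)


class Simple6509:
    """Toy model of 6509 bank switching (registers and timing simplified)."""

    def __init__(self):
        self.mem = bytearray(1 << 20)   # 16 banks x 64 KB = 1 MB
        self.exec_bank = 0              # bank for ordinary fetches and accesses
        self.ind_bank = 0               # bank used by the two special instructions

    def _pointer(self, zp):
        # Assumption: the zero-page pointer is read from the execution bank.
        lo = self.mem[bank_address(self.exec_bank, zp & 0xFF)]
        hi = self.mem[bank_address(self.exec_bank, (zp + 1) & 0xFF)]
        return lo | (hi << 8)

    def lda_ind_y(self, zp, y):
        """LDA (zp),Y: read a byte from the indirection bank."""
        addr = (self._pointer(zp) + y) & 0xFFFF
        return self.mem[bank_address(self.ind_bank, addr)]

    def sta_ind_y(self, zp, y, value):
        """STA (zp),Y: write a byte into the indirection bank."""
        addr = (self._pointer(zp) + y) & 0xFFFF
        self.mem[bank_address(self.ind_bank, addr)] = value & 0xFF
```

Since only these two instructions see other banks, every access to far memory has to go through an explicit load or store, which is the "very primitive form" of 1 MB addressing the text refers to.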
The last act of the drama involving the 6502 was the blocking of 2 MHz 6502-based computers from the US market in the first half of the 1980s. It affected the British BBC Micro: its manufacturer Acorn made a large batch of computers for the United States, but, as it turned out, in vain. Some kind of barrier kicked in, and the computers had to be urgently reworked to European standards. The semi-American but formally Canadian Commodore CBM-II computers (1982), despite some problems (in particular with compliance with electrical equipment standards), were still admitted, perhaps because they had no graphics modes or even colored text; even the stylish Porsche design could not make up for that. The last on the list of losers was the 100% American Apple III (1980): it is known that Steve Jobs, like Apple's management as a whole, did a lot to make this computer fail. Jobs demanded obviously unachievable specifications, and management imposed unrealistic deadlines. Will we ever know their motives? The Apple III Plus, released in 1983, managed to eliminate the flaws of the Apple III, but Apple's management quietly closed the project in 1984 because it did not want competition for the Macintosh. Only in 1985, when the era of 8-bit technology was passing, did the Commodore 128 appear, which could run in one of its 6502 modes at 2 MHz. But even this turned out to be more of a joke, since the mode was practically unsupported and there are almost no programs for it. Only in the second half of the 1980s did accelerator add-ons for the Apple II begin to be produced in the United States, followed in 1988 by the Apple IIc Plus with a 4 MHz processor. Why did it happen this way?
Perhaps because a 6502 at 2 or 3 MHz (and such chips were already being produced at the very beginning of the 80s) could successfully compete with systems based on the Intel 8088 or Motorola 68000 in a number of tasks, games in particular. In 1991 Commodore resolutely shut down the interesting, though late, C65 project based on the 4510 processor running at 3.54 MHz. The 4510, the fastest 6502, was made only in 1988; it finally implemented the cycle optimization mentioned earlier, which gave a 25% speed increase. Thus the processor in the C65 is close in speed to a 6502 system at 4.5 MHz. Surprisingly, this fastest 6502, with its extended instruction set (in some respects this extension turned out better than the 65816's), has never been used anywhere else.
The C128 and Apple III Plus had a memory management unit (MMU) that allowed several stacks and zero pages, addressing more than 64 KB of memory, and so on. In the C128 the MMU was artificially limited to only 128 KB of memory. For the BBC Micro, second-processor add-ons with a 6502 at 3 MHz (1984) and 4 MHz (1986) were produced.
Anti-advertising: several Porsche-designed PETs used by the villain in The Jewel of the Nile (1985); the era of "Apple only" in Hollywood had not yet come
Now a few words about the 6502 instruction set. The main feature of this processor is that it was made almost as fast as possible, with almost no unnecessary cycles, which are especially numerous in the 8080/8085/z80/8088/68000 processors. In fact, this is the philosophy of the RISC processors that appeared later, under the direct influence of the 6502. Among Intel processors, the same philosophy has dominated since the 80486. In addition, the 6502 responded to interrupts very quickly, which made it very useful in some embedded systems. The 6502 has one accumulator and two index registers; in addition, the first 256 bytes of memory (the zero page) can be used by special instructions either as faster memory or as a set of 16-bit registers (almost identical in function to the BC and DE registers of the 8080/z80) for quite powerful addressing modes. Some arithmetic instructions (shifts, rotations, increment and decrement) can operate on memory directly, without using registers. There are no 16-bit instructions: this is a 100% 8-bit processor. All the main flags are supported except the parity flag characteristic of the Intel architecture. There is also an unusual flag for the rarely useful decimal mode. Intel and Motorola processors use special adjustment instructions for working with decimal numbers, whereas the 6502 can switch into decimal mode, which makes its speed advantage with decimal numbers even greater than with binary ones. Very impressive is the table-based multiplication available for the 6502: 8-bit operands and a 16-bit result in less than 30 clock cycles, with an auxiliary table of 2048 bytes.
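The table-based multiplication mentioned above is most likely the quarter-square method, which rests on the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4). Assuming that is the technique meant, here is a minimal C sketch: a table of floor(n^2/4) for n = 0..510 turns a multiplication into two lookups and one subtraction. The names `qsq`, `init_quarter_squares` and `mul8x8` are hypothetical, and this single 16-bit table is laid out differently from a real 6502 routine's 2048-byte table.

```c
#include <stdint.h>

/* qsq[n] holds floor(n*n / 4) for n = 0..510; 510*510/4 = 65025 fits in 16 bits. */
static uint16_t qsq[511];

/* Fill the quarter-square table once before calling mul8x8(). */
void init_quarter_squares(void)
{
    for (uint32_t n = 0; n < 511; n++)
        qsq[n] = (uint16_t)(n * n / 4);
}

/* 8x8 -> 16-bit unsigned multiply using only lookups and a subtraction,
   in the spirit of 6502 table multiplication. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    unsigned sum  = (unsigned)a + b;                                  /* 0..510 */
    unsigned diff = a >= b ? (unsigned)(a - b) : (unsigned)(b - a);   /* 0..255 */
    return (uint16_t)(qsq[sum] - qsq[diff]);
}
```

The identity is exact even with the floors, because a+b and a-b always have the same parity, so the rounding errors of the two table entries cancel.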
The 6502 can work in parallel with another device sharing the same memory, for example another 6502. As far as I know, such dual-processor systems were never produced. Instead of a second processor, the other device was usually a video controller sharing memory with the 6502.
The 65816 was released by WDC in 1983. Interestingly, Bill Mensch received the specification for the new processor from Apple. Of course, it was a big step forward, but clearly belated and with major architectural flaws. Nobody considered the 65816 a competitor to the main Intel or Motorola processors any more: it was a minor outsider, already doomed to keep losing ground. The 65816 had two important advantages: it was relatively cheap and almost compatible with the still very popular 6502. In the following years Bill Mensch did not even try to improve his brainchild: to optimize the cycles, to replace zero page addressing with addressing via a Z register (as was done in the 4510), to add at least multiplication, ... WDC only raised the maximum clock frequency, reaching 14 MHz by the mid-90s (such a processor ran at 20 MHz in the SuperCPU, a popular accelerator for the C64). However, even now (2019!) WDC for some reason offers the 65816 only at the same 14 MHz. The 65816 can use up to 16 MB of memory, but the addressing methods used for this look far from optimal. For example, the index registers can only be 8- or 16-bit, the stack can be placed only in the first 64 KB of memory, the convenient short direct page addressing (a generalization of the zero page) works only there too, working with memory above 64 KB is comparatively clumsy, ... The 65816 has a 16-bit ALU but an 8-bit data bus, so it is only about 50% faster than the 6502 in arithmetic operations. Nevertheless, more than a billion 65816s have been produced. Of course, a number of 65816 instructions clearly fill gaps in the 6502 architecture, for example the block memory copy instructions at 7 clock cycles per byte. It can also be added that the 65816 uses almost all opcodes (255 out of 256); the last unused one is reserved for future longer instructions that have not yet appeared.
The Apple IIx, whose development Steve Wozniak was actively involved in, was supposed to use the 65816, but production of this processor could only be started in 1984, and the first batches were defective, which caused excessive delays and eventually the cancellation of the whole project.
There is also a version of the 65816, the 65802, which uses a 16-bit address bus and is pin-compatible with the 6502. Upgrades based on this processor were offered for the Apple II, but such an upgrade gives a slight speedup only with specially written programs.
The 6502 was used in a large number of computer systems, the most popular of which are the 8-bit Commodore, Atari and Apple machines and the NES. Interestingly, a 6502 was used as the keyboard controller in the Commodore Amiga, and two 6502s at 10 MHz were used in the high-performance Apple Macintosh IIfx. One must also mention the Atari game consoles, produced from 1977 to 1996: about 35 million of them were sold! The 65816 was used in the rather popular Apple IIgs computer, in the Super NES game console, and in the rare English Acorn Communicator computer.
In 1984 Byte magazine published an article, illustrated with pictures of red banners, Lenin and marching soldiers, about the Agat, a bad copy of the Apple ][ made in the USSR. The article quoted a curious price for this computer, $17,000 (an absurd figure; the real price was about 4,000 rubles), and ironically remarked that Soviet manufacturers would have to cut the price dramatically if they wanted to sell their product in the West. The Agat was used mainly in school education. The higher-end Agat models were almost 100% compatible with the Apple ][ and had some quite useful extensions.
One can only speculate about what would have happened if the 6502 had developed at the same pace as its competitors. It seems to me that gradually moving zero page memory into registers and gradually extending the instruction set while optimizing cycles would have allowed a "terminator" 6502 to stay at the top in speed until the early 90s. Introducing 16-bit and then 32-bit modes would have allowed more memory and faster instructions. Would its competitors have been able to counter this?
I would like to end with some general philosophical reflections. Why was the 6502 held back and denied a much brighter future? Perhaps because it really could have squeezed the large firms hard and created a completely new reality. But was the 6502 team aiming for this? Most likely not; they just wanted to make a better processor.
Zilog Z80
This processor became, along with the 6502, the main processor of the first personal computers. There are no dramatic events in the history of its appearance and use; there is only some intrigue around Zilog's failure to make the next generation of processors. The Z80 has been produced since 1976, and its variants are still in production. At one point even Bill Gates himself announced support for z80-based systems.
A number of coincidences are interesting. As with the 6502, the main developer of the Z80, Federico Faggin, had left a large company, in this case Intel. After the z80, Federico hardly worked on the next-generation Z8000 processor, and in the early 80s he left the company he had founded and stopped working on processors. He then founded several relatively successful startups making communication systems, touchpads and digital cameras. It can be mentioned that, besides the z80, Zilog also developed the successful Z8 microcontroller, which is still in production.
The Z80 is more convenient than the 8080 for building into computer systems. It requires only one supply voltage and has built-in support for dynamic memory refresh. In addition, while fully compatible with the 8080, it has quite a few new instructions, a second set of main registers, and several completely new registers. It is curious that Zilog abandoned the 8080 assembler mnemonics and introduced its own, better suited to the extended z80 instruction set. Something similar happened with x86 assembler in the GNU software world: for some reason it too uses its own syntax conventions by default. The Z80 added an overflow flag; Intel added such a flag only in the 8086. However, in the z80 this flag shares a bit with the parity flag, so the two cannot be used at the same time, as they can in the 8086. In the z80, as in the 6502, conditional instructions test only a single flag; there are no tests of two or three flags at once, which are needed for "strictly greater" and "less or equal" comparisons and for all signed comparisons. In such cases several checks are needed, whereas on the 8086, 6800 or PDP-11 one is enough.
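Why one flag is not enough can be shown with a short C sketch. After a compare (a subtraction whose result is discarded), a signed "less than" is S xor V, and "strictly greater" additionally needs Z, so a CPU that can branch on only one flag at a time needs several branches. The flag formulas are the standard two's-complement ones; the `Flags` struct and the function names are illustrative and not taken from any real emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags produced by an 8-bit compare a - b, as most 8-bit CPUs define them. */
typedef struct { bool s, z, v, c; } Flags;

Flags compare8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);
    Flags f;
    f.s = (r & 0x80) != 0;                   /* sign of the result */
    f.z = r == 0;                            /* equality */
    f.v = ((a ^ b) & (a ^ r) & 0x80) != 0;   /* signed overflow of a - b */
    f.c = a < b;                             /* borrow, i.e. unsigned a < b */
    return f;
}

/* Signed "less than" needs two flags: S xor V. */
bool signed_less(Flags f)          { return f.s != f.v; }
/* "Strictly greater" needs three flags: not zero, and S == V. */
bool signed_greater(Flags f)       { return !f.z && f.s == f.v; }
/* "Less or equal" is the complement of "strictly greater". */
bool signed_less_or_equal(Flags f) { return f.z || f.s != f.v; }
```

For example, comparing -100 with 100 gives a positive 8-bit result (S = 0) but also signed overflow (V = 1), so only the combination of the two flags reveals that -100 is the smaller value.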
Among the new instructions, the block memory copy instructions (21 clock cycles per byte) and an interesting instruction for searching for a byte in memory are especially impressive. However, the most interesting is EXX, which exchanges 48 bits of register contents, the BC, DE and HL registers, with their alternate counterparts in just 4 clock cycles! Even a 32-bit ARM would need at least 6 clock cycles for the same operation. The remaining additional instructions are less impressive, although they can sometimes be helpful. Also added were:
- 16-bit subtraction with borrow and 16-bit addition with carry in 15 cycles;
- negation of the accumulator in 8 cycles;
- the ability to read from and write to memory using the registers BC, DE, SP, IX, IY, and not just HL;
- shifts, rotations and input-output for all 8-bit registers;
- instructions to test, set and reset a bit by its number;
- relative jumps (JR);
- a loop instruction (DJNZ).
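A common explanation of why EXX costs only 4 cycles is that nothing is actually copied: the chip holds both register sets and simply flips a selector bit that decides which set is visible. The C sketch below models that idea under this assumption; `RegFile`, `exx`, `get_hl` and `set_hl` are hypothetical names, and only the HL accessors are shown.

```c
#include <stdint.h>

/* Two banks of BC, DE, HL plus a one-bit selector, modelling EXX. */
typedef struct {
    uint16_t bc[2], de[2], hl[2];
    unsigned bank;                 /* 0 = main set, 1 = alternate set */
} RegFile;

/* EXX: no data moves, only the selector flips, so the cost is constant. */
void exx(RegFile *r) { r->bank ^= 1u; }

/* Accessors always go through the currently selected bank. */
uint16_t get_hl(const RegFile *r)           { return r->hl[r->bank]; }
void     set_hl(RegFile *r, uint16_t value) { r->hl[r->bank] = value; }
```

The same renaming trick would also explain why the 8080's XCHG (EX DE,HL on the z80) can be as cheap as 4 cycles.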
Most of the new instructions are rather slow, but using them properly can still make code somewhat faster and much more compact. This especially applies to the new 16-bit registers IX and IY, which can be used for new addressing modes. Interestingly, the index registers IX and IY appeared in the Z80 to encourage 6800 users to switch to the Z80! But, if I may venture an opinion, the Z80's index register operations are rather poorly designed because of the almost useless byte offset in every instruction that uses these registers.
Many 8080 instructions became noticeably faster on the z80. But the ADD instruction, the main one for 16-bit arithmetic, became slower, so arithmetic in general is at best only slightly faster.
The interrupt system became much more interesting than in the 8080. The z80 supports a non-maskable interrupt and three modes (one of them compatible with the 8080) for maskable interrupts. The most interesting is maskable interrupt mode 2, which allows the address of the interrupt handler to be changed flexibly.
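In mode 2 the handler address is not fixed: the CPU forms a pointer from the I register (high byte) and the byte the interrupting device places on the data bus (low byte), then reads the 16-bit handler address, low byte first, from that location. A C sketch under those assumptions; memory is modeled as a flat 64 KB array, and `im2_handler` is a hypothetical name.

```c
#include <stdint.h>

/* Model of Z80 interrupt mode 2 vectoring over a flat 64 KB memory. */
uint16_t im2_handler(const uint8_t mem[65536], uint8_t i_reg, uint8_t bus_byte)
{
    /* Table entry address: I supplies the high byte, the device the low byte. */
    uint16_t entry = (uint16_t)((i_reg << 8) | bus_byte);
    /* The handler address is stored little-endian at that entry. */
    uint8_t lo = mem[entry];
    uint8_t hi = mem[(uint16_t)(entry + 1)];
    return (uint16_t)(lo | (hi << 8));
}
```

Changing the I register re-points the whole vector table at once, and each device can select its own entry, which is where the flexibility comes from.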
The Z80 has quite a few undocumented instructions. Many of them disappeared in the transition to CMOS technology, but those that survived have effectively become standard and have been documented by some firms. Particularly useful are the instructions that work with the individual bytes of the clumsy 16-bit registers IX and IY. Besides undocumented instructions, the Z80 has other undocumented properties, for example two special flags in the status register.
Of course, the z80, even more than the 8080, deserves to be called slightly 16-bit. Its hypothetical "bitness index" is clearly a bit higher, but, paradoxically, the z80's ALU is actually 4-bit! At the electronic level, the z80 and the 8080 are completely different chips.
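A 4-bit ALU is not a contradiction: an 8-bit addition can be done as two 4-bit passes, low nibble first, with the carry passed between them. A side effect, shown in the C sketch below, is that the half-carry flag (carry out of bit 3) comes for free. This is a behavioral model with hypothetical names, not a description of the actual z80 circuitry.

```c
#include <stdint.h>

typedef struct { uint8_t sum; int half_carry, carry; } AddResult;

/* One pass of a 4-bit adder: returns a 5-bit value (carry in bit 4). */
static unsigned add4(unsigned x, unsigned y, unsigned carry_in)
{
    return (x & 0xF) + (y & 0xF) + carry_in;
}

/* 8-bit ADD performed as two passes through the same 4-bit adder. */
AddResult add8_via_nibbles(uint8_t a, uint8_t b)
{
    unsigned low  = add4(a, b, 0);                  /* bits 0-3 */
    unsigned high = add4(a >> 4, b >> 4, low >> 4); /* bits 4-7 plus carry from low */
    AddResult r;
    r.sum = (uint8_t)(((high & 0xF) << 4) | (low & 0xF));
    r.half_carry = (low >> 4) & 1;                  /* carry out of bit 3 */
    r.carry = (high >> 4) & 1;                      /* carry out of bit 7 */
    return r;
}
```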
Much has been written comparing the speed of the z80 and the 6502, since these processors were very widely used in the first mass-market computers. There are several subtle points here, without understanding which it is very hard to remain objective. Because it has a fairly large number of registers, the z80 is naturally run at a higher frequency than the memory. Therefore a z80 at 4 MHz can use the same memory as a 6502 or 6809 at 1.3 MHz. According to many experienced programmers who wrote code for both processors, at the same frequency the 6502 is on average about 2.4 to 2.6 times faster than the z80, and the author of this material agrees. It only needs to be added that writing good, fast code for the z80 is very difficult: one has to optimize register usage constantly and use the stack for memory access as much as possible. If you try hard, in my opinion, you can reduce the gap between the z80 and the 6502 to about 2.2 times. And if you do not try and ignore the timings, you can easily end up with a gap of 4 times. In some individual cases the z80 can be very fast. When filling memory with the PUSH instruction it can even be slightly faster than the 6502, but at the cost of disabling interrupts. When copying memory blocks the z80 is only 1.5 times slower. It is especially impressive that when dividing a 32-bit dividend by a 16-bit divisor the z80 is only 1.7 times slower; by the way, this super-fast division was implemented by a Russian programmer. Thus the ZX Spectrum with a z80 at 3.5 MHz turns out to be about one and a half times faster than the C64 with a 6502 at 1 MHz. It is worth noting that in most z80 or 6502 systems part of the cycles is taken away from the processor by the video circuitry; for example, because of this the popular Amstrad CPC/PCW computers have an effective processor frequency of 3.2 MHz rather than the full 4.
On 6502 systems the screen can usually be turned off for maximum processor performance. If we compare at equal memory frequency rather than equal processor frequency, the z80 turns out to be 25-40% faster than the 6502. This can be illustrated by the fact that with 2 MHz memory a z80 can run at up to 6 MHz, while a 6502 can run only at up to 2 MHz.
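The estimates above are simple ratios: the clock ratio divided by how many times more work the 6502 does per clock. A trivial C helper (the name `z80_relative_speed` is hypothetical) makes the arithmetic explicit; it ignores video contention and wait states, and the 2.2 to 2.6 factors are the author's estimates, not measurements.

```c
/* Estimated speed of a z80 system relative to a 6502 system, given both
   clock frequencies in MHz and how many times more work the 6502 does
   per clock. Video contention and memory wait states are ignored. */
double z80_relative_speed(double z80_mhz, double m6502_mhz, double per_clock_ratio)
{
    return (z80_mhz / m6502_mhz) / per_clock_ratio;
}
```

For the Spectrum (3.5 MHz) against the C64 (1 MHz), factors of 2.6 and 2.4 give about 1.35 and 1.46; for a 6 MHz z80 against a 2 MHz 6502 on the same 2 MHz memory, factors of 2.4 and 2.2 give about 1.25 and 1.36.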
The Z80 was used in a very large number of computer systems. In the United States the Tandy TRS-80 was very popular; in Europe, the ZX Spectrum, and later the Amstrad CPC and PCW. It is curious that the Amstrad PCW computers remained relevant until the mid-90s and were widely and actively used for their intended purpose until the end of the 90s. Quite successful MSX computers were produced in Japan and in other countries around the world. The fairly popular C128 could also use a z80, but this was more of an embarrassment for users: in this late (1985) 8-bit computer the z80, officially clocked at 2 MHz, actually runs at only 1.6 MHz, which is even slower than the first 8080 systems of the mid-70s. The list of computers for the CP/M operating system includes at least three dozen fairly well-known systems.
Such a PC looked decent even in the mid-90s, but its z80 is slower than the ZX Spectrum's
The fastest z80-based computer system known to the author is the BBC Micro with a Tube second-processor add-on carrying a z80B at 6 MHz, produced from 1984. The processor in this system runs at full speed, "without brakes", as they say. Similar add-ons had been made for the Apple ][ since 1979, and some of them later used a Z80H at 8 MHz and even higher. Interestingly, in 1980 Microsoft made its largest profit from the sale of such add-ons. One can also mention the Amstrad PCW16, produced from 1994, which used a CMOS Z80 at 16 MHz.
In Japan, the MSX turboR (1990) used the z80-compatible R800 processor. The R800 added a 16-bit hardware multiplication with a 32-bit result. However, when multiplying by a 16-bit constant, table multiplication with a 768-byte table is one clock faster. It is believed that the R800 is a heavily simplified Z800 running at four times the bus frequency of about 7.16 MHz. Thus the internal frequency of the R800 is approximately 28.64 MHz!
Zilog itself worked on improving the Z80 very inconsistently and extremely slowly. The first Z80 ran at up to 2.5 MHz; the Z80A, which appeared soon afterwards, raised the maximum frequency to 4 MHz, and these processors became the basis of most popular Z80 computers. The Z80B appeared by 1980 but was used relatively rarely, for example in the BBC Micro add-on mentioned above or in the late (1989) Sam Coupé computer. The Z80H appeared by the mid-80s and could run at up to 8 MHz; it was not used in well-known computers. Interestingly, Zilog chips contained special traps for those who tried to copy them: the basic Z80, for example, had 9 traps, which, according to those who did the copying, delayed the process by almost a year.
A deeper upgrade of the z80 was hampered by Zilog's desire to create processors competitive with Intel's 16-bit processors. In 1978, a little later than the 8086, the Z8000 was released; it was not compatible with the z80. This processor could not withstand the competition from Intel and especially Motorola: the 68000 surpassed the Z8000 in almost every respect, although the Z8000 was used in about a dozen different low-cost systems, usually running Unix variants. Interestingly, IBM did not even consider the Z8000 as a possible processor for the IBM PC, since Zilog was funded by Exxon, which intended to compete with IBM. Perhaps because of the failure of the Z8000, Zilog became a division of Exxon by 1980. There was also an attempt to create a competitive 32-bit processor: in 1986 the Z80000 appeared, compatible with the Z8000, but it found no use anywhere.
One can only wonder why Zilog abandoned the approach that had been so successful with the Z80, namely making processors that are software-compatible with Intel's but better and completely different at the hardware level. This approach was later used successfully by many firms, in particular AMD, Cyrix and VIA.
Creating a new processor based on the Z80 was postponed until 1985, when the Z800 was made. However, Zilog's main efforts were then directed at the Z80000, and only a few Z800s were produced. In 1986, after the failure of the Z80000, the Z280 was released, a slightly improved version of the Z800 which, in particular, could run at an internal frequency several times higher than the bus frequency, an innovation that later brought major success to Intel's 486DX2 and 486DX4 processors. The Z280 had other promising features that were later applied successfully by other firms. But perhaps because of poor performance (despite many technological innovations, the Z280 could only run at relatively low clock speeds), this processor also found no use anywhere. It is believed that the Z280 roughly matched the capabilities of the Intel 80286 but was significantly slower, by at least 50%, at the same clock speed. Perhaps, had the Z280 appeared 5 years earlier, it could have been very successful.
The greatest success came from cooperation with the Japanese company Hitachi, which in 1985 released its super-Z80, the HD64180, comparable in capabilities to the Intel 80186: it could use 512 KB of memory and added a dozen new instructions, but did not support some almost-standard undocumented Z80 instructions. The HD64180 was used in some computer systems. Zilog licensed the HD64180 and began producing it under the name Z64180. Zilog managed to improve this processor slightly, in particular adding support for 1 MB of memory, and released it by the end of 1986. This new processor was named the Z180 and became the basis of a family of processors and controllers with clock frequencies up to 33 MHz. It was used in some rare MSX2 computers, but more as a controller. It is curious that the Z280 and Z180 appeared in the same year, just as their approximate counterparts, the 80286 and 80186, had four years earlier. In 1994 the 32-bit Z380 was made on the basis of the Z180, retaining z80 compatibility and roughly matching the capabilities of the Intel 80386 or Motorola 68020, so Zilog lagged almost 10 years behind its competitors. In the 21st century the very successful eZ80 processor-controllers, again based on the Z180, are being produced, with timings almost like the 6502's. They are used in various equipment, in particular in network cards, DVD drives, calculators, ...
Texas Instruments TMS9900
I did not write code for this very special processor. It was the first 16-bit processor available for use in personal computers and has been produced since 1976. It uses the much less common high-to-low byte order (Big Endian). Of the architectures mentioned in this review, besides the TMS9900 only Motorola's 6800 and 68000 series and the giant IBM/370 architecture use this order; all the other processors covered here use the reverse byte order (Little Endian).
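The difference in byte order is easy to show in C: the same 16-bit value is laid out in memory high byte first on a Big Endian CPU such as the TMS9900 and low byte first on a Little Endian CPU such as the 8080 or 6502. The helper names below are illustrative.

```c
#include <stdint.h>

/* Store a 16-bit word the way a Big Endian CPU (TMS9900, 68000) does. */
void store_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);   /* high byte at the lower address */
    p[1] = (uint8_t)v;
}

/* Store a 16-bit word the way a Little Endian CPU (8080, z80, 6502, PDP-11) does. */
void store_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;          /* low byte at the lower address */
    p[1] = (uint8_t)(v >> 8);
}
```

Storing 0x1234 puts the bytes 12 34 in memory on the TMS9900, but 34 12 on the 8080.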
The TMS9900 has only three 16-bit registers: a program counter, a status register, and a workspace pointer, the base register for the pseudo-registers. The processor uses 32 bytes of designated memory as 16 two-byte registers. Such memory usage is somewhat similar to the zero page in the 6502 architecture. By changing the base register, the TMS9900 can switch context very quickly. This is reminiscent of the Z80, which has two register contexts. The flag system is distinctive: along with the typical carry, zero (equality), overflow and parity flags, there are two unique flags, logical greater-than and arithmetic greater-than. Working with the stack and subroutines is reminiscent of later RISC processors. There is simply no hardware stack; one can be implemented using one of the pseudo-registers. When a subroutine is called with a context switch (the BLWP instruction), new values are loaded into the program counter and the base register, and the old values of all three registers are saved in pseudo-registers of the new context. Thus such a subroutine call is more like a software interrupt. The TMS9900 has a built-in interrupt controller supporting up to 16 hardware interrupts.
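The context switch described above can be sketched in C. The model assumes the TMS9900 convention for BLWP (branch and load workspace pointer): a two-word vector supplies the new workspace pointer and program counter, and the old WP, PC and ST are saved in registers R13, R14 and R15 of the new workspace; RTWP restores them. Memory is an array of 16-bit words, and `Tms9900`, `reg`, `blwp` and `rtwp` are hypothetical names.

```c
#include <stdint.h>

/* TMS9900 state: only three on-chip registers; R0..R15 live in memory. */
typedef struct {
    uint16_t pc, wp, st;
    uint16_t mem[32768];   /* 64 KB as 16-bit words, indexed by byte address / 2 */
} Tms9900;

/* Workspace register n sits at byte address WP + 2n. */
static uint16_t *reg(Tms9900 *cpu, unsigned n)
{
    return &cpu->mem[(uint16_t)(cpu->wp + 2 * n) >> 1];
}

/* BLWP: the vector at byte address vec holds the new WP and the new PC.
   cpu->pc is assumed to already point past the BLWP instruction. */
void blwp(Tms9900 *cpu, uint16_t vec)
{
    uint16_t old_wp = cpu->wp, old_pc = cpu->pc, old_st = cpu->st;
    cpu->wp = cpu->mem[vec >> 1];
    cpu->pc = cpu->mem[(uint16_t)(vec + 2) >> 1];
    *reg(cpu, 13) = old_wp;   /* saved into the NEW workspace */
    *reg(cpu, 14) = old_pc;
    *reg(cpu, 15) = old_st;
}

/* RTWP: restore the caller's context from R13..R15 of the current workspace. */
void rtwp(Tms9900 *cpu)
{
    uint16_t wp = *reg(cpu, 13), pc = *reg(cpu, 14), st = *reg(cpu, 15);
    cpu->wp = wp;
    cpu->pc = pc;
    cpu->st = st;
}
```

Only six memory words move, and no registers are copied on the chip, which is why a whole register set can be switched so quickly.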
The first 16-bit home computer - it even has color sprites.
The instruction set looks very impressive. There are even multiplication and division. The unique X instruction executes a single instruction located at any address in memory and then continues with the instruction following the X. Instructions execute rather slowly: the fastest take 8 cycles and arithmetic ones 14, but multiplication (16 × 16 = 32 bits) in 52 cycles and especially division (32 / 16 bits, giving a 16-bit quotient and a 16-bit remainder) in 124 cycles were probably record-fast among the processors of the 70s.
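The division described above takes a 32-bit dividend and a 16-bit divisor and produces a 16-bit quotient and a 16-bit remainder. A C sketch of those semantics, assuming the TMS9900 rule that the operation is refused, with the overflow flag set, when the quotient would not fit in 16 bits (that is, when the divisor is not greater than the high word of the dividend); `div32by16` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned 32/16 division with a 16-bit quotient and remainder.
   Returns false (overflow) when the quotient would not fit in 16 bits;
   this check also covers division by zero. */
bool div32by16(uint32_t dividend, uint16_t divisor, uint16_t *quot, uint16_t *rem)
{
    uint16_t high = (uint16_t)(dividend >> 16);
    if (divisor <= high)
        return false;   /* quotient would be >= 65536: signalled by the overflow flag */
    *quot = (uint16_t)(dividend / divisor);
    *rem  = (uint16_t)(dividend % divisor);
    return true;
}
```

The single comparison is sufficient: if the divisor exceeds the high word, the dividend is less than divisor × 65536, so the quotient is guaranteed to fit in 16 bits.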
The TMS9900 requires three supply voltages (-5, 5 and 12 volts) and a four-phase clock signal: anti-records among the processors I know of. In 1979 this processor was demonstrated to IBM specialists, who were then looking for a processor for the IBM PC prototype under development. The obvious drawbacks of the TMS9900 (only 64 KB of addressable memory, lack of the required support controller chips, relative slowness) made the corresponding impression, and the Intel 8088 was chosen for the future PC leader. To solve the controller problem, Texas Instruments also produced the TMS9980, a version of the TMS9900 with an 8-bit bus, which was 33% slower.
The TMS9900 was used in the fairly popular US computers TI-99/4 and TI-99/4A, which by 1983 had been "defeated" in the price war by the Commodore VIC-20. It is curious that as a result of this war Texas Instruments was forced to cut the price of its computer to $49, incredible for 1983 (in 1979 the price was $1,150!), and to sell them at a big loss. For comparison, the relatively unpopular Commodore Plus/4, discontinued in 1986, fell to that same $49 only in 1989. The TI-99/4A was discontinued in 1984, just when its ultra-low price had started to make it popular. This computer can only nominally be called 16-bit, because only 256 bytes (!) of RAM and the ROM are accessed via the 16-bit bus. The remaining memory and the I/O devices work through a slow 8-bit bus. Therefore it is more correct to consider the Soviet BK-0010 the first 16-bit home computer. It is curious that the TI-99/4 and TI-99/4A run their processor at 3 MHz, exactly the same as the BK-0010.
The TI-99/4 and TI-99/4A used the rather successful TMS9918 chip as their video controller; it became the basis of the MSX standard, very popular around the world, as well as of some other computers and game consoles. The Japanese company Yamaha significantly improved this video chip, and the improved versions were later used, in particular, to upgrade the TI-99/4 and TI-99/4A themselves!
The TI-99/4 series is a rare example of computers whose processor and computer were made by the same manufacturer.
DEC PDP-11 processors
From the beginning of the 1970s, a ten-year era of DEC dominance began. DEC computers were significantly cheaper than IBM's and therefore attracted small organizations that could not afford IBM systems. The era of mass professional programming also begins with these computers. The PDP-11 series was very successful: various PDP-11 models were manufactured from the early 70s to the early 90s. In the USSR they were successfully cloned and became the first mass-market computer systems. Among the clones are the SM EVM computers, Elektronika-60/81/85, DVK-1/2/3 and BK-0010/0011 (the BK-0010 was the first PC that could be bought in an ordinary store).
However, DEC also promoted the more expensive and complex computers of the VAX-11 family, and the situation around them was somewhat politicized. From the second half of the 70s DEC practically halted development of the PDP-11 line; in particular, support for hexadecimal numbers was never introduced in its assembler. The speed of PDP-11 systems has also remained almost unchanged since the mid-70s.
The PDP-11 used various processors compatible in the basic instruction set, for example the LSI-11, F-11 and J-11. In the late 70s DEC made the inexpensive T-11 processor for microcomputers. However, for unclear reasons, and despite the apparently large body of high-quality software that could have been ported to systems built on it, computer manufacturers did not take notice of it. The only exception was one model of Atari game console. The T-11 found mass use only in embedded equipment, although its capabilities were, if anything, slightly superior to the z80's. In the USSR, processors close to DEC's, such as the K1801VM1, K1801VM2, K1801VM3, ..., were manufactured, as well as exact copies of DEC processors; the latter were much more expensive and were produced in small quantities.
The PDP-11 instruction set is distinguished by almost complete orthogonality, a pleasant quality but one that, taken to the extreme, can produce absurd instructions. The PDP-11 instruction set influenced many architectures, especially the Motorola 68000.
The PDP-11 instruction set is strictly 16-bit. All 8 general-purpose registers (in this architecture the program counter is the ordinary register R7) are 16-bit, the status register (containing the typical flags) is 16-bit too, and instructions are 1 to 3 16-bit words long. Each operand of an instruction can be of any type (although there are exceptions, for example the XOR instruction): this is orthogonality. The types include an ordinary register or memory. Programmers in the 80s sometimes could not understand why the Intel x86 instruction set has no memory-to-memory instructions. This is the influence of the PDP-11 school, where a full address can easily be written for each operand. This is of course slow, and especially slow on systems whose memory is slow relative to the processor, as has been typical since the early 90s. Memory can be accessed through a register, a register with an offset, or a register with autodecrement or autoincrement. Each of these modes also has a deferred (indirect) variant, marked with @, in which the word found at the computed address is itself used as the address of the operand.
MOV @(R0)+,@-(R1)
means the same as the C/C++ statement
**--r1 = **r0++;
where r0 and r1 are declared as
signed short **r0, **r1;
Another example is the instruction
MOVB @11(R2),@-20(R3)
corresponds to
**(char **)((char *)r3 - 20) = **(char **)((char *)r2 + 11);
where r2 and r3 are declared as
char **r2, **r3;
In today's popular architectures one instruction is not enough in such cases; you may need as many as 10 instructions. Addresses can also be formed relative to the current value of the program counter. Here is another example with simpler addressing. The instruction
ADD #16,11(R4)
can be mapped to the Intel x86 instruction
ADD WORD PTR [BX+11],16
In DEC assemblers operands are written in the order source, destination, so data flows from left to right, unlike Intel, where the destination comes first and data flows from right to left. There is reason to believe that the GNU assembler syntax for x86 was influenced by the PDP-11 assembler.
Multiplication and division instructions are signed only and are not available on all processors. Decimal arithmetic is also optional: this is the so-called commercial arithmetic in DEC terminology. As a curiosity of complete orthogonality, here is the instruction
MOV #11,#22
which after execution turns into
MOV #11,#11
This is an example of using an immediate constant as the destination operand. Another curiosity is the unique MARK instruction, whose code must be placed on the stack and which is practically never used explicitly. The subroutine call in the PDP-11 architecture is also somewhat peculiar. The call instruction (JSR) first saves a designated register (any register can be used) on the stack, then copies the program counter into this register, and only then loads a new value into the program counter. The return instruction (RTS) must do the reverse and must name the register that was used for the call. Using the program counter as an ordinary register can produce very strange and unpredictable effects.
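Both curiosities follow from the rules just described, and a tiny C model makes them concrete. The "#n" notation is really autoincrement mode applied to R7, the program counter, so in MOV #11,#22 the destination address is the word holding 22, and the store overwrites it. JSR and RTS are modeled as described above. This is a minimal sketch assuming word-aligned addresses and a flat 64 KB memory; `Pdp11`, `mov_imm_imm`, `jsr` and `rts` are hypothetical names, only these cases are implemented, and 012727 (octal) is the encoding of MOV with both operands in autoincrement mode on R7.

```c
#include <stdint.h>

typedef struct {
    uint16_t r[8];         /* R6 = SP, R7 = PC */
    uint16_t mem[32768];   /* 64 KB as words, indexed by byte address / 2 */
} Pdp11;

static uint16_t rd(Pdp11 *m, uint16_t a)             { return m->mem[a >> 1]; }
static void     wr(Pdp11 *m, uint16_t a, uint16_t v) { m->mem[a >> 1] = v; }

/* Autoincrement on R7, i.e. "#n": the operand address is the PC, then PC += 2. */
static uint16_t pc_autoinc(Pdp11 *m) { uint16_t a = m->r[7]; m->r[7] += 2; return a; }

/* Execute MOV (R7)+,(R7)+ (opcode 012727) located at the current PC. */
void mov_imm_imm(Pdp11 *m)
{
    m->r[7] += 2;                            /* skip the opcode word */
    uint16_t value = rd(m, pc_autoinc(m));   /* source: the first constant */
    wr(m, pc_autoinc(m), value);             /* destination: overwrites the second constant */
}

/* JSR reg,target: push reg, reg = PC (the return address), PC = target.
   m->r[7] is assumed to already point past the JSR instruction. */
void jsr(Pdp11 *m, unsigned reg, uint16_t target)
{
    m->r[6] -= 2;
    wr(m, m->r[6], m->r[reg]);
    m->r[reg] = m->r[7];
    m->r[7] = target;
}

/* RTS reg: PC = reg, then pop the old contents of reg. */
void rts(Pdp11 *m, unsigned reg)
{
    m->r[7] = m->r[reg];
    m->r[reg] = rd(m, m->r[6]);
    m->r[6] += 2;
}
```

Calling jsr with reg = 7 gives the ordinary "push PC and jump" call, which is why JSR PC,... is the most common form.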
It is curious that among PDP-11 programmers there was a culture of working directly in machine code. Programmers could, for example, debug without a disassembler, or even enter small programs directly into memory without an assembler!
Of course, the instruction timings are not particularly fast. I was once surprised to find that on the Soviet BK computer a register-to-register move takes as many as 12 clocks (10 clocks when executing code from ROM), and two-operand instructions with double indirect addressing take more than 100 clocks. The Z80 does a 16-bit register transfer in 8. However, the BK's slowness is caused not so much by the processor as by the poor quality of Soviet memory, to whose peculiarities the BK had to be adapted. With sufficiently fast memory, the BK would also move 16 register bits in 8 clock cycles. There used to be a lot of debate about which is faster, the BK or the Spectrum. It must be said right away that the Spectrum, when using the upper 32 KB of memory, is one of the fastest mass-market 8-bit personal computers. So it is not surprising that the Spectrum is faster than the BK, but not by much. And if the BK worked with memory without slowdowns, it would probably be a little faster.
Code density is also a rather weak point of the PDP-11 architecture. Instruction lengths must be multiples of the machine word, 2 bytes, which is especially unpleasant when working with byte operands or with simple instructions such as setting or clearing a flag.
Attempts to build a personal computer on the PDP-11 architecture are interesting. One of the first PCs in the world, appearing only a little later than the Apple ][ and Commodore PET and most likely slightly earlier than the Tandy TRS-80, was the Terak 8510/a, which had black-and-white graphics and could load an incomplete version of Unix. This PC was quite expensive and, as far as I know, was used only in US higher education. From 1978, a PDP-11-compatible computer was also produced as the Heathkit H11 assembly kit. DEC itself also tried to make a PC, but very inconsistently. For example, DEC released a PC based on the Z80 and 8088 (the Rainbow 100), obviously working against its own main product lines. The PDP-11-based DEC Professional 325/350/380 personal computers had some rather artificial incompatibilities with the base architecture, which made part of the software hard to use. The personalization of minicomputer technology worked out best in the USSR, where the BK, DVK, UK NTs, ... were produced. By the way, the Elektronika-85 is a fairly accurate clone of the DEC Professional 350. In addition, the CP1600 processor, related to the PDP-11 architecture, was used in the Intellivision game consoles popular in the early 80s.
Soviet 16-bit home computer (1985), almost PDP-11 compatible
The K1801VM2, used in the DVK, is approximately twice as fast as the K1801VM1; the K1801VM3 is faster still and close in speed to the Intel 8086.
In the older PDP-11 models and computers close to them, the processor can address up to 4 MB of memory, but a single program can be given no more than 64 KB. In operations per megahertz, these processors are also close to the 8086, though still somewhat slower.
Processor for DEC VAX-11
VAX-11 systems were quite popular in the 80s, especially in higher education. Today some concepts described in the books of those years are hard to understand without knowing the architectural features of these systems. VAX-11s were more expensive than PDP-11s, but better suited to general-purpose programming, and still significantly cheaper than IBM/370 systems. By the mid-80s the V-11 processor had been made for the VAX architecture; before that, processors assembled from multiple boards were used.
The VAX-11 architecture is 32-bit and has 16 registers, among which, as on the PDP-11, is the program counter. It uses two stack registers: the stack pointer and a frame pointer for subroutine call frames. In addition, one register is dedicated to the arguments of called functions. Thus 3 of the 16 registers are reserved for stack handling. The VAX-11 instruction set cannot fail to amaze with its vastness and its very rare, often unique instructions, for example for working with bit fields or several kinds of queues, for calculating CRCs, for multiplying decimal strings, ... Many instructions come in both three-operand (as on ARM) and two-operand (as on x86) forms, and there are even four-operand instructions, such as the extended division EDIV. Of course, floating-point arithmetic is supported.
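As an illustration of a four-operand instruction, here is a Python sketch of what EDIV computes: a 64-bit dividend divided by a 32-bit divisor, with the quotient and remainder written to two separate destinations. It assumes signed division truncating toward zero, with the remainder taking the sign of the dividend; the overflow case is modeled as an exception rather than the real instruction's flag setting, and the function name is hypothetical.

```python
def ediv(divisor: int, dividend: int) -> tuple[int, int]:
    """Hypothetical model of VAX EDIV divr.rl, divd.rq, quo.wl, rem.wl.

    Takes the two source operands and returns the two destination
    values (quotient, remainder).
    """
    if not -2**31 <= divisor < 2**31 or not -2**63 <= dividend < 2**63:
        raise ValueError("operands must fit in 32 and 64 bits")
    if divisor == 0:
        raise ZeroDivisionError("divide-by-zero trap")
    # Truncate toward zero (Python's // rounds toward minus infinity).
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor  # same sign as the dividend
    if not -2**31 <= quotient < 2**31:
        raise OverflowError("quotient does not fit in a longword")
    return quotient, remainder


print(ediv(7, -20))
```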
But the VAX-11 is a very slow system for its class and price. Even the very simple 6502 at 4 MHz could outrun the slowest VAX-11/730, and the fastest VAX-11 systems, huge cabinets and “whole furniture sets,” perform at the level of the first PC ATs. When the 80286 appeared, it became clear that the days of the VAX-11 were numbered, and even the slow rollout of 80286-based systems could not change anything fundamentally. The more straightforward Englishmen from Acorn, having made the ARM in 1985, said openly that the ARM was much cheaper and much faster. The VAX-11 nevertheless remained relevant until the early 90s, still having some advantages over the PC, in particular faster disk subsystems.
The VAX-11 is probably the last mass-market system in which the convenience of assembly-language programming was considered more important than speed. In a sense, this approach has passed on to today's popular scripting languages.
The photo shows the VAX-11/785 (1984), which is also a computer: the fastest of the VAX-11 line, with processor speed comparable to an IBM PC AT or the ARM Evaluation System
Surprisingly, there is very little literature on VAX-11 systems in the public domain, as if some strange law of oblivion were at work. The history of this architecture involves several episodes close to politics and linked to the history of the USSR. It is quite possible that the de facto abandonment of PDP-11 development was caused by the architecture's cheapness and by how successfully it was cloned in the Soviet Union, whereas cloning the VAX-11 required an order of magnitude more resources and led to a dead end. Interest in the VAX-11 was also fueled by pranks such as the famous Kremvax message of April 1, 1984, in which the then Soviet leader Konstantin Chernenko offered to drink vodka on the occasion of connecting to Usenet. Another joke was that some VAX-11 chips were imprinted with a message in broken Russian about how good the VAX-11 was. :)
Some VAX-11 models were cloned in the USSR by the end of the 80s, but very few such clones were produced and they found almost no use.
Several VAX-11 systems are available for use over the network, which sets them apart favorably from the IBM/370 systems they competed with.
Intel: 8086 to 80486
Undoubtedly, one of the best processors made in the 70s is the 8086, along with its cheaper near-equivalent, the 8088. The architecture of these processors is pleasantly free of mechanical borrowing and of adherence to abstract theories; it is sensible, well balanced and oriented toward further development. Among the shortcomings of the x86 architecture one can name a certain cumbersomeness and a tendency to grow extensively by adding more and more instructions.
One of the brilliant design decisions of the 8086 was the invention of segment registers. This achieved two goals at once: “free” relocatability of programs up to 64 KB in size (a very decent amount of memory for one program until the mid-80s) and the ability to address up to 1 MB. Note also that the 8086, like the 8080 or Z80, has a separate address space for I/O ports, 64 KB in size (on the 8080 and 8085 it is only 256 bytes). There are only four segment registers: one for code, one for the stack and two for data. Thus 64 * 4 = 256 KB of memory is available for fast access, which was a lot even in the mid-80s. In practice there are no problems with code size, since so-called far subroutine calls can be used, which load and save a full address made up of two registers. The only limit is 64 KB for the size of a single subroutine, which is sufficient for many modern applications. Some problems arise from the impossibility of fast addressing of data arrays larger than 64 KB: with such arrays, the segment register and the address itself must be loaded on every access, which slows down work with them several times over.
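The segment arithmetic described above fits in a few lines. This Python sketch (function names hypothetical) computes the 8086 real-mode physical address as segment * 16 + offset, wrapped to 20 bits as on the 8086, and shows the "normalized" far pointer that programs had to recompute when walking arrays larger than 64 KB.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: 20-bit physical address = segment * 16 + offset.

    The 8086 has only 20 address lines, so the sum wraps around at 1 MB.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


def normalized_far_pointer(linear: int) -> tuple[int, int]:
    """Pick the segment:offset pair with the smallest offset (0..15)."""
    if not 0 <= linear <= 0xFFFFF:
        raise ValueError("8086 addresses are 20-bit")
    return linear >> 4, linear & 0xF


# Many segment:offset pairs name the same byte:
print(hex(physical_address(0x1000, 0x2345)), hex(physical_address(0x1234, 0x0005)))
```

The normalized form keeps the offset small, so roughly 64 KB can be walked from any element before the segment register has to be reloaded; that reload is the extra cost the paragraph above mentions.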
Segment registers are implemented in such a way that their presence is almost invisible in the machine code, which made it easy to abandon them when the time came.
The 8086 architecture stayed close to the 8080 architecture, which made it possible to port programs from the 8080 (or even the Z80) to the 8086 with comparatively little effort, especially if the source code was available.
8086 instructions do not execute particularly fast, but they are quite comparable with those of competitors, for example the Motorola 68000, which appeared a year later. One of the novelties that slightly sped up the generally unhurried 8086 was the instruction queue.
The 8086 has eight 16-bit registers, some of which can be used as pairs of byte registers and some as index registers. The 8086 registers are thus somewhat heterogeneous, but this heterogeneity is well balanced and the registers are very convenient to use. The heterogeneity also allows denser code. The 8086 uses the same flags as the 8080, plus a few new ones. For example, a flag typical of the PDP-11 architecture appeared: the trap flag for single-step execution.
The 8086 supports very interesting addressing modes. For example, an address can be formed as the sum of two registers and a 16-bit constant displacement, which is then combined with the value of one of the segment registers. The sum may also consist of only two terms, or even one. A single PDP-11 instruction cannot do this. Most 8086 instructions do not allow both operands to be in memory; one of them must be a register. But there are string instructions that work with memory directly through addresses held in registers. String instructions allow fast block copying (17 clock cycles per byte or word), searching, filling, loading and comparing. String instructions can also be used with I/O ports. The 8086's idea of using instruction prefixes is very interesting as well.
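The behavior of a repeated string instruction can be sketched as a loop. The Python model below (hypothetical name, simplified by assuming DS and ES point at the same 64 KB segment) performs REP MOVSB: copy a byte from SI to DI, step both by +1 or -1 depending on the direction flag, and repeat until CX reaches zero.

```python
def rep_movsb(memory: bytearray, si: int, di: int, cx: int,
              df: bool = False) -> tuple[int, int, int]:
    """Model REP MOVSB within one 64 KB segment; returns final (SI, DI, CX)."""
    if len(memory) != 0x10000:
        raise ValueError("memory must model a single 64 KB segment")
    step = -1 if df else 1          # DF=0: addresses increase, DF=1: decrease
    while cx != 0:
        memory[di] = memory[si]     # one MOVSB: copy byte [SI] -> [DI]
        si = (si + step) & 0xFFFF   # SI and DI wrap within the segment
        di = (di + step) & 0xFFFF
        cx -= 1                     # REP decrements CX until it is 0
    return si, di, cx


segment = bytearray(0x10000)
segment[0x200] = ord("*")
rep_movsb(segment, 0x200, 0x201, 7)  # overlapping copy propagates the first byte
print(segment[0x200:0x208])
```

Because the copy proceeds strictly byte by byte, an overlapping copy whose source starts one byte below the destination fills the block with the first byte, which is why the loop is written sequentially rather than as a slice assignment.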
The 8086 has one of the best stack organizations of any computer system. Using only two registers (BP and SP), the 8086 handles all the tasks of organizing subroutine calls with parameters.
The instruction set includes signed and unsigned multiplication and division. There are even unique decimal adjustment instructions for multiplication and division. It is hard to say that anything is clearly missing from the 8086 instruction set; quite the contrary. Dividing a 32-bit dividend by a 16-bit divisor, giving a 16-bit quotient and a 16-bit remainder, can take up to 300 clock cycles: not especially fast, but several times faster than such a division on any 8-bit processor (except the 6309), and comparable in speed to the 68000. Division on x86 has one unexpected feature: it changes the flags unpredictably...
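Below is a Python sketch (hypothetical names) of the unsigned 32/16 division: DX:AX is divided by a 16-bit operand, the quotient goes to AX and the remainder to DX. A divide error occurs not only for a zero divisor but also whenever the quotient does not fit in 16 bits; the flags, which the real instruction leaves undefined, are not modeled.

```python
class DivideError(Exception):
    """Stands in for the 8086 divide error interrupt (INT 0)."""


def div16(dx: int, ax: int, divisor: int) -> tuple[int, int]:
    """Model of DIV r/m16: returns (new AX = quotient, new DX = remainder)."""
    dividend = (dx << 16) | ax      # 32-bit dividend held in DX:AX
    if divisor == 0:
        raise DivideError("division by zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:           # the quotient must fit in AX
        raise DivideError("quotient overflow")
    return quotient, remainder


print(div16(0x0001, 0x0000, 0x0010))
```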
It is worth adding that the x86 architecture keeps the XCHG instruction inherited from the 8080, in improved form: it can exchange any register with another register or with memory. In addition, later processors added the XADD, CMPXCHG and CMPXCHG8B instructions, which can also exchange operands atomically. Such instructions are one of the distinctive features of x86; they are hard to find on processors of other architectures.
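What XADD and CMPXCHG compute can be written as pure functions. The Python sketch below (hypothetical names) models only their results; the atomicity that makes these instructions valuable comes from the hardware's locked bus cycle (the LOCK prefix), which ordinary Python code does not reproduce.

```python
MASK32 = 0xFFFFFFFF


def xadd(dest: int, src: int) -> tuple[int, int]:
    """XADD dest, src: dest <- dest + src, src <- old dest. Returns (dest, src)."""
    return (dest + src) & MASK32, dest


def cmpxchg(dest: int, eax: int, src: int) -> tuple[int, int, bool]:
    """CMPXCHG dest, src with EAX as the comparand. Returns (dest, EAX, ZF)."""
    if dest == eax:
        return src, eax, True   # match: store src into dest, ZF = 1
    return dest, dest, False    # mismatch: load dest into EAX, ZF = 0


# A hypothetical spinlock attempt: take the lock only if it is currently 0.
lock = 0
lock, _, acquired = cmpxchg(lock, 0, 1)
print(lock, acquired)
```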
In summary, the 8086 is a very successful processor, combining programming convenience with a design well matched to the memory limitations of its time. The 8086 itself was used relatively rarely, ceding to the cheaper 8088 the honor of being the first processor of the IBM PC architecture, which is still the mainstream architecture for personal computers today. The 8088 used an 8-bit data bus, which made it somewhat slower but allowed more affordable systems to be built around it.
Interestingly, Intel refused on principle to improve its existing processors, preferring to develop new generations instead. One of Intel's largest second-source manufacturers, the Japanese corporation NEC, which in the early 80s was much larger than Intel, decided to improve the 8088 and 8086 and launched the compatible V20 and V30 processors, up to 30% faster. NEC even offered Intel the role of its second source! Intel instead sued NEC, but could not win the case. For some reason, this major dispute between Intel and NEC is completely ignored by Wikipedia.
The 80186 and 80286 appeared in 1982, so it can be assumed that Intel had two almost independent development teams. The 80186 is an 8086 with several new instructions and shorter timings, plus several on-chip circuits typical of x86 systems: clock generator, timers, DMA controller, interrupt controller, wait-state generator, etc. Such a processor would seem to greatly simplify building computers around it, but because its built-in interrupt controller was, for some reason, incompatible with the IBM PC, it was almost never used in PCs. The author knows only of the BBC Master 512, based on the BBC Micro, which did not use the built-in circuits, not even the timer, though there were several other systems using the 80186. The 80186 could address the same 1 MB of memory as the 8086.
The 80286 had even better timings than the 80186, among which the simply fantastic division (32/16 = 16,16) in 22 clock cycles stands out: nobody has managed to make division faster since! The 80286 supports all the new 80186 instructions, plus many instructions for working in the new protected mode. The 80286 was the first processor with built-in support for protected mode, which made it possible to implement memory protection, proper use of privileged instructions and virtual memory. Although working in the new mode created many problems (protected mode was designed rather poorly) and it was used relatively rarely, it was a big breakthrough. In this mode the segment registers took on a new role, allowing up to 16 MB of addressable memory and up to 1 GB of virtual memory per task. A big problem of the 80286 was that it could not switch from protected mode back to real mode, in which most programs ran at the time. Using the “secret” undocumented LOADALL instruction, it was possible to access all 16 MB of memory even in real mode.
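The 16 MB and 1 GB figures follow from simple arithmetic, sketched below in Python (constant and function names hypothetical). The 80286 has a 24-bit physical address; a task can reach up to 8192 descriptors in the global table plus 8192 in its local table, selected by the 13-bit index and the TI bit of a selector, and each descriptor describes a segment of at most 64 KB. Small details such as the reserved null descriptor are ignored.

```python
PHYSICAL_ADDRESS_BITS = 24   # 80286 address bus width
SELECTOR_INDEX_BITS = 13     # descriptor index field of a selector
TABLES_PER_TASK = 2          # GDT plus the task's LDT, chosen by the TI bit
MAX_SEGMENT_BYTES = 2 ** 16  # offsets remain 16-bit


def decode_selector(selector: int) -> tuple[int, int, int]:
    """Split a 16-bit selector into (descriptor index, table indicator, RPL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


physical_bytes = 2 ** PHYSICAL_ADDRESS_BITS
virtual_bytes = 2 ** SELECTOR_INDEX_BITS * TABLES_PER_TASK * MAX_SEGMENT_BYTES
print(physical_bytes // 2 ** 20, "MB physical,",
      virtual_bytes // 2 ** 30, "GB virtual per task")
```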
In the 80286, operand address calculation was moved to separate circuitry and stopped slowing down instruction execution. This made some interesting tricks possible, such as the instruction
LEA AX,[BX+SI+4000]
In just 3 clock cycles it performs two additions and puts the result in AX!
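What LEA does can be modeled as address arithmetic alone: the effective address is computed with 16-bit wraparound, but memory is not accessed and the flags are not changed. A Python sketch with a hypothetical function name, treating 4000 as decimal (the default radix in MASM-style assemblers):

```python
def lea16(base: int = 0, index: int = 0, disp: int = 0) -> int:
    """Model of LEA r16,[base+index+disp]: address arithmetic only.

    No memory is read and no flags change; the result wraps at 16 bits.
    """
    return (base + index + disp) & 0xFFFF


bx, si = 0x1000, 0x0234
ax = lea16(bx, si, 4000)  # LEA AX,[BX+SI+4000]
print(hex(ax))
```

Because nothing is fetched from memory, LEA works as a cheap three-operand addition, and compilers still use it that way.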
Segment registers in protected mode have become part of a full-fledged memory management system (MMU). In real mode, these registers only partially provided MMU functionality.
The number of manufacturers and systems using the 80286 is enormous, but the first, of course, were the IBM PC AT computers, with almost fantastic performance for personal computers of the time. With these computers, memory began to lag behind the processor in speed and wait states appeared, but back then this seemed to be something temporary.
In the 80286, as in the 8086/8088, interrupt handling was not implemented 100% correctly, which in very rare cases could lead to very unpleasant consequences. For example, the POPF instruction on the 80286 always enabled interrupts during its execution, and on the 8086/8088, when an instruction with two prefixes (for example, REP ES: MOVSB) was interrupted, one of the prefixes was lost on return from the interrupt. The POPF bug was present only in early 80286 releases.
Protected mode on the 80286 was extremely inconvenient: it divided all memory into segments no larger than 64 KB and required complicated software support for virtual memory. The 80386, which appeared in 1985, made working in protected mode quite comfortable, allowed up to 4 GB of addressable memory and could easily switch between modes. In addition, virtual 8086 mode was introduced to support multitasking of 8086 programs. For virtual memory, a relatively easy-to-manage paging mode became available. With all its innovations, the 80386 retained full compatibility with programs written for the 80286. The 80386's innovations also include extending the registers to 32 bits and adding two new segment registers. Timings changed, but not uniformly. A fast barrel shifter was added, which made shifts by several bits take the same time as a shift by one. However, for some reason this innovation greatly slowed down the rotate instructions. Multiplication became slightly slower than on the 80286. Memory access, on the contrary, became slightly faster, but this does not apply to the string instructions, which remained faster on the 80286. The author of this material has come across the opinion that in real mode, with 16-bit code, the 80286 is still slightly faster than the 80386 at the same frequency.
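The 80386 paging scheme splits a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index and a 12-bit offset within a 4 KB page. A Python sketch of that two-level lookup follows; it is simplified by storing page frame numbers in nested dictionaries instead of real 32-bit entries with present, accessed and permission bits, and the names are hypothetical.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Stands in for the processor's page-fault exception."""


def split_linear(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def translate(addr: int, page_directory: dict[int, dict[int, int]]) -> int:
    """Walk the two-level structure and return the physical address."""
    dir_index, table_index, offset = split_linear(addr)
    page_table = page_directory.get(dir_index)
    if page_table is None:
        raise PageFault(hex(addr))  # no page table for this 4 MB region
    frame = page_table.get(table_index)
    if frame is None:
        raise PageFault(hex(addr))  # page not present
    return frame * PAGE_SIZE + offset


directory = {1: {3: 0x1234}}  # one mapped page: linear 0x00403000 -> frame 0x1234
print(hex(translate(0x00403ABC, directory)))
```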
The 80386 added new instructions, most of which merely provided new ways of working with data, effectively duplicating existing ones with optimizations for certain cases. For example, the following were added:
- BT, BTS and BTR, to test, set and reset a bit by its number, similar to those of the Z80;
- BSF and BSR, bit scans forward and in reverse;
- MOVSX and MOVZX, which copy a value with sign or zero extension;
- SETcc, which set a value depending on the flags;
- SHLD and SHRD, double-precision shifts.
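The semantics of these additions are easy to state as small functions. The Python sketch below models register forms only (for BT/BTS/BTR the bit number is taken modulo the operand size, as with a register operand), uses None to stand for the ZF=1 case of BSF/BSR where the destination is undefined, and limits SHLD to 32-bit operands; all function names are hypothetical. SETcc is not shown, since it simply stores 1 or 0 according to a flag condition.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def bt(value: int, bit: int) -> int:
    """BT: copy the selected bit into CF (returned)."""
    return (value >> (bit % 32)) & 1


def bts(value: int, bit: int) -> tuple[int, int]:
    """BTS: return (value with the bit set, old bit as CF)."""
    bit %= 32
    return value | (1 << bit), (value >> bit) & 1


def btr(value: int, bit: int) -> tuple[int, int]:
    """BTR: return (value with the bit cleared, old bit as CF)."""
    bit %= 32
    return value & ~(1 << bit) & MASK32, (value >> bit) & 1


def bsf(value: int) -> Optional[int]:
    """BSF: index of the lowest set bit, or None for a zero source (ZF=1)."""
    return (value & -value).bit_length() - 1 if value else None


def bsr(value: int) -> Optional[int]:
    """BSR: index of the highest set bit, or None for a zero source (ZF=1)."""
    return value.bit_length() - 1 if value else None


def movzx(value: int, from_bits: int) -> int:
    """MOVZX: zero-extend the low from_bits bits to 32 bits."""
    return value & ((1 << from_bits) - 1)


def movsx(value: int, from_bits: int) -> int:
    """MOVSX: sign-extend the low from_bits bits to 32 bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # sign bit set
        value -= 1 << from_bits
    return value & MASK32


def shld(dest: int, src: int, count: int) -> int:
    """SHLD dest, src, count (32-bit): shift dest left, filling from src's top bits."""
    count %= 32                   # the count is masked to 5 bits
    if count == 0:
        return dest
    return ((dest << count) | (src >> (32 - count))) & MASK32


print(hex(movsx(0x80, 8)), bsr(0x1000), hex(shld(0x12345678, 0x9ABCDEF0, 8)))
```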
Until the 80386, x86 processors could use only short conditional jumps, with a one-byte displacement, and this was often very limiting. With the 80386 it became possible to use two-byte displacements (four-byte in 32-bit addressing mode), and although the new jumps are two (or three) times longer in code, they execute as fast as the old short jumps.
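The size trade-off can be sketched as a function that picks the shortest encoding for a conditional jump given the displacement it needs. This is a simplification: a real assembler must also account for the displacement being measured from the end of whichever encoding it picks. The lengths are the standard ones: 2 bytes for the short form (opcode plus rel8), 4 bytes for the 80386 near form in 16-bit code (0F, opcode, rel16) and 6 bytes in 32-bit code (rel32); before the 80386, an out-of-range condition needed an inverted short jump over a 3-byte near JMP. The function name is hypothetical.

```python
def jcc_length(displacement: int, has_near_jcc: bool = True,
               operand_bits: int = 16) -> int:
    """Bytes needed for a conditional jump with the given signed displacement."""
    if -128 <= displacement <= 127:
        return 2      # 7x rel8: the short form, available on every x86
    if not has_near_jcc:
        return 2 + 3  # pre-386: inverted Jcc rel8 skipping a JMP rel16 (E9)
    return 4 if operand_bits == 16 else 6  # 0F 8x rel16 / 0F 8x rel32


print(jcc_length(100), jcc_length(1000), jcc_length(1000, operand_bits=32))
```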
Debugging support was radically improved by adding 4 hardware breakpoints, which make it possible to stop a program even at memory addresses that cannot be modified.
The main protected mode became much easier to manage than on the 80286, which turned a number of inherited instructions into unnecessary vestiges. In the most common protected mode, the so-called flat mode, segments of up to 4 GB are used, which makes all the segment registers a mere formality. And the semi-documented unreal mode even allowed all memory to be used as in flat mode, but from real mode, which is simple to set up and control.
Starting with the 80386, Intel stopped sharing its technology and became virtually the exclusive manufacturer of processors for the IBM PC architecture and, as Motorola's position weakened, for other PC architectures too. Systems based on the 80386 were very expensive until the early 90s, when they finally became affordable to mass consumers at frequencies from 25 to 40 MHz. With the 80386, IBM began to lose its position as the leading manufacturer of IBM PC compatible computers. This showed, in particular, in the fact that the first 80386-based PC, in 1986, was made by Compaq.
It is hard to hold back admiration for the amount of work done by the creators of the 80386 and for its results. I would even venture to suggest that the 80386 embodies more achievements than all of mankind's technological advances before 1970, and maybe before 1980.
The topic of errors in the 80386 is quite interesting; I will write about two. The first chips had some instructions that later disappeared from the manuals for these processors and stopped working on later chips, so anyone relying on the earliest sources of information about the 80386 could run into unexpected difficulties. These are the IBTS and XBTS instructions. All 80386DX/SX chips, made by both AMD and Intel (which reveals their curious internal identity), have a very strange and unpleasant bug: the value of the EAX register is destroyed if, right after saving all registers to the stack or restoring them from it with PUSHAD or POPAD, an instruction whose address uses the BX register is executed. In some situations the processor could even freeze. A truly horrible and very widespread bug, and Wikipedia does not even mention it. There were other bugs as well.
The emergence of ARM changed the situation in the world of computing. Despite their problems, ARM processors continued to develop. Intel's answer was the 80486. In the struggle for speed and for first place in advanced technology, Intel even resorted to something that spoiled the look of the personal computer: a cooling fan.
In the 80486, the timings of most instructions were improved, and some instructions began to execute in one clock cycle, as on ARM processors. Multiplication and division, however, for some reason became a little slower. It is especially strange that single-bit shifts and rotations of registers became even slower than on the 8088! An on-chip cache of 8 KB, quite large for those years, appeared. There were also new instructions, for example CMPXCHG, which took the place of the quietly vanished IBTS and XBTS instructions (curiously, this instruction was already secretly present in late 80386 chips). There are very few new instructions, just six; among them the very useful BSWAP, which reverses the byte order of a 32-bit word, deserves mention. A big and useful novelty was the floating-point coprocessor built into the chip, something no one had done before.
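BSWAP simply reverses the four bytes of a 32-bit register. The Python sketch below (hypothetical function names) models it and, for comparison, a three-instruction sequence that does the same job on the 80386: XCHG AL,AH, then ROR EAX,16, then XCHG AL,AH again.

```python
MASK32 = 0xFFFFFFFF


def bswap(value: int) -> int:
    """BSWAP r32: reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


def xchg_al_ah(eax: int) -> int:
    """XCHG AL,AH: swap the two low bytes, keep the upper half."""
    return (eax & 0xFFFF0000) | ((eax & 0xFF) << 8) | ((eax >> 8) & 0xFF)


def ror32(value: int, count: int) -> int:
    """ROR r32,count for 0 < count < 32."""
    return ((value >> count) | (value << (32 - count))) & MASK32


def bswap_on_386(eax: int) -> int:
    """Same result with 80386 instructions: XCHG AL,AH / ROR EAX,16 / XCHG AL,AH."""
    return xchg_al_ah(ror32(xchg_al_ah(eax), 16))


print(hex(bswap(0x12345678)), hex(bswap_on_386(0x12345678)))
```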
The first 80486-based systems were incredibly expensive. It is rather unusual that the first 80486-based computers, the VX FT model, were made by the English company Apricot: in 1989 they cost from 18 to 40 thousand dollars, and the system unit weighed more than 60 kg! IBM launched its first 80486-based computer in 1990, the PS/2 Model 90, priced at $17,000.
It is hard to imagine Intel processors without secret, officially undocumented features. Some of these features were hidden from users from the very first 8086 on. For example, a fact that almost nobody needs: the second byte of the AAD and AAM decimal adjust instructions is significant and can hold a different, generally non-decimal, value (this was documented only with the Pentium, 15 years later!). More unpleasant was the silence about the short forms of the AND/OR/XOR instructions with a byte-constant operand, for example AND BX,7 with a three-byte opcode (83 E3 07). These instructions, which make code more compact (something especially important on the first PCs), were quietly added to the documentation only with the 80386. Interestingly, Intel's own manuals for the 8086 and 80286 hint at these instructions but give no specific opcodes for them, unlike the similar ADD/ADC/SBB/SUB instructions, for which complete information was provided. As a result, many assemblers (all?) could not generate the shorter codes. Another group of secrets is better described as an oddity: a number of instructions have two opcodes each. These include, for example, SAL/SHL (opcodes D0 E0, D0 F0 or D1 E0, D1 F0) and some others. Usually, and maybe always, only one of the opcodes is used; the second, secret one is almost never used. One wonders why Intel so carefully preserves these redundant instructions, cluttering the opcode space. The SALC instruction waited almost 20 years, until 1995, for official documentation! The ICEBP debugging instruction officially did not exist for 10 years, from 1985 to 1995. The most has been written about the secret LOADALL and LOADALLD instructions; they will remain secret forever, since they could be used for easy access to large amounts of memory only on the 80286 and 80386, respectively.
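Both kinds of secret can be illustrated in Python. The first two functions (hypothetical names) model AAM and AAD with an arbitrary base in place of the documented 10; the encoder builds the group-1 register-immediate forms, choosing the short 83 /n ib encoding, whose byte constant the CPU sign-extends, when the value fits in a signed byte. It covers only 16-bit register operands and ignores the separate accumulator forms for AX.

```python
GROUP1_OPS = {"ADD": 0, "OR": 1, "ADC": 2, "SBB": 3,
              "AND": 4, "SUB": 5, "XOR": 6, "CMP": 7}
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def aam(al: int, base: int = 10) -> tuple[int, int]:
    """AAM imm8: AH = AL // base, AL = AL % base. Returns (AH, AL)."""
    if base == 0:
        raise ZeroDivisionError("AAM with base 0 raises a divide error")
    return al // base, al % base


def aad(ah: int, al: int, base: int = 10) -> int:
    """AAD imm8: AL = (AH * base + AL) mod 256 and AH = 0. Returns the new AL."""
    return (ah * base + al) & 0xFF


def encode_reg16_imm(op: str, reg: str, imm: int) -> bytes:
    """Encode e.g. AND BX,imm, preferring the short sign-extended imm8 form."""
    modrm = 0xC0 | (GROUP1_OPS[op] << 3) | REG16[reg]  # mod=11: register operand
    if -128 <= imm <= 127:
        return bytes([0x83, modrm, imm & 0xFF])        # 83 /n ib
    return bytes([0x81, modrm]) + (imm & 0xFFFF).to_bytes(2, "little")  # 81 /n iw


print(aam(0x3F, 16), encode_reg16_imm("AND", "BX", 7).hex(" "))
```

With base 16, AAM splits a byte into its two hexadecimal digits, one of the uses of the undocumented operand.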
Until recently, there was some intrigue around the UD1 instruction (0F B9), which was unofficially an example of an invalid opcode. Recently this unofficial status became official.
In the USSR, the production of clones of the 8088 and 8086 processors was mastered, and the 80286 was never fully reproduced.
Motorola: from 68000 to 68040
Motorola is the only company that was able, for a time, to compete successfully with Intel in producing processors for personal computers.
The 68000 was released in 1979 and at first glance looked much more impressive than the 8086. It had 16 32-bit registers (17, to be precise), a separate program counter and a status register. It could address 16 MB of memory directly, which imposed no restrictions on, for example, large arrays. However, a careful look at the 68000 shows that not everything is as good as it seems. In those years, more than 1 MB of memory was an unattainable luxury even for medium-sized organizations. The 68000's code density is worse than the 8086's, meaning that code with the same functionality takes more space on the 68000. This is because 68k instruction lengths must be multiples of 2 bytes, while x86 lengths are multiples of 1. The evidence on code density is contradictory, however, since in some cases it can be better on the 68000 than on the 8086. Of the 16 registers, 8 are address registers, which in some ways are slightly more advanced analogs of the x86 segment registers. The ALU and data bus are 16-bit, so operations on 32-bit data are slower than one might expect. A register-to-register operation takes 4 clock cycles, while on the 8086 it takes only 2. Until the mid-80s, 68000-based computers were much more expensive than 8088-based ones, yet the 68000 could not work with virtual memory and had no hardware floating-point support, which made it unsuitable for the most advanced systems.
As always with Motorola products, the 68000 architecture shows some clumsiness and contrived oddities. For example, there are two stacks and two carry flags (C for condition tests and X for multi-precision operations). The oddities with flags do not end there. For some reason, many instructions, even MOVE, clear the carry and overflow flags. Another oddity is that the instruction that saves the arithmetic flags, which worked normally on the 68000, was made privileged on all processors from the 68010 on. Some operations are annoyingly unoptimized: for example, writing zero to memory with CLR is slower than writing the constant 0 with MOVE, and a left shift is slower than adding the operand to itself. There are some almost redundant instructions, for example both arithmetic and logical left shifts. Even the address registers, while seemingly superior to the 8086 segment registers, have a number of annoying flaws. For example, they require loading as many as 4 bytes instead of 2 on the 8086, and one of those four bytes is superfluous. The 68000 instruction set shows many similarities with the PDP-11 instruction set, developed back in the 60s.
Motorola code looks somehow more cumbersome and awkward than x86 or ARM code. On the other hand, the 68000 is still faster than the 8086, by about 20-30% by my estimates. 680x0 code does, however, have its own peculiar beauty and elegance, and less of the mechanical feel characteristic of x86. In addition, as discussions with the experts at eab.abime.net have shown, 68k code density is often better than that of x86.
Overall, the 68000 is a good processor with a large instruction set. It was used in many now-legendary personal computers: the first Apple Macintosh computers, produced until the early 90s, the first Commodore Amiga multimedia computers, and the relatively inexpensive, high-quality Atari ST computers. The 68000 was also used in relatively inexpensive computers running Unix variants, in particular the rather popular Tandy 16B. Interestingly, at the same time as it developed the PC, IBM was developing the 68000-based System 9000 computer, which was released less than a year after the PC.
The 68010 appeared clearly late, only in 1982, the same year Intel released the 80286, which brought personal computers up to the level of minicomputers. The 68010 is pin-compatible with the 68000, but its instruction set is slightly different, so replacing a 68000 with a 68010 did not become popular. The incompatibility stemmed from a contrived desire to bring the 68000 more into line with the ideal theory of virtualization support. The 68010 is only slightly faster than the 68000, by no more than 10%. Obviously, the 68010 lost badly to the 80286 and was even weaker than the 80186, which appeared the same year. Like the 80186, the 68010 found practically no use in personal computers.
The 68008 was also released in 1982, probably in the hope of repeating the success of the 8088. It is a 68000 with an 8-bit data bus, which made it possible to use it in cheaper systems. But the 68008, like the 68000, has no instruction queue, which makes it about 50% slower than the 68000. Thus the 68008 may even be a little slower than the 8088, which, thanks to its instruction queue, is only about 20% slower than the 8086.
Sir Clive Sinclair built the Sinclair QL around it, a very interesting computer that, thanks to its lower price, could compete with the Atari ST and similar computers. But at the same time Clive, obviously prematurely, began investing heavily in electric vehicles, treating the QL (Quantum Leap) rather as a secondary task. This, combined with some unsuccessful design decisions, led to the premature end of the computer and of Clive's whole company (the company became part of Amstrad, which refused to produce the QL).
It would be interesting to calculate the bitness index for the 68000; it seems to me clearly higher than 16, though probably not higher than 24.
The 68020, which appeared in 1984, brought Motorola back to the front once again. This processor implemented many very interesting and promising new features. The strongest effect certainly comes from the instruction pipeline, which sometimes allows up to three instructions to execute at once! The 32-bit address bus looked somewhat premature in those years, so a cheaper version, the 68EC020, with a 24-bit address bus was produced. But the 32-bit data bus already looked quite appropriate and significantly sped things up. The built-in cache, though small at 256 bytes, was a novelty that significantly improved performance, since the main dynamic memory could not keep up with the processor. Fairly fast division (64/32 = 32,32) and multiplication (32*32 = 64) instructions were added, taking about 80 and up to 45 cycles, respectively. Instruction timings generally improved significantly; for example, division (32/16 = 16,16) took about 45 cycles (more than 140 cycles on the 68000). Some instructions, in the most favorable cases, can execute without taking any clock cycles at all! New addressing modes were added, in particular scaled indexing; on x86 this mode appeared only the next year, in the 80386. Other new addressing modes allow double indirect addressing with several displacements; here the PDP-11 was clearly surpassed.
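The new 68020 effective-address forms can be sketched as plain arithmetic. In the Python model below (hypothetical names), memory is a dictionary of 32-bit longwords; the scaled-index mode adds the index register times 1, 2, 4 or 8, while the memory-indirect modes need an extra memory read to fetch a pointer before the final address is known, which helps explain why they are slow.

```python
MASK32 = 0xFFFFFFFF
VALID_SCALES = (1, 2, 4, 8)


def check_scale(scale: int) -> None:
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")


def ea_indexed(an: int, xn: int, scale: int, disp: int) -> int:
    """(d8,An,Xn*SCALE): address register + scaled index + displacement."""
    check_scale(scale)
    return (an + xn * scale + disp) & MASK32


def ea_postindexed(memory: dict[int, int], an: int, bd: int,
                   xn: int, scale: int, od: int) -> int:
    """([bd,An],Xn*SCALE,od): fetch a pointer at An+bd, then add index and od."""
    check_scale(scale)
    pointer = memory[(an + bd) & MASK32]  # extra memory read
    return (pointer + xn * scale + od) & MASK32


def ea_preindexed(memory: dict[int, int], an: int, bd: int,
                  xn: int, scale: int, od: int) -> int:
    """([bd,An,Xn*SCALE],od): fetch a pointer at An+bd+index, then add od."""
    check_scale(scale)
    pointer = memory[(an + bd + xn * scale) & MASK32]  # extra memory read
    return (pointer + od) & MASK32


print(hex(ea_indexed(0x1000, 3, 4, 8)))
```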
But some new instructions, for example the heavy bit-field operations or the new operations on decimal numbers, which became less important with fast division and multiplication, looked more like a fifth wheel on a cart than something genuinely useful. Double indirect addressing modes look interesting in theory, but they are rarely needed and execute very slowly. Unlike the 80286, the 68020 spends time computing the operand address, the so-called effective address. Division on the 68020 turned out to be almost twice as slow as the miraculous division of the 80286. Multiplication and some other operations are also slower. The 68020 has no built-in memory management unit (MMU), and the rather exotic ability to connect up to eight coprocessors could not make up for this.
The 68020 was widely used in mainstream Apple Macintosh II, Macintosh LC and Commodore Amiga 1200 computers. It was also used in several models of systems for working with Unix.
The appearance of the 80386, with its built-in and very solidly designed MMU and its 32-bit buses and registers, again pushed Motorola into second place. The 68030, which appeared in 1987, regained leadership for Motorola for the last time, and only briefly. The 68030 has a built-in memory management unit and a doubled cache, split into instruction and data caches, which was a very promising novelty. In addition, the 68030 could use a faster memory access interface, which can speed up memory operations by almost a third. But despite all these innovations, the 68030 turned out to be somewhat slower than the 80386 at the same frequency. However, the 68030 was available at frequencies up to 50 MHz, while the 80386 went only up to 40 MHz, which made top 68030-based systems slightly faster.
The 68030 was used in the Apple Macintosh II series, the Commodore Amiga 3000, the Atari TT, the Atari Falcon and some other computers.
With the 68040, Motorola once again tried to outperform Intel. This processor appeared a year after the 80486 but could not surpass it in overall merits. In fact, Motorola, burdened with a more complex instruction set, could not keep up and, in a sense, left the race. Only a heavily cut-down floating-point coprocessor could be fitted onto the 68040, and the chip ran considerably hotter than the 80486. According to lowendmac.com/benchmarks, the 68040 was only about 2.1 times faster than the 68030, which means that it was slightly slower than the 80486 at the same clock frequency. The 68040 found practically no use in popular computers. Only its cheaper version, the 68LC040, which lacks the built-in coprocessor, saw some noticeable use. But,
Motorola always had problems with math coprocessors. As already mentioned, Motorola never released such a coprocessor for the 68000/68010, while Intel had been shipping its very successful 8087 since 1980. For the 68020/68030, two coprocessors were made: the 68881 and its improved pin-compatible version, the 68882. But to get a significant performance gain, code for the 68882 has to be compiled differently than for the 68881.
It is worth adding that Intel x86 still has problems with its math coprocessor. For some functions the accuracy is very poor on certain arguments. The sine, for example, is sometimes correct to no more than 4 significant digits. Therefore, modern compilers often compute such functions without using the coprocessor.
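Much of this inaccuracy comes from argument reduction. Near multiples of π, the sine is essentially the small difference between the argument and π, so any error in the internal copy of π becomes the dominant error. The Python sketch below assumes, as has been widely reported, that the x87 FSIN reduces arguments using π rounded to 66 significant bits. It models only that reduction step with exact rational arithmetic, not the real FSIN microcode. `round_to_bits` is a hypothetical helper.

```python
import math
from fractions import Fraction

# π to 50 decimal places: far more precision than a 66-bit value needs.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")


def round_to_bits(value, bits):
    """Round a positive Fraction to `bits` significant binary digits."""
    exponent = value.numerator.bit_length() - value.denominator.bit_length()
    if Fraction(2) ** exponent > value:   # ensure 2**exponent <= value < 2**(exponent+1)
        exponent -= 1
    scale = Fraction(2) ** (bits - 1 - exponent)
    return Fraction(round(value * scale)) / scale


x = Fraction(math.pi)                 # the double closest to π (slightly below it)
true_sin = PI - x                     # sin(x) = sin(π - x) ≈ π - x for tiny π - x
reduced = round_to_bits(PI, 66) - x   # the same reduction with an assumed 66-bit π
rel_error = abs(reduced - true_sin) / true_sin

print(f"correct : {float(true_sin):.16e}")
print(f"66-bit π: {float(reduced):.16e}")
print(f"relative error: {float(rel_error):.1e}")
```

For `sin(math.pi)` the two results agree only to about five significant digits, and the relative error is a few parts in 100,000. That is the order of loss the text describes.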
National Semiconductor 32016
This is the first true 32-bit processor offered for use in computers, as early as 1982. It was originally conceived as a VAX-11 on a chip. Because it proved impossible to reach an agreement with DEC, however, National Semiconductor (NS) had to make a processor that was only partly similar to the VAX-11 architecture.
Paged virtual memory, today the dominant technology, begins with this processor. However, virtual memory support is not built into the processor itself; it requires a separate MMU coprocessor. Another separate coprocessor is required for floating-point arithmetic.
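For readers unfamiliar with paging, here is a minimal Python sketch of two-level page-table translation for a 24-bit virtual address. The layout is an assumption, loosely modelled on what is usually described for the NS32082 MMU: 512-byte pages and a 24-bit address split 8 + 7 + 9. The dict-based tables and all names are hypothetical stand-ins for in-memory descriptors.

```python
# Assumed layout (hypothetical, loosely NS32082-like): 8 + 7 + 9 = 24 bits.
PAGE_BITS = 9    # 512-byte pages
L2_BITS = 7      # 128 entries per second-level table
L1_BITS = 8      # 256 entries in the first-level table


class PageFault(Exception):
    pass


def translate(vaddr, level1):
    """Translate a 24-bit virtual address through a two-level page table.

    level1 maps a first-level index to a second-level table, which maps a
    second-level index to a physical page (frame) number. Missing entries
    stand for pages marked invalid and raise PageFault.
    """
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    l2_index = (vaddr >> PAGE_BITS) & ((1 << L2_BITS) - 1)
    l1_index = (vaddr >> (PAGE_BITS + L2_BITS)) & ((1 << L1_BITS) - 1)
    table = level1.get(l1_index)
    if table is None or l2_index not in table:
        raise PageFault(hex(vaddr))
    return (table[l2_index] << PAGE_BITS) | offset


page_tables = {0: {1: 0x40}}                # virtual page 1 -> physical frame 0x40
print(hex(translate(0x21F, page_tables)))   # 0x801f
```

The page-fault path is what makes virtual memory possible: the operating system catches the fault, loads the page and retries the instruction.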
The NS32016 instruction set is huge and resembles that of the VAX-11, in particular in having a separate stack for subroutine frames. The address bus is 24-bit, which allows up to 16 MB of memory. A distinctive feature of the 32016 is its handling of flags. It has the standard carry flag (which can also serve as the condition for a conditional branch) and the usual overflow, sign and equality (zero) flags. In addition, there is an L (low) flag meaning "less", which plays the role of carry for comparisons. The situation with carry is similar to that in the Motorola 680x0 processors. For some reason, the overflow flag is called F. There are also a trace (single-step) flag, a privileged mode flag and, uniquely, a flag that selects the current stack. Arithmetic instructions do not set the sign, zero and L flags; only comparison instructions set them.
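The point of a separate L flag is that "less" means different things for signed and unsigned numbers. A minimal Python sketch of the two outcomes of a comparison follows. The operand order and flag polarity here are illustrative assumptions, not the exact NS32016 CMP semantics, and the helper names are hypothetical.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def compare_flags(a, b, width=32):
    """Flags a compare could set for operands a and b (illustrative polarity).

    Z: equal; N: a < b as signed integers; L: a < b as unsigned integers
    (the role carry plays after a compare on many other CPUs).
    """
    mask = (1 << width) - 1
    ua, ub = a & mask, b & mask
    return {
        "Z": ua == ub,
        "N": to_signed(ua, width) < to_signed(ub, width),
        "L": ua < ub,
    }


print(compare_flags(1, 0xFFFFFFFF))   # {'Z': False, 'N': False, 'L': True}
```

The same bit pattern, 0xFFFFFFFF, is the largest unsigned value but −1 as a signed value, so the two "less" flags disagree.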
Eight 32-bit general-purpose registers are available. In addition, there are a program counter, two stack pointers, a frame pointer for subroutine frames, a program base pointer (something unique), a module base pointer (also very rare), an interrupt vector table pointer, a configuration register and a status register. In terms of speed, the NS32016 turned out to be comparable to the 68000.
As far as I know, the 32016 was used only as a second processor for the BBC Micro. For 1984 it was a very expensive and prestigious add-on. The processor could be ordered at 6, 8 or 10 MHz; the 10 MHz version had some technical problems and was very expensive. There was very little software for the 32016, all of it made by Acorn: the Unix-like operating system Panos and BASIC, Acorn's usual companion. The BBC Micro add-on did not use the MMU chip. It could have been connected, but there were no programs to use it. Connecting a floating-point coprocessor was not even provided for.
This very complex processor is known to have had serious hardware bugs that took years to fix.
Acorn ARM
The 6502 philosophy of making things simpler, cheaper and better found its continuation in Acorn's almost fantastic ARM1 processor. It was released in 1985, at the same time as Intel's technological miracle, the 80386. The ARM had an order of magnitude fewer transistors, and therefore consumed much less power, while being considerably faster on average. Of course, the ARM had no MMU and not even multiply or divide instructions, so in some division-heavy calculations the 80386 could be faster. However, ARM's advantages were so great that today it is the most widespread processor architecture: more than 100 billion of these processors have been produced.
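Without a divide instruction, ARM programs had to divide in software. Below is a minimal Python sketch of the generic restoring shift-and-subtract method. It illustrates the technique and is not Acorn's actual library routine; `udiv32` is a hypothetical name. Each of the 32 rounds corresponds roughly to a shift, a compare and a conditional subtract on an ARM.

```python
def udiv32(dividend, divisor):
    """Restoring shift-and-subtract division of 32-bit unsigned integers.

    One quotient bit is produced per iteration, so a full 32-bit divide
    takes 32 rounds of shift, compare and conditional subtract.
    """
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = remainder = 0
    for bit in range(31, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)  # bring down next bit
        quotient <<= 1
        if remainder >= divisor:   # on ARM: CMP, then a conditional SUB
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(udiv32(100, 7))   # (14, 2)
```

Unrolled, this loop costs a few instructions per quotient bit. That helps explain why division-heavy code could favor the 80386 despite ARM's faster simple instructions.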
ARM development began in 1983 after Acorn experimented with the 32016. The tests showed that in many calculations the 6502, running at half the clock frequency, could be faster than this seemingly much more powerful processor. The 80286 was already available at the time and showed very good performance. Intel, however, refused to provide its processor for evaluation, perhaps judging the potential of the small Acorn company to be rather modest. At the same time, the 80286 technology was not closed like the 80386 and was licensed to many firms, so the details of this somewhat unusual refusal have yet to come to light. Perhaps, if Intel had allowed it, Acorn would have used the 80286 rather than develop ARM.
ARM was developed by only a few people, who tested the instruction set using BBC BASIC on the BBC Micro. The development took place in a former utility building, often referred to as a barn. The processor's debut was rather unsuccessful. In 1986, an add-on for the BBC Micro called the ARM Evaluation System was released. In addition to the processor, it contained 4 MB of memory (a lot for those years), which made it a very expensive product: it cost over £4,000, i.e. about $6,000. Of course, compared with computers of that time of comparable speed, the add-on was an order of magnitude or even almost two orders of magnitude cheaper. But there were very few programs for the new system. This is somewhat strange, since Unix could quite possibly have been ported to it. Numerous Unix variants that did not require an MMU were available then, for the PDP-11, the 68000, the 80186 and even the 8088. Curiously, Linux was ported to the Acorn Archimedes in the 90s. Perhaps the delay in the appearance of Unix for ARM was caused by Acorn's reluctance to transfer ARM technology to other firms.
First ARM based system
Acorn's somewhat unsuccessful marketing policy led to a very difficult financial situation in 1985. Besides ARM, Acorn also pursued expensive development of business computers. This effort failed, in particular, because of the shortcomings of the 32016 processor chosen for them. The Acorn Communicator was not very successful either. Developing the relatively successful, but not fully IBM PC-compatible, Master 512 computer was very costly. In addition, a lot of money was spent on an unsuccessful attempt to enter the US market. Olivetti, the Italian company with rather successful Intel 8086- and 80286-based computers, was allowed to enter that market, hypothetically as part of a big game aimed at absorbing Acorn itself. By the way, after absorbing Acorn,
As part of Olivetti, Acorn developed the improved ARM2 chip with built-in multiply instructions. It became the basis of the Archimedes personal computer, stunningly fast for its time, whose first models became available in 1987. However, Olivetti's management was focused on IBM PC-compatible computers and did not want to spend its resources on selling Acorn products.
ARM provides sixteen 32-bit registers (there are actually more if the banked registers used by the system modes are counted). As in the PDP-11 architecture, one of the registers, R15, is the program counter. Almost all operations execute in 1 cycle; more cycles are needed, in particular, for branches, multiplications and memory accesses. Compared with the main processors of those years, ARM was distinguished by the absence of such a typical structure as a hardware stack. When needed, a stack is implemented through one of the registers. Subroutine calls do not use a stack; instead, the return address is stored in a register reserved for it (R14, the link register). This scheme obviously does not work by itself for nested calls, for which a stack has to be organized. A unique feature of ARM is that the program counter is combined with the status register. The program counter is 26-bit, so it can address up to 64 MB. The combined register holds eight status bits. Six sit at the top: the N, Z, C and V flags and two interrupt-disable bits. The other two are the processor mode bits at the bottom. They are free because the lower two bits of the address are always zero, since instructions must be aligned on a 4-byte word boundary. The processor can access bytes and 4-byte words; it cannot directly access 16-bit data. ARM data-processing instructions are 3-address. A characteristic feature of the RISC architecture is that only load and store instructions access memory. ARM has a built-in fast barrel shifter, which can shift the value of one register operand of an instruction by any number of bits at no extra cycle cost. For example, multiplying R0 by 65 and placing the result in R1 can be written as a single one-cycle addition instruction
ADD R1, R0, R0 shl 6
and multiplication by 63 as a single reverse subtraction instruction
RSB R1, R0, R0 shl 6
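Both tricks in the paragraph above can be checked with a little 32-bit arithmetic: the free shift on the second operand and the combined PC/status register. The Python sketch below uses hypothetical function names and follows the document's `shl` notation in comments. It assumes the ARM2-era R15 layout: N, Z, C, V in bits 31 to 28, the I and F interrupt-disable bits in 27 and 26, the word-aligned PC in bits 25 to 2, and the mode in bits 1 to 0.

```python
MASK32 = 0xFFFFFFFF
MODES = {0: "USR", 1: "FIQ", 2: "IRQ", 3: "SVC"}


def lsl(value, amount):
    """Logical shift left as done by the barrel shifter, truncated to 32 bits."""
    return (value << amount) & MASK32


def add_lsl(rn, rm, amount):
    """ADD Rd, Rn, Rm shl amount  ->  Rn + (Rm << amount)."""
    return (rn + lsl(rm, amount)) & MASK32


def rsb_lsl(rn, rm, amount):
    """RSB Rd, Rn, Rm shl amount  ->  (Rm << amount) - Rn (reverse subtract)."""
    return (lsl(rm, amount) - rn) & MASK32


def unpack_r15(r15):
    """Split the combined 26-bit PC / status register of ARM2-era cores."""
    return {
        "N": bool((r15 >> 31) & 1), "Z": bool((r15 >> 30) & 1),
        "C": bool((r15 >> 29) & 1), "V": bool((r15 >> 28) & 1),
        "I": bool((r15 >> 27) & 1), "F": bool((r15 >> 26) & 1),
        "pc": r15 & 0x03FFFFFC,      # bits 25..2: word-aligned address
        "mode": MODES[r15 & 0b11],   # bits 1..0: processor mode
    }


print(add_lsl(10, 10, 6), rsb_lsl(10, 10, 6))   # 650 630, i.e. 10*65 and 10*63
print(unpack_r15(0xA0008003))                   # N and C set, pc 0x8000, SVC mode
```

Because the shift is folded into the ALU operation, the multiply-by-constant costs exactly one ordinary instruction.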
The instruction set includes reverse subtraction. Among other things, this gives unary minus as a special case and speeds up division routines. ARM has another unique feature: all its instructions are conditional. Each instruction carries one of 16 conditions (flag combinations) and is executed only if the current flags satisfy that condition. In other architectures, conditional execution is, as a rule, available only for branches. This ARM feature makes it possible in many cases to avoid a slow branch. It also helps that arithmetic instructions can leave the status flags unchanged. Like the 6809, ARM has both fast and regular interrupts. Besides,
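Here is a minimal Python sketch of conditional execution. It implements the standard 16 ARM condition codes as predicates over N, Z, C, V and uses them to emulate a commonly cited four-instruction ARM GCD loop that needs only one branch. The function names are hypothetical. The emulation assumes positive 31-bit inputs and models only the flags that CMP produces.

```python
MASK32 = 0xFFFFFFFF

# The 16 ARM condition codes as predicates over the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
    "NV": lambda n, z, c, v: False,   # "never" on early ARMs
}


def to_signed(x):
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def cmp_flags(a, b):
    """N, Z, C, V as left by CMP a, b, which computes a - b and discards it."""
    result = (a - b) & MASK32
    n = bool(result >> 31)
    z = result == 0
    c = (a & MASK32) >= (b & MASK32)          # ARM carry means "no borrow"
    diff = to_signed(a) - to_signed(b)
    v = not (-(1 << 31) <= diff < (1 << 31))  # signed overflow
    return n, z, c, v


def gcd_conditional(a, b):
    """Euclid by subtraction, mirroring the ARM loop
        loop: CMP   R0, R1
              SUBGT R0, R0, R1
              SUBLT R1, R1, R0
              BNE   loop
    SUBGT and SUBLT do not set flags, so both test the flags left by CMP."""
    if not (0 < a < (1 << 31) and 0 < b < (1 << 31)):
        raise ValueError("expects positive 31-bit inputs")
    while True:
        flags = cmp_flags(a, b)
        if CONDITIONS["GT"](*flags):
            a = (a - b) & MASK32
        if CONDITIONS["LT"](*flags):
            b = (b - a) & MASK32
        if not CONDITIONS["NE"](*flags):   # BNE falls through once a == b
            return a


print(gcd_conditional(48, 18))   # 6
```

On a processor with only conditional branches, the same loop needs extra branches around each subtraction, and each taken branch costs a pipeline refill.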
The ARM instruction set contains significantly fewer basic instructions than the x86. But the ARM instructions themselves are very flexible and powerful. Several very convenient and powerful ARM instructions have no analogues on the 80386. Examples are RSB (reverse subtraction), BIC (AND with inversion, an instruction the PDP-11 also has) and the 4-address MLA (multiply-accumulate). Another example is LDM and STM, which load or store multiple registers from or to memory, similar to the MOVEM instruction of 68k processors. Almost all ARM instructions are 3-address, while almost all 80386 instructions have no more than 2 operands. The ARM instruction set is more orthogonal: all registers are interchangeable, with some exceptions for R14 and R15. Emulating most ARM instructions may take 3-4 80386 instructions, while most 80386 instructions can be emulated by 2-3 ARM instructions. Interestingly, an IBM PC XT emulator on the Acorn Archimedes with an 8 MHz processor runs even faster than a real PC XT. On a Commodore Amiga with a 7 MHz 68000, such an emulator can run at no more than 10-15% of the speed of a real PC XT. It is also interesting that the first NeXT computers, with a 25 MHz 68030, showed integer performance at the level of the same 8 MHz ARM. In the Möbius project, Apple was going to build a successor to the Apple ][. But when it turned out that the prototype, in emulation mode, outran not only the Apple ][ but also the 68k-based Macintosh, the project was closed!
One of ARM's disadvantages is the problem of loading a constant into a register. Only 8 bits can be loaded at a time, although the constant can be inverted and rotated. Therefore, loading a full 32-bit constant can take up to 4 instructions. A constant can, of course, be loaded from memory with a single instruction, but then the problem arises of addressing that value, since the offset can only be 12-bit. Another disadvantage of ARM is its relatively low code density, which makes programs somewhat larger and, most importantly, reduces the efficiency of the processor cache. However, this is probably the result of the low quality of compilers for this platform. For a long time, a significant disadvantage of ARM was the lack of built-in memory management (MMU) support, which Apple, for example, required in the early 90s. Floating-point coprocessors for the ARM architecture also arrived with a significant delay. ARM did not have debugging facilities as advanced as those of the x86. There is also an oddity in the syntax of the standard ARM assembler: a barrel shifter operation is customarily written after a comma. Thus, instead of the simple form
R1 shl 7
(a left shift of the contents of R1 by 7 bits), one has to write R1, LSL #7.
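The constant-loading limitation follows from how ARM encodes immediates: an 8-bit value rotated right by an even number of bits, optionally inverted via MVN. The Python sketch below checks whether a constant is encodable. It then splits an arbitrary constant into encodable chunks, one MOV followed by ORRs, which shows why up to 4 instructions may be needed. The names are hypothetical, and the greedy split is a simple illustration rather than an optimal assembler algorithm. Literal pools, the load-from-memory alternative mentioned above, are not modelled.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def encode_immediate(value):
    """Return (imm8, rotate) with value == ror32(imm8, 2 * rotate), or None."""
    value &= MASK32
    for rotate in range(16):
        imm8 = ror32(value, 32 - 2 * rotate)   # undo a right rotation by 2*rotate
        if imm8 <= 0xFF:
            return imm8, rotate
    return None


def split_constant(value):
    """Greedily split value into 8-bit fields starting at even bit positions."""
    value &= MASK32
    chunks, pos = [], 0
    while value and pos < 32:
        if ((value >> pos) & 0b11) == 0:
            pos += 2                            # no set bits here, move on
            continue
        chunk = value & (0xFF << pos) & MASK32  # always encodable as an immediate
        chunks.append(chunk)
        value &= ~chunk
        pos += 8
    return chunks


def instruction_count(value):
    """Instructions needed with this scheme: MOV or MVN, then ORR per extra chunk."""
    value &= MASK32
    if encode_immediate(value) is not None or encode_immediate(~value) is not None:
        return 1
    return len(split_constant(value))


print(encode_immediate(0xF000000F))          # (255, 2): 0xFF rotated right by 4
print(encode_immediate(0x101))               # None: set bits too far apart
print(instruction_count(0xFFFFFF00))         # 1, via MVN of 0xFF
print([hex(c) for c in split_constant(0x12345678)])   # four chunks
```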
Since 1989, the ARM3 with a built-in cache has been available. In 1990, the ARM development team separated from Acorn and created ARM Holding with the help of Apple and VLSI. One of the reasons for the separation was that Acorn-Olivetti management considered ARM development too expensive. Later, Acorn ceased to exist as an independent company, while ARM Holding grew into a large company. The separation of ARM Holding from Acorn was also driven by Apple's desire to use an ARM processor in its Newton computer without depending on another computer manufacturer.
The further development of the ARM architecture is also very interesting. It touched, in particular, the interests of such well-known companies as Intel, DEC and Microsoft, but that is another story. It is worth mentioning, though, that its stake in ARM Holding helped Apple avoid bankruptcy in the 90s.
Some conclusions, assumptions and questions
It is hard to shake the feeling that, for the main players on the computer history stage in the 70s and 80s, 8-bit processors were merely an unwelcome necessity. The most successful 8-bit processor, the 6502, was effectively frozen. Intel and Motorola slowed down the development of their own small processors and held back other developers.
I am fairly sure that the Amiga or the Atari ST would have worked better and faster with a 4 MHz 6502-compatible processor with a 20- or 24-bit address bus than with a 68000. Bill Mensch said recently that it would be easy to make a 6502 running at 10 GHz.
The Commodore CBM II could have shared in the success of the Amstrad PCW series. If that series had started using an optimized Z80 at higher frequencies, it could quite possibly have remained relevant even 10 years ago.
What would the world be like if ARM had been made in 1982, which was quite possible?
What would domestic computers have been like if they had copied and developed not the most expensive but the most promising technologies?