Do I need to learn C to understand how a computer works?

Original author: Steve Klabnik
  • Translation
I often hear people suggest studying C in order to understand how a computer works. Is this a good idea? Is it true? Let me state the article's conclusions up front, just for absolute clarity:

  • C is not "how a computer works."
  • I do not think most people mean this statement literally, so that does not really matter.
  • Understood in context, learning C for this reason can still make sense, depending on your goals.

I plan to write two more articles explaining these conclusions in more detail, but this is enough for now. I will add links here when those articles come out.

I often hear people say something like this:

By studying C, you can understand how computers work.

I do not think this idea is wrong at its core, but it comes with some caveats. If you keep them in mind, it can be a perfectly viable strategy for learning new and important things. However, I rarely see people discussing these caveats in detail, so I am writing this article to provide what I think is badly needed context. If you are thinking about learning C to understand how computers work, this article is for you. I hope it helps you figure things out.

Before we really start, I would like to say one more thing: if you want to learn C, then learn it! Learning is great. Learning C was very important to my understanding of computing and to my career. Studying the language and its place in the history of programming languages will make you a better programmer. You do not need an excuse. Learn things just for the sake of learning. This article is meant as a guide to what is actually true; it is not about whether or not you should learn C.

First of all, who is this idea usually recommended to? If you are trying to "learn how computers work," it goes without saying that you do not currently understand that. Which programmers do not understand how computers work? I have mostly seen this sentiment come from people who primarily program in dynamically typed "scripting" languages such as Ruby, Python, or JavaScript. They supposedly "do not know how computers work" because these languages run inside a virtual machine, where only the semantics of the virtual machine matter. After all, the whole point of a virtual machine is to provide portability: the goal is not to depend on the hardware the VM runs on.

There is only one problem: C also runs inside a virtual machine.

The C abstract machine


From the C99 specification, section 5.1.2.3, "Program execution":

The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

In my opinion, this is the most important thing to understand when learning C. The language does not "describe how a computer works"; it describes how the C abstract machine works. Everything else of importance follows from this concept.

Another note: I chose C99 here, which is not the latest C standard. Why? Well, MSVC has... interesting support for C, and I am a Windows user these days. Yes, you can run clang and gcc on Windows. There is not much difference between C89, C99, and C11 for what we are discussing here; at some point you have to pick something. The version I reference here includes some corrections to the original specification.

Perhaps, in conversations about C, you have heard another phrase: "C is portable assembler." If you think about this phrase, you will realize that if it were true, C could not correspond to how a computer works: there are many different computers with different architectures. If C is like an assembler that runs on different computers with different architectures, it cannot simultaneously work exactly like each of those computers. It must hide the details, otherwise it would not be portable!

Nevertheless, I think this fact does not really matter, because people are unlikely to literally mean "C is how a computer works." Before coming back to that, let's talk about the C abstract machine, and why many people do not seem to understand this aspect of C.

An aside: why do people get this wrong?


I can only speak from my own experience, though I am sure it is not unique.

I learned GW-BASIC, then C, then C++, then Java. I had heard of Java before I started writing in it in 1999, four years after it appeared. Java's marketing at the time pitted it heavily against C++: it focused on the JVM as a platform and on the fact that this machine model distinguished it from C++, and therefore from C. Sun Microsystems no longer exists, but a mirror of the press release reminds us:

Java applications are platform independent; only the Java virtual machine needs to be ported to each platform. It acts as an interpreter between the user's computer and the Java application. An application written in the Java environment can run anywhere, eliminating the need to port applications to multiple platforms.

The main motto was "Write once, run anywhere." These two sentences shaped how I (and many others) came to understand Java and how it differs from C++: Java has an interpreter, the Java virtual machine, while C++ has no virtual machine.

With such powerful marketing, "virtual machine" became, in many people's minds, synonymous with "a large runtime and/or interpreter." Languages without this feature were too tied to a specific computer and required porting, since they were not truly platform independent. The main reason for Java's existence was to fix this flaw of C++.

"Runtime", "virtual machine" and "abstract machine" are different words for the same fundamental concept. But since then they have received different connotations due to the insignificant dispersion in the implementations of these ideas.

I personally believe that this 1995 marketing is the reason programmers still misunderstand the nature of C.

So was this statement false? Why would Sun Microsystems spend millions and millions of dollars promoting a lie? If C is also based on an abstract machine that offers cross-platform portability, why Java? I think this is the key to understanding what people really mean when they say "C is how a computer works."

What do people really mean?


Although C works in the context of a virtual machine, it still differs significantly from Java-like languages, and Sun was not lying. To understand why, you need to know the history of C.

In 1969, Bell Labs wrote a computer operating system in assembly language. In 1970, it was christened UNIX. Over time, Bell Labs bought more and more new computers, including the PDP-11.

When it came time to port UNIX to the PDP-11, they decided to use a higher-level language, which was quite a radical idea at the time. Imagine I told you today, "I'm going to write an OS in Java"; you would probably laugh, even though the idea is feasible. The situation (as I understand it; I did not live through it) was roughly the same. A language called B was considered, but it did not support some of the features the PDP-11 had, so they created a successor and called it "C," since that was the next letter of the alphabet.

The language "A" was not; B succeeded BCPL (Basic Combined Programming Language).

In 1972, the first C compiler was written on the PDP-11, and UNIX was rewritten in C at the same time. Portability was not an initial goal, but C became popular, so C compilers were ported to other systems.

In 1978, the first edition of the book "The C Programming Language" was published. Affectionately called "K&R" after the names of its authors, the book was nothing like a specification, but it described the language in some detail, and as a result others also tried to write C compilers. This "version" would later be called "K&R C."

As UNIX and C spread, they were both ported to many computers. Throughout the 70s and 80s, their hardware base grew steadily. Just as C had been created because B did not support all the features of the PDP-11, many compilers added language extensions. Since there was only K&R and no specification, this was considered acceptable as long as the extensions stayed close enough. By 1983, the lack of any standardization was causing problems, so ANSI formed a group to produce a specification. In 1989, the C89 standard came out, sometimes called "ANSI C."

The C specification attempted to unify these diverse implementations on different hardware. Thus the C abstract machine is a kind of minimal specification that allows the same code to behave the same way on all platforms. C implementations were compiled, not interpreted, so there was no interpreter, and therefore no "VM" in the 1995 sense. Nevertheless, C programs are written for this abstract, non-existent computer, and that code is then translated into the assembly of the specific computer the program runs on. You cannot rely on certain specific details if you want to write portable C. This makes writing portable C very difficult, because you may have made a platform-specific assumption when writing the first version of your code.

This is best illustrated with an example. One of the main data types in C is char, from the word "character." However, the C abstract machine does not determine how many bits a char must contain. Well, it does, but not as a literal number; it defines it through the constant CHAR_BIT. Section 5.2.4.2.1 of the specification:

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives... Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

CHAR_BIT: 8

In other words, you know that a char is at least 8 bits, but an implementation may make it larger. To program correctly against the "C abstract machine," you should use CHAR_BIT rather than 8 when working with the size of a char. But this is not some interpreter feature, the way we usually think of virtual machines; it is a property of how the compiler translates source code into machine code.

Yes, there are systems where CHAR_BIT is not 8.
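
To make this concrete, here is a minimal sketch (my own example, not from the original article; it assumes a hosted implementation with stdio available) that asks the implementation for CHAR_BIT instead of hard-coding 8:

/* A minimal sketch: query the implementation instead of assuming 8-bit chars. */
#include <limits.h>
#include <stdio.h>

int main(void) {
    /* CHAR_BIT is at least 8, but the abstract machine allows it to be larger. */
    printf("bits in a char: %d\n", CHAR_BIT);

    /* sizeof reports a size in chars, so multiply by CHAR_BIT
       rather than hard-coding 8 to get a bit count. */
    printf("bits in an int: %zu\n", sizeof(int) * CHAR_BIT);
    return 0;
}

On a typical desktop this prints 8 and 32, but the abstract machine only guarantees the lower bounds.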

Thus, although this "abstract machine" is technically the same kind of idea as the Java virtual machine, it is more of a compile-time construct that guides the compiler when it generates assembly, rather than some kind of runtime check or property. The Java equivalent is that a byte is always 8 bits, and the JVM implementation is responsible for making that work on platforms with larger bytes. (I am not sure whether the JVM runs on any such platforms, but that is how it would have to work.) The C abstract machine was created as a minimal wrapper over diverse hardware, not as a full-fledged platform implemented in software for your code to run on.

So although Sun was not technically correct, in practice they did not mean exactly what they literally said, and what they did mean was right. The same goes for the phrase "Learn C to understand how computers work."

Learn C to BETTER understand how computers work


What do people really mean? In the context of "should a Rubyist learn C in order to understand how computers work," the advice is to go down closer to the hardware. That is, to understand not only how your program works inside its virtual machine, but also how the combination of the program and the VM behaves in the context of the machine itself.

Learning C will give you more of these details, because the C abstract machine is much closer to the hardware, and to the abstractions of the operating system. C is very different from high-level languages, so learning it can teach you a lot.
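
For example (a small sketch of my own, not from the original article), C makes you deal directly with things that scripting languages hide, such as object sizes, explicit allocation, and memory addresses:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 42;
    int *p = malloc(sizeof *p);   /* explicit allocation: no garbage collector */
    if (p == NULL) return 1;
    *p = 42;

    printf("sizeof(int): %zu bytes\n", sizeof(int));
    printf("address of x: %p\n", (void *)&x);
    printf("address of *p: %p\n", (void *)p);

    free(p);                      /* and explicit deallocation */
    return 0;
}

Note that even "stack" and "heap" are properties of typical implementations and operating systems, not of the C abstract machine itself, which is exactly the kind of distinction this article is about.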

But it is important to remember that C is fundamentally an abstraction of the hardware, and abstractions are not perfect. Be careful about equating what C does with how the machine itself works. If you dig deep enough, you will inevitably run into those differences, and they can cause problems. Most learning resources for C, especially today, when hardware has become more homogeneous, promote the idea that this is how the computer works. So it can be hard for a learner to tell what is actually happening under the hood and what is an abstraction provided by C.
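
Here is one place where the abstraction visibly leaks (again, my own illustrative sketch). On most hardware, adding 1 to the largest int simply wraps around, but the C abstract machine says signed overflow is undefined behavior, so a compiler is allowed to assume it never happens:

#include <limits.h>
#include <stdio.h>

/* Looks like a question about the hardware ("does x + 1 wrap?"), but in the
   abstract machine a signed x + 1 can never be less than x, so an optimizing
   compiler may legally compile this whole function to "return 0". */
int wraps_around(int x) {
    return x + 1 < x;
}

int main(void) {
    printf("%d\n", wraps_around(INT_MAX));
    return 0;
}

Depending on the compiler and optimization level, this may print 1 or 0; reasoning from "how the machine works" instead of from the abstract machine is exactly how such bugs get written.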

And we have not even touched on other issues in this discussion. For example, because of C's enormous popularity, hardware has become more homogeneous, since it tends to move toward the semantics of the C abstract machine. If your architecture differs too much from C's semantics, C programs may run much more slowly than on others, and hardware speed is often measured with benchmarks written in C. But this article is already long enough...

For this reason, I think a more accurate version of the statement would be "By learning C, you will learn more about how computers work." I do think a passing familiarity with C is useful for many programmers, even if they never write it. Getting to know C will also give you a sense of the history of our industry.

There are other ways to explore this topic; C is not inherently designed for learning about computers, but it is a good option.

There is so much to learn in programming. I wish you success on this path.
