Do it yourself: MSc Computer Science at the level of top American universities from home


    I have long wanted to write an article about education in Computer Science, but my hands did not reach. Still, I decided to finally do it. So what will we talk about? It's about what a MSc Computer Science diploma is from top US universities (in all its details, including basic courses, books and projects) and how to comply with it.

    Why MSc? This is a kind of fork: on the one hand, after MSc, you are an engineer ready for life (yes, we are talking about engineering preparation, as it seems to me this is the most painful place in our education system), on the other, you can safely follow the PhD path. As you know, you can get into a PhD program without really knowing how to program - this is especially true for theoretical Computer Science. On the other hand, finding a programmer’s job is also not very difficult, and often does not require a powerful education. But when you reach the MSc level, you get the opportunity to understand all new ideas in Computer Science, as well as the opportunity to put them into practice. That is, on the one hand, it’s cool to understand some deep learning and do something new in it, and also take and write your own operating system (who did this?). And you are not caught in the framework of a narrow specialization (unless of course you continue to study). That is, you are now a universal soldier, ready for anything.

    I hope this article will be useful:
    1. Students who want to meet high standards of top US universities, or going to graduate school in Computer Science
    2. Professionals who want to close the “holes” and gaps
    3. Maybe someone from the teachers will take note for your courses.
    4. Students, graduate students of American universities - I would also like to receive feedback, especially regarding the latest trends in education

    What will be written here? A minimum of philosophy and general thoughts: a specific program of undergraduate and graduate courses, of course from the disciplines closest to me. All courses were personally felt on their own skin, and this is why I am writing. (I tried to sign up for all the interesting courses that were, but my main emphasis is system programming, databases and artificial intelligence. Hence, of course, some bias, but I'm trying to offer a more or less universal program).


    1. Basic training
    2. Undergraduate program
    3. Graduate program
    4. Ready to test yourself? Computer Science Comps.

    Basic training

    The first thing to do is go through math. The generally accepted theory in the Russian academic environment is that our mathematics is very very cool, and we are ahead of the rest. But the line between theoretical Computer Science and mathematics is thin, and further, everything that is included in Computer Science will not be called mathematics. Well, in Computer Science, our successes in recent years, alas ...

    In a nutshell, the essence is this - although there is never much mathematics, you should not go too far. We need to get a hybrid education - a mixture of a scientist and an engineer, and do it at the end time. Therefore, mathematics must be minimized. For so many interesting things are in Computer Science.

    - Analysis - confident knowledge of the basics will come down, that is, a multidimensional analysis must be understood, but it is not necessary to delve deeply into all the evidence
    - Linear algebra - you need to understand well, a very necessary thing everywhere. Moreover, it is desirable at a fairly advanced level (eigenvectors, singular decomposition, conjugate gradients)
    - Difura - you can safely neglect it, very rarely where you need them
    - Optimization is very useful, especially in machine learning it is just an iron requirement
    - Algebras, topologies, etc. - on the one hand, it is all terribly useful, but on the other - it seems to me that it is not worth studying mathematically, abstractly, without direct application - you can master it when you need it (for example, relational algebra or category theory for type systems) and the necessary properties and principles have already been studied in conjunction with practice
    - Logic, set theory - I believe that it is necessary to understand the basics. ZFC must be taken.
    — Теория вероятности, статистика — по минимуму из классической математики, лучше учить то, что нужно для Computer Science в контексте машинного обучения, а то рискуете закопаться в том, что не особо полезно
    — Теория игр — полезная штука, но поверхностных знаний хватит надолго
    — Функциональный анализ, вариационные методы — очень классно, но изучать только если припрет, например в машинном обучении
    — Численные методы — только если хотите ими потом заниматься

    Все остальное или не математика, а Computer Science, или не нужно (пока не понадобится для конкретного случая)

    Языки программирования

    A polemic topic, there are many choices, I offer such a minimal gentleman's set:

    - assembler - you need to know some assembler a little, because we have to make our own compiler. There are several options:
    • RISC is a good thing, but where to get it, just emulate it, is not very convenient. But if you set up the environment, then RISC
    • LLVM is a useful thing, but it makes a lot of things easier, not 100% iron.
    • x86 is terrible, but gcc -S is really nice to use right on your laptop.
    • JVM bytecode - did not try to generate, probably there is a convenient way. But, if you use JVM a lot later - it can be very useful.

    - C ++ / Java - unfortunately you will not get far from these in system programming. But to the maximum it is necessary to do without them.
    - Python - quickly pile something, test a hypothesis, cool libraries, especially in machine learning and mathematics + the main chips from functional programming
    - Scala is a practical functional language, you can go down and close enough to the hardware if you wish. A lot of system things are already written in Scala. It should be made the main workhorse.

    (SQL, Prolog - naturally, too, but these are small reeds)

    Let's get started?


    The course will be considered as one quarter - 3 months. Skip all introductory programming courses.

    1. Discrete mathematics (do not forget that this is no longer mathematics)
    Combinatorics, graph theory, discrete ter. Ver., recurrence relations, generator functions.

    In fact, this is usually an initial course, you can watch it online, for example, at MIT:

    All this can be skipped and mastered in other courses (as time does not press). And then it can become boring, but it is bad.

    2. Algorithms and data structures
    Starting from all sorts, hash tables, different trees, then algorithms on graphs (Djikstra, min cut / max flow).
    By conceptual knowledge: an assessment of complexity in O notation, greedy, dynamic programming.
    Additional topics: linear programming, string algorithms, random algorithms.
    The very first principles of complexity theory.


    Dynamic programming is omnipresent, trees need to be understood, the lower bound for sorting must be able to be deduced. A purely theoretical course, there are problems in the textbooks. In general, quite simply, it won’t take too much time.

    PS Algorithms are further on in each subject area, and we will return to the general course only in the graduate program.
    But you can immediately recommend a book on random algorithms (a colleague recently advised, while only scrolling through, but it seems to be very competent), it is graduate level, but you need to start diving early:

    3. Calculation theory
    Here I propagate Sipser , in general, an excellent book is a must read, and it is also suitable for a graduate program.

    This is a super-important course, it will lay the foundation for everything else. At Sipser, everything is very intuitive, logical, and connected. (Recently I flipped through Kolmogorov - it immediately makes it clear that it is better not to go without the Mehmat level. Sipster, on the contrary, has a minimum of requirements, a minimum of formalization, only the most necessary - and everything becomes very accessible).

    We start with the definition of “task” - with Sipser it is a language. That is a lot of lines. An algorithm is something that says yes / no to any line. This concept is all the work. Further, the hierarchy of languages: regular, context-independent, computable, enumerable, non-computable. Complexity is also presented very well - P, NP, NP-complete, NP-hard, co-NP + random classes.

    In addition to good theoretical training, we get excellent skills:
    We work with finite state machines and regulars.
    We work with grammars and with a machine with a stack
    . We program on the Thuring machine - this really needs to be done, it’s cool, it expands consciousness.
    You can program on the machine here, even there are breakpoints!
    Learning to prove which problems are unsolvable, and in the most convenient and fastest ways.
    We play with reductions - transferring one task to another also greatly expands consciousness.

    The course is purely theoretical, excellent tasks, those that can break with stars brain
    For example: let us prove that a Turing machine becomes a power state machine if it is forbidden to rewrite input characters on a tape.

    4. Mathematical logic and set theory.
    It is usually not included in the Computer Science program, but I think it is necessary to master it. I learned from this book, a very simple book, very pleasant:

    Everything, this is the purely theoretical preparation in undergraduate Computer Science, now - applied courses

    5. Compilers (2 quarters)

    Book: : _Principles, _Techniques, _and_Tools Much has

    already been learned from the theory of computing, here the emphasis is on practice. Our task is to make a real compiler from a serious language into assembler. For example, we were given , but something more interesting is possible.

    - Parsing: here it is necessary to take something reasonable, like JavaCC or ANTLR.
    - Translation into AST
    - Semantic analysis: light, although you can get confused with type systems
    - Code generation

    If you have the time and energy - add here the Intermediate Language and a little optimization, the easiest.

    As a result, we perfectly understand how the compiler works, how a function call is implemented, how to make objects, methods, arrays, and more.

    Note: we had to write everything in C ++, but in the modern world this is absolutely not necessary for educational purposes. On the other hand, if the compiler writes in Python or Scala (ANTLR can work with python, I don’t know what it is - if someone knows a good tool, please tell me. I tried to understand what Spark SQL uses, but did not dig it) - you can do much more interesting with the least loss.

    6. Architecture

    Well, here it is - to read, do the tasks. But if possible, it would be nice to design a mini-CPU.
    On something like:

    7. Advanced data structures
    I'm not sure what good books are here, but it would be nice to make my indexes on the disk:
    - B-tree
    - Linear hash
    - R-tree

    Everything is already tough here, C ++. But you can not bother, unless you want to build databases / search engines.

    8. Operating systems (2 quarters)
    Here I have this approach - read the book: , but only for the sake of general concepts. But to really master the OS in this way:

    Take Nachos: (or Nachos 5.0j) and write our modules:
    - Synchronization primitives
    - Stream library
    - Multi-processing
    - Mini-shell
    - Virtual memory
    - File system

    Note: this is certainly pretty hardcore, but worth it. It’s probably better to take Nachos 5.0j, debug virtual memory, which in the memory problems itself is not very pleasant.

    After such an exercise, there will be no mystery about operating systems.

    9. Databases (2 quarters)
    We read the book:
    We do the following project: we write the SQL engine on top of a simple repository, if time is short - we do it, like MySQL - we start AST. If there is more time, we translate into relational algebra and add a couple of optimization (some rule-based).

    Note: again, at one time I had to do everything in C ++ / lex / yacc. Time has gone ahead, if done on Python or Rock, you can do much more with less loss. Or take immediately more interesting query language, such as OQL or SQL ++.

    10. Artificial Intelligence
    AI - in my eyes, has always been and will be the most interesting area in Computer Science, since it always solves very complex problems. In this case, as soon as something starts to work out well, it ceases to be AI and stands out in a separate discipline. In general, we read a wonderful book, and do 2-3 projects.

    Recommended projects:
    - A * search for the 8-queen problem or some other tinker with heuristic
    - resolution theorem proving
    - do an effective conclusion for the Bayesian network
    is to implement Alain’s temporary logic
    - to self-learn to play checkers or play mini-max chess (counting a tree)

    Hard core: you can do all the tasks in the Probabilistic Graphical Models course, but this is more for the postgraduate level.

    11. Machine Learning
    Without this topic now in any way. I don’t know which tutorial is better for undergrad, but ideally learn Bishop:

    Here is the Andew Ng class at Stanford for undergrad, not to be confused with the class on Curser , completely different levels:

    And I also found just wonderful lectures on YouTube (mathematicalmonk):

    12. Computer Graphics
    Not really necessary, but every programmer once wanted to write your game, so you need to take for a fan and for talking over beer.
    Not sure what good books are right now.
    We wrote a piece of OpenGL in C, very useful - it gives an idea of ​​how all 3D engines work.
    You can also write your own Ray-Tracer - also a cool thing.

     13. Computer networks
    I missed this course, but there is a very good course on the Curser

    14. Distributed systems
    Strongly overlap with operating systems, but there are a lot of their important features.
    I would just read a book, especially key concepts, not specific systems:

    Synchronization, global state of the system, consensus, transactions, etc. Now all sorts of MPP systems have become very popular, here are the basics on which they are based (or not, I am preparing a popular article about all kinds of fashionable databases).

    15. Programming languages.
    Such a course often comes across, but usually a waste of time. At this point, a compiler has been written for Turing and SQL, everything is clear. You can be confused with something like Haskell or ML. As an option, learn XQuery for a little expansion of consciousness.

    Then I would finish the undergrad program, you have a B.Sc. diploma, congratulations.

    Or you can go in breadth: Security / Cryptography, Scientific Programming, drill in AI: Computational Lingustics for example. What do we have after such a program? You can safely go to work, but there are still gaps in the theoretical base. You can fill out everything yourself, or you can continue to study at M.Sc.


    Computer Science Graduate School is not our graduate school. Here you continue to study for a couple of years and pass a comprehensive exam, in the case of PhD in 5 disciplines, that is, you need to keep a very serious base in mind at this moment. A very useful exercise (I took the 3rd on MSc, this is much easier).

    And specialization begins very quickly. But let's get the basics:

    1. Algorithms
    Same as undergrad, only for real, deeper, plus randomization.
    Here, as it were, there is no universally recognized book, I advise you to climb on the websites of universities. That is, many cost articles, slides, etc.
    Well, random algorithms are our everything. So open the book:

    2. Computational theory, essentially pure Complexity Theory
    You can cover Sipser and try to read something else. We had Papadimitrow, but this is certainly not for the faint of heart. Here mainly reductions, used to be NP-complete, now there are many in random classes.

    Also, if you like logic, it makes sense to read Descriptive Complexity:

    3. Architecture
    The same book, but in an adult way. You can build some kind of advanced CPU.

    Further specialization by and large. I would recommend such areas, but this is my bias:

    4. Machine Learning
    At the postgraduate level, you can take Bishop (he still has not completed it all), and machine learning theory
    I like the good old one: Computational-Learning-Theory / dp / 0262111934
    But probably there are already much more relevant books.

    And you can hang on the cursor:
    - Probabilistic Graphical Models
    - Another very good Natural Language Processing course from Columbia University
    - Modern neural networks:

    5. Database theory .php? ebook = 7942
    This is a very exciting field, everything that is possible is mixed here: Logic, Model Theory, Complexity, Descriptive Complexity, even game theory have been exploited. The book is quite heavy and formal, but worth it, at least selectively read and do the exercises.

    What is missing?
    There is absolutely nothing about formal methods, well, this is a tricky thing. In theory, between set theory, artificial intelligence and database theory + descriptive complexity - there is all the tools for verification and evidence, so this course should already be purely applied. If you have experience with such a course, it would be very interesting to know.

    Internet mathematics - well, this is also a slightly separate topic, but the whole foundation is laid.

    Everything, if you get here, doing projects and solving problems, you can assume that you have reached the level of M.Sc. at Computer Science at the level of top universities in the world.

    Let's pass the test?

    You can search the Computer Science Comprehensive Exam or something like this, and you can find real M.Sc. exams. and Ph.D.

    I decided not to spread the links, so as not to pop off once again, since usually they are not very distributed.

    PS That's all. Something is probably out of date, but the basics always remain relevant.

    PPS Of course, sitting at home on the couch, it is difficult to master such a program. Doing exercises and big projects is especially difficult. Without a university environment, it is also very difficult (but it can also be difficult at a university - nights in the laboratory are a private phenomenon). But! - everything is possible. All of these books (well, almost all) are written as simple as possible, require a minimum of prior knowledge, immediately show how to apply the acquired knowledge, in general, are very suitable for independent study. Courses on a cursor or recording lectures from universities is also a very useful thing; to watch a lecture requires much less effort than reading a book, solving problems, etc.

    PPPS Tips for professionals who have “forgotten everything” are stuck in their niche, but want to go to the bleeding edge again: he himself went through it, there is a brain clouding and it seems that everything is lost and there is no hope. But in essence, this is how to start playing sports after several years of junk food - we start small, set short goals, enjoy the smallest progress, and quickly enough all the formulas become clear again, the general picture of the world is built again, and you can again feel like a student / graduate student.

    University sites:

    UC Berkeley:
    UC San Diego: (my alma mother, well, not yet # 4, but creeps up every year!)

    Also popular now: