How we teach Software Engineering at St. Petersburg HSE

    In previous posts we described what our students do during internships, both research ones (for example, at JetBrains Research) and industrial ones. In this post we want to share how we teach industrial programming.



    In short: over four years, a former school student tries a dozen different technologies and languages, constantly writes and deletes a lot of code, goes through code review by more experienced colleagues (not always passing on the first attempt), goes deep into some topic, and eventually defends a substantive thesis. All this takes place right at the university and leads to a state-recognized diploma. In the summer you can either rest, intern in Russia at JetBrains, Yandex, or JetBrains Research (if you want more science), or go abroad (Google, Facebook, and others). Now in more detail.


    About myself


    My name is Egor Suvorov. I am a master's student at HSE; I have interned at Google (twice), Asana, and GSA Capital, and have competed successfully in international programming contests (at both the student and school level). Last year I graduated from the bachelor's program at the Academic University, so I have been through almost everything described in this post. I also take part in developing the software engineering curriculum and teach practical classes in several subjects (Paradigms and Programming Languages, C++).


    Main components


    In this post we cover only the “Industrial Programming” track. We also have “Machine Learning”, “Programming Languages”, and some other tracks, and their programs overlap, especially in the first two years.


    The training consists of three main parts. First, I will give a general overview, and then I will tell you more about each part.


    1. Basic subjects. From the very first semester, students learn by doing. Each semester, all “industrial programmers” must pass 2-4 basic subjects (some of which are also mandatory for other tracks). The purpose is to level up a student in more or less every area of programming, so that they can move up and down the levels of abstraction: starting from writing a toy OS with user space in C with a pinch of assembly (for the most persistent) and the command line, then through move semantics in C++, and all the way up to monad transformers. This builds breadth and experience with different kinds of programming. We teach more than languages: parallel programming, networks, and databases are covered too. And, of course, some subjects are not about code but are still relevant: for example, Software Engineering (an overview course on why we need teams, managers, and project and risk management) and interface design (otherwise students will think that “knocking together forms is easy”). There is mathematics and algorithms as well, but that is a topic for a separate post.
    2. Semester practice. Mandatory starting from the second semester. The objective is to let the student try different things bigger than a homework assignment and understand what they like best. At the beginning of the semester there is a project fair, where potential supervisors talk about what you can do with them. An academic degree is not required of a supervisor, by the way: what matters much more is what a particular person and project can give the student. In one semester you can build a desktop + mobile application in Qt and find out what it is like to work on a single project for a whole semester without a clear specification (hard). In the next, try Android and feel that “making a reliable client for a social network” turns out to be difficult even when the functionality is severely limited. In another, try building some machine learning tool in Python and realize that you do not want to deal with this topic at all. In the fourth, go and work on a Haskell compiler, be terrified, and return to your beloved C++ for the thesis. Or vice versa. It depends on the student, and that is the point of the practice. As a result, the student ends up with either a favorite direction (in which they can work and write a thesis) or experience in a bunch of different directions. Win-win either way. By the way, if you do not like any of the projects, you can propose your own. But in that case you must first get a supervisor interested or find someone from outside, and then convince us that something meaningful and defensible can come out of the project.
    3. Electives. They appear in the third year. The reason: not everyone wants to dig into the Linux kernel, just as not everyone wants to delve into the design of Scala Collections with its tons of implicits. So you can choose which topics to go deep into. For example, if a student dislikes the number of ways to shoot yourself in the foot in C++, they can take a course on container virtualization in Linux, write in pure C, and be happy. And vice versa: if the “Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3” still gives you nightmares, you can escape to the beautiful world of Scala with its abstractions and unicorns. Each module (half a semester) offers approximately four courses, of which two must be chosen.

    In addition, we want students to enjoy learning. At any time you can talk to the program managers and suggest improvements in any area. We collect feedback four times a year and, importantly, actually take it into account and keep improving the program. We deliberately do not admit many students, so that we can talk to each one personally. There are currently 30 students each in the second and third years, and 15 in the fourth.


    We are also constantly creating or looking for new courses at students' request, and looking for good teachers who both understand the subject and are able to teach it. For example, in this module an experimental reverse engineering course from SPbCTF was successfully added to the program in just a few weeks. And if a suitable course cannot be found, then by agreement we can count something substantive from the Computer Science Center, ShAD, or Coursera.


    Basic subjects


    The main programming subjects: C++, Unix-like systems, paradigms and programming languages, computer architecture, Java, operating systems, functional programming, databases, Software Design, Software Engineering, parallel programming, computer network technologies, interface design, mobile development.


    Together, these subjects cover almost all the tasks you may encounter at work. They also guard against various “classic” mistakes, such as comparing floating-point numbers with ==, expecting sane results from undefined behavior, and race conditions, and they introduce design patterns and the non-programming tasks involved in developing products.
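
    To make the floating-point pitfall mentioned above concrete, here is a minimal sketch. Python is used only for brevity (the courses themselves use other languages), and the helper name `nearly_equal` and its default tolerances are assumptions made for illustration. It shows why `==` misbehaves and one common alternative, a tolerance-based comparison built on the standard `math.isclose`.

```python
import math

def nearly_equal(a: float, b: float,
                 rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Hypothetical helper: compare floats with a tolerance instead of ==."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)

total = 0.1 + 0.2

# Neither 0.1 nor 0.2 is exactly representable in binary floating point,
# so the rounded sum differs from the double closest to 0.3.
exact_match = (total == 0.3)

# The tolerant comparison treats the two values as equal.
tolerant_match = nearly_equal(total, 0.3)

print(exact_match, tolerant_match)
```

    The right tolerance depends on the magnitude of the values and on how the error accumulates, so the defaults here are a starting point, not a universal answer.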


    Of course, throughout their studies students constantly build practical skill on labs and homework. Take something classic, for example implementing an archiver based on the Huffman algorithm. Making something that “just works” is not that hard. But designing a good project architecture (at the very least separating input-output, bit-level packing, and the algorithm itself), using C++ features correctly (the rule of three or the rule of five, depending on the semester), and generally writing code that is pleasant to read and not too embarrassing to publish as open source is a separate art. Teachers teach it in the C++ course by constantly talking to students and going through every line of code in detail. It is the same in other subjects. No course is limited to theoretical tests. In every subject that involves writing code, there is code review by an experienced programmer.
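
    The separation of concerns described above can be sketched roughly as follows. This is not the course's reference solution: the course uses C++, while this sketch is in Python for brevity, all function names are made up for illustration, and file input-output is left out entirely so that only the algorithm layer and the bit-packing layer are shown.

```python
from __future__ import annotations

import heapq
from collections import Counter

def build_code_table(data: bytes) -> dict[int, str]:
    """Algorithm layer: map each byte value to a prefix-free bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                      # one distinct symbol still needs a code
        return {next(iter(freq)): "0"}
    # Heap items: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def pack_bits(bits: str) -> bytes:
    """Bit layer: pack a string of '0'/'1' into bytes, zero-padded at the end."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

def unpack_bits(packed: bytes, bit_count: int) -> str:
    """Bit layer: inverse of pack_bits; bit_count drops the padding."""
    return "".join(f"{byte:08b}" for byte in packed)[:bit_count]

def encode(data: bytes) -> tuple[dict[int, str], bytes, int]:
    table = build_code_table(data)
    bits = "".join(table[b] for b in data)
    return table, pack_bits(bits), len(bits)

def decode(table: dict[int, str], packed: bytes, bit_count: int) -> bytes:
    inverse = {code: sym for sym, code in table.items()}
    out, current = bytearray(), ""
    for bit in unpack_bits(packed, bit_count):
        current += bit
        if current in inverse:              # codes are prefix-free, so this is unambiguous
            out.append(inverse[current])
            current = ""
    return bytes(out)

table, packed, bit_count = encode(b"abracadabra")
restored = decode(table, packed, bit_count)
```

    Because the bit packing knows nothing about Huffman codes and the code table knows nothing about bytes on disk, each layer can be tested and replaced separately, which is the kind of decomposition a reviewer would look for.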


    C++. First year of study. We start with C and end with C++14. We show RAII, Valgrind, and automated tests, and we learn to write both libraries (my_vector with exception-safety guarantees) and applications (the same archiver). Why: because C++ is still actively used in industry, and it is closely tied to systems programming (no garbage collection, you can show the layout of data in memory ...).


    Unix-like systems. First semester. A boot camp on working with the command line and a file system without C:\ and D:\. An example of a test: students are given a couple of images of a slightly broken Ubuntu installation and have to fix it. An example of homework: walk through the files in a folder with Bash and do something useful with regular expressions.


    Paradigms and programming languages. First semester. A dozen topics (at least OOP, functional programming, SQL, and multithreading), and students try each topic in one or two homework assignments. The coverage is quite superficial, of course, but it still gives an understanding of how varied programming is and what cool things can be built with different approaches. OOP, FP, and SQL will all be covered in detail later, but already in the first year the student knows they exist and can, say, place asserts in the code or write a couple of simple unit tests if they want to.


    Computer architecture. First year. Registers, building something computer-like from logic gates, caches, processor pipelines, number representations, and other theory. Why: to tie scattered pieces of knowledge about all sorts of low-level things into a single picture.


    Java. Second year of study. Ends with streams and course projects for Android (for example, a recursive tic-tac-toe game with a bot and a network mode). We show Maven, IDEA, JUnit, and asynchronous networking. Why: a lot runs on the JVM these days, there are tons of libraries, and it is useful to know.


    Operating Systems. Second year. Hardcore. No tedious fiddling with bootloaders: students are given a stub with multiboot and the transition to protected mode, and from there they can write a memory allocator, threads, processes, a file system, separation by protection rings, and even ELF loading. Perhaps not as extensive as Tanenbaum's “Modern Operating Systems”, but it makes it clear whether you find digging at this level interesting or would rather stay in isolated userspace. If you are interested, welcome to the elective on programming in the Linux kernel. By the way, a heavily simplified version is available on Stepik: there is no OS writing, but it has the necessary theory and auto-checked tasks. Completing it is enough to avoid being expelled for failing the course.


    Functional programming. Two classes on the lambda calculus, and then on to Haskell. We finish with monad transformers. Of course, everything is explained in detail: a monad is not a box but simply a useful abstraction of a certain code-writing pattern. The intermediate projects, however, are more theoretical than practical, for example writing automatic type inference for the lambda calculus. A continuation of the course is now being prepared (as an additional master's-level subject), where both multithreading and a web server are planned. The course is also taught at the Computer Science Center.


    Databases. We get acquainted with relational DBMSs (using PostgreSQL as the example) and SQL. Students design a database for a given subject area and then review each other's work following guidance from the teacher. Then the teacher reviews the reviews. There are contests of the form “write such-and-such a query for this database” and tasks on simple query profiling (EXPLAIN). Again, this echoes the course at the Computer Science Center.


    Software Engineering. Describes how a programmer's work can be organized, what other people in a company do, why good managers are useful after all, and why managing projects is hard too. What planning is for, why it does not always work, why not every bug needs to be fixed ... The goal is to give an understanding of what projects need besides people. Of course, it is impossible to cover everything in detail in one semester, but it is useful to know, for example, that once a project has been developed there is still support, which is no less important.


    Software Design. All sorts of ways to model reality (UML diagrams), decomposition methods, design patterns. The examples at the end of the course are GFS, BigTable, CMake ... In the practical classes we learn not only to write code but also to describe architecture and apply patterns where they are appropriate.


    Parallel programming. We start with simple threads and mutexes; by the end we analyze and write lock-free / wait-free algorithms, dig into MESI, and study higher-level technologies such as the fork-join framework, OpenMP, OpenCL, and Intel TBB.
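
    As an illustration of the “simple threads and mutexes” starting point, here is a minimal sketch of a shared counter protected by a lock. It is written in Python for brevity rather than in the language used in the course, and the names `SafeCounter` and `run_workers` are hypothetical. Without the lock, the read-modify-write in `increment` could interleave between threads and lose updates, which is the classic race condition.

```python
import threading

class SafeCounter:
    """Hypothetical example class: a counter whose updates are serialized by a mutex."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may execute the read-modify-write below.
        with self._lock:
            self.value += 1

def run_workers(counter: SafeCounter, threads: int = 4, increments: int = 10_000) -> int:
    """Start several threads that all increment the same counter, then wait for them."""
    def work() -> None:
        for _ in range(increments):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value

final_value = run_workers(SafeCounter())
```

    With the lock in place the final value always equals threads × increments; lock-free and wait-free algorithms aim for the same correctness without blocking.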


    Computer networking technology. Lectures: a detailed overview of the main protocols of the TCP/IP stack: ICMP messages, a historical excursion into RIP, all sorts of DNS records, how FTP / HTTP / SMTP / DHCP work, what NAT is, and even a little about IPv6. Practical classes: write your own cross-platform TCP client and server in C++, first for a toy messenger, and then a UDP client for DNS.
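
    To give a feel for what a UDP client for DNS involves, here is a minimal sketch of building a query for an A record and sending it over UDP. The course assignment is in C++; this sketch uses Python for brevity, the function names are made up, and the server address is left as a parameter because the source does not name one. The packet layout follows the standard DNS message format: a 12-byte header, then the name as length-prefixed labels, then the query type and class.

```python
import socket
import struct

def build_dns_query(hostname: str, query_id: int) -> bytes:
    """Build a DNS query for the A record of `hostname` (hypothetical helper)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0, all big-endian 16-bit fields.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 1 <= len(encoded) <= 63:
            raise ValueError(f"invalid label length in {hostname!r}")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"                                # root label terminates the name
    question = qname + struct.pack("!HH", 1, 1)     # QTYPE=A (1), QCLASS=IN (1)
    return header + question

def send_query(packet: bytes, server: str, timeout: float = 2.0) -> bytes:
    """Send the query to `server` on UDP port 53 and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        response, _ = sock.recvfrom(512)            # classic DNS-over-UDP size limit
        return response

query = build_dns_query("example.com", 0x1234)
```

    Parsing the response (including name compression) is the harder half of the exercise and is deliberately not shown here.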


    Interface design. The student does not write a single line of code but goes through all the stages of designing a good user experience: they define the project, conduct research (including surveying real people), develop and test usage scenarios, and only at the very end draw the interface in Sketch or Figma. The goal is to understand that a good product needs not only code but also a lot of other preparatory work. There is no code review here, but all intermediate artifacts are actively discussed with the teacher. It seems to me that passing a homework assignment on the first attempt is impossible (but that is not required).


    Mobile development. An in-depth Android development course. We already write more in Kotlin than in Java and use all sorts of Kotlin-specific features for Android. Compared with the second-year Java project, the application is more complex, and we work more with external dependencies and libraries and think more about the interface and the users (here the course overlaps with interface design).


    Testing. Mostly a lot of theory that gives names to all the standard practices students have probably already invented for themselves in other subjects: control-flow and data-flow testing, all-pairs testing ... There is some concrete practice too: write a test plan for a method, find edge cases in such-and-such an application, and run several scenarios in a web application with Selenium, so that just making up test cases does not get boring.


    Software engineering for big data, also known as Big Data Software Engineering. Taught jointly with the Computer Science Center. We tie together databases, parallel programming, distributed systems, and other fashionable words: that is the lecture part. In the practical classes last year, students wrote their own distributed phonebook from scratch. In future runs it seems right to shift the focus from the low level to tools actually used in industry, such as Zookeeper, Cassandra, and other scary beasts. So far the main difficulty is how to emulate “big data” conditions for students and evaluate their solutions: there is no point in bringing up Zookeeper if there is no clear demonstration that things go very badly without it.


    Practices


    The second important part of the training is practice. From the first year, the student works on practically useful tasks under the guidance of an experienced colleague. For example, yet another application for managing your calendar or notes. Or a new feature in an existing application. Or they study the computational complexity of some family of formulas, if they are drawn to Computer Science.


    In the first years we do not require novelty or practicality (after all, the goal is to let students play a little), but the requirements for project quality and for the defense grow as the thesis approaches. In the final years, in addition to the question “what has been done?”, it is important for students to explain for what purpose and why it was done. “This particular company, where my supervisor works, wants it” is not an answer in itself. But “hard drives die there every second, so this open-source tool is not suitable, this paper is purely theoretical, and Google has a solution, but it is closed” is a perfectly good one. Passing off an unnecessary second-year exercise as a thesis will not work: curious developers with laptops and Google at the ready come to the defenses (and some to the pre-defenses). “No one has done this yet” is practically the most dangerous thing to say.


    Here are some photos from a typical defense. Photographer: Dima Drozdov.




    The practice projects teach you to work “for the long haul” on large projects, sometimes partially written by other developers. You cannot always guess right with the project topic: for example, having tried low-level development, a student may swear off ever doing it again. This is the point of the practice: to find out what you enjoy and what you do not, not at work but in conditions where the stakes are lower. The final project, though, should grow into a substantive bachelor's thesis. “Substantive” means that you could at least write a Habr article about the thesis and not end up with a negative score. Or, if the work is very good, publish it in a scientific journal, present it at a conference, or at least collect some upvotes.


    Electives


    The third, and also important, part is the electives. Their topics are specialized and not everyone needs them, but interested students can get a taste. In the senior years such subjects are the majority: the foundation is already there, and what remains is to broaden the student's horizons in a direction that interests them. Unfortunately, there is physically not enough time to take all of them. The set of courses changes from time to time; here are the ones that were offered to me:


    Alternative languages for the JVM. A course of two modules: one covers Kotlin, the other Scala. For Kotlin, we go through Java interop, writing your own DSL, and coroutines. The last optional homework is to add a debugger to the interpreter of a toy language (written in the previous homework assignments) using coroutines. As for Scala ... The language is big, but we manage to go through all kinds of implicits :)


    Programming in the Linux kernel. Step by step, students develop a kernel module that emulates a virtual storage device: mmap, buffers, concurrent access, non-blocking I/O. Along the way you can revisit interrupts and preemptive multitasking from the Operating Systems course and study internal Linux structures (for example, the wait queue).


    Compilers. We write a compiler for a micro language in OCaml. An intermediate stack machine, compilation to x86 without any LLVM, integration with libc. Cue the surprised exclamations of students: “why does it crash only on an expression of length one hundred?” (probably because there is a bug in register allocation). By the way, a similar course is also taught at the Computer Science Center.


    Computer graphics . A relatively low-level course: we study OpenGL, write our shaders for shadows and deferred rendering, compare color mixing with and without gamma correction.


    DBMS internals. The internal structure of databases. All kinds of join algorithms, formal models, columnar DBMSs. In the practical part you can implement several block-processing algorithms in a toy DBMS in C++ (for example, a doubly pipelined hash join).


    Container virtualization. A detailed study of containers in Linux. Namespaces and cgroups, for instance: both the API and how it works underneath. Assorted auxiliary tools for networking. Along the way students write their own Docker-like container, which is not so easy: you need to restrict a lot of things properly, set up the network, pass the necessary files into the container ... High-level orchestration is also covered, using Kubernetes as the example.


    What we want to improve


    Both we and our students are mostly satisfied with the resulting program (judging by the surveys). However, it can be made even better, not only by improving existing subjects but also by adding new ones.


    For example, it is still unclear how to bring certain aspects of “work experience” into a university. Is working with legacy code useful? Absolutely. There are even books and specific techniques for it. But to make a good course out of this, several conditions have to be met at once:


    1. Teachers must not be pulled away from their main work for long periods just to keep helping students find their way around a large project. And if the project has good documentation, it is not really legacy anymore.
    2. Students should find it interesting. “Add a thousand lines of code to a project nobody needs” does not qualify.
    3. The result should be predictable. “This task seems to be unsolvable, sorry, we did not think it through” is bad news to hear after a homework deadline.

    Unfortunately, we have not yet figured out how to do this. The closest thing we have is in the machine learning track, where weekly seminars are held at which students present some of the latest papers. Perhaps this experience can be carried over to industrial programming.


    Perhaps the only areas not currently covered are web development (including the front end), complex automation systems (like 1C or SAP), and computer security (an experimental course started in early February 2019). Maybe we have forgotten something else, or you know how to teach programming even better: we will be happy to discuss it in the comments.


    Nevertheless, we believe that graduates already leave the bachelor's program ready to work; if they need any extra training, it is only in a company's internal systems. By the way, there is a separate question we are now thinking about and trying to answer in practice: what to teach in the master's program after such a dense set of courses. But that is a topic for a separate post.

