No one can write code.

    On the eve of Moscow Python Conf ++, we talked with Nikita Sobolev, CTO of We Make Services, about the global problem of managing code complexity as programming languages evolve, and about why the situation only gets worse over time. We also asked why he needed to create his own linter.

    - Tell us in a few words about yourself and your work.

    I am the CTO of We Make Services. When I say the company's name, I usually ask: "What do you think we do?" In fact, we specialize in web development: frontend and backend for corporate clients. We also work according to our own methodology, the Repeatable Software Development Process (RSDP), which we refine as the company grows.

    - At Moscow Python Conf ++ you will be talking about, among other things, your own linter. How does your work relate to auditing and managing code complexity?

    In general, we have two main areas: development itself and everything around it: consulting, requirements writing and, in particular, auditing, during which I see a lot of other people's code. The code varies widely: code that is in active development and legacy code that no one will ever fix; code written by the customer's own specialists and code they outsourced. And every kind of code has plenty of problems, some shared and some unique.

    - You will be speaking to Python developers. Does Python have any peculiarities when it comes to managing code complexity?

    Of course!

    First, all dynamically typed languages are more prone to unjustified complexity, if only because there is less context available when reading the code. And they let you get away with more mess.

    Second, Python is actively developing. New syntax elements, new concepts, and new standard library modules keep appearing, and they break what came before.

    - How bad are things in Python? There are other actively developing languages, for example JavaScript, which is often criticized for exactly this. Is the situation any better in JavaScript?

    No. I would even say that in terms of complexity, Python is in quite good shape compared with other languages. In JavaScript, things are really bad for one simple reason: a JS project's code mixes in several entities that do not belong to the language itself, namely the third-party plugins and libraries used to build the project. For example, if you use webpack, you can write an `import()` call that loads modules asynchronously. The bundler ends up pushing some of its internals into your programming language, and as a result it is not clear at all what is going on.

    Managing complexity is difficult when the language changes as soon as you install Babel or one of its plugins. And to understand how they work, you need to keep track of the language standards, the specific implementation, and so on.

    In Python, the situation is much better. The language develops quite systematically, and this development has clear milestones. You cannot drastically change the syntax with two lines in a config. And it is still a backend language, on which we are used to placing higher demands than on the frontend. However, in my opinion, quite a few recent changes in Python break what existed before while bringing dubious benefits.

    - So with the development of the language, things get worse?

    If we recall that asyncio appeared, essentially a second language inside Python, then of course complexity has increased a great deal. In effect, there are now two completely independent programming languages with similar syntax: Python and Python + asyncio. That is, Python as an entity has become twice as complex, because it has two separate descendants that work by different rules.

    The opinion that these are different programming languages is not popular. However, when you ask its opponents to, for example, call an asynchronous function from synchronous code, they fail. The libraries are completely different too. Want to use a synchronous library to work with a database? Go ahead. Want an asynchronous one? Then you need a different library.
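
    The split between synchronous and asynchronous code can be seen in a minimal sketch. The function names `fetch_value` and `sync_caller` are hypothetical and chosen only for illustration. Calling a coroutine function from synchronous code does not run it; it only creates a coroutine object, and the synchronous side has to hand control to an event loop, here via `asyncio.run`.

```python
import asyncio
import inspect


async def fetch_value() -> int:
    # Stand-in for real asynchronous I/O (hypothetical example).
    await asyncio.sleep(0)
    return 42


def sync_caller() -> int:
    # A plain call does not execute the body: it returns a coroutine object.
    pending = fetch_value()
    assert inspect.iscoroutine(pending)
    pending.close()  # discard it explicitly to avoid a "never awaited" warning

    # Synchronous code must start an event loop to get the actual result.
    return asyncio.run(fetch_value())


result = sync_caller()
```

    Note that `asyncio.run` cannot be called while another event loop is already running in the same thread, which is one reason mixing the two styles is awkward.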

    But for Python written the way it was five years ago, not much has changed. On the contrary, tools have appeared that let you simplify the code, such as annotations and type checking.

    - Does the fact that there are quite a lot of people in programming with a weak technical background affect complexity management?

    Of course. A special programming language was even invented for such people. It is called Go. I am not kidding. One goal in creating Go really was to get Google's students and interns, who are unable to learn C++, writing code. Python did not suit them in terms of performance, they needed something else, so Google invented Go. As it turned out, a lot of people are ready to write in it, because it is very simple. But at what price is this simplicity achieved? We are given not a normal programming language, but a very stripped-down version of one: by design there are almost no complicated concepts. There are no generics, no such thing as exceptions, and so on. And this approach has a lot of fans.

    But there are other developers, and for them the problem is that some languages lack balance: simple things are easy to do, but complex things cannot be done at all, or only through pain, because you have to fight the tool to get anything done. Here, I think, lies the problem of complexity management.

    - What are the typical problems of someone else's code?

    Usually they fall into two groups.

    The first group is problems that arise because people cannot agree on where to put the proverbial commas. You read one piece of code and see commas in one place, then switch to another file and see them somewhere else. This complicates perception, like reading a book printed in bold in one place and in italics in another. It distracts from the content, because the brain has to recognize that this is just a different way of writing the same thing.

    Once you fix the syntax, you start paying attention to semantics, because people write conceptually differently. Unfortunately, you cannot settle things at this level in advance: you cannot decide up front that tasks of one kind are solved this way and tasks of another kind that way. It is impossible to cover all cases from the start. The agreement emerges during code review of a concrete task, when the developer is told why his solution cannot be accepted. If code review is practiced and the reviewers are good, they cut off crooked solutions and the code has no such problems. But we usually come to audit projects where this process is not established. And problems of semantics and architecture are much harder to solve, because they are sometimes hard to even formulate and define for yourself.

    - And what does it look like in practice?

    For example, people can solve the same problem in templates, in views, or in models. And there is no shared understanding of where exactly it should be solved: no documentation or patterns that apply specifically to this project (for example, "here we use fat models and put all the logic in them, and here we use thin ones; whether that is good or bad does not matter right now, it is what we agreed on").

    - Where do you see the main reason why these problems exist at all?

    No one can write code.

    This thesis should be read as follows: the problem is that we are human, and it is very hard for us to write anything structured and logical at all. On top of that, code has two recipients of different kinds. First, there is the person who will read the code; second, the machine that has to execute it. Code for the machine should be written with performance, memory consumption, and CPU time in mind, while code for a person should follow the principles of readability, clarity, and so on. These are two opposing tasks. And a person who, in truth, cannot fully solve even one of them is forced to solve both conflicting tasks at once.

    - But isn't using different programming patterns essentially an engineering search? Is that really bad?

    Of course, engineering search is important and necessary. But it, too, must be managed. Before each such task, you need to set clear criteria and constraints: on time spent, on business requirements, on engineering practices and tools.

    Far more often I see creative search. It has no such constraints and no validation of the results either. Quality, as in modern art, is not measurable.

    Almost all the clients who come to us for an audit suffer from the same typical situation: someone built something for them, they hired a developer to take the solution further, and he came in and threw up his hands: "I don't know what to do here, let's rewrite everything." Would rewriting help? It would not. When you decide to rewrite, you step on exactly the same rake: you entrust the task to another developer who makes different mistakes, but in the end everything turns out exactly the same.

    - Is some other approach needed?

    Yes. During an audit, we try to find the cause of the problems in the code: not why someone took a module and bloated it until it was hard to even scroll through, but why the wrong decision was made in the first place. And we try to automate or simplify making the right decisions within the given constraints.

    Here is a small preview of the talk. Everyone understands that code consists of lines; the line is the simplest unit it can be made of. A line can be written as

    x = 1,

    or as

    x = Math.median(forecast_data) if forecast_data else compute_probability(default_model).

    There is a very big difference between these two lines: you understand the first one easily, while the second packs in a lot of logic, and you have to execute it in your head in parallel with the interpreter. So managing how you write code has to start with managing a single line. Lines then combine into more complex concepts: functions, classes, modules, and so on. But the rules you adopt should be the same at every level.
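
    One rough way to put a number on this difference is to count the AST nodes each line produces. This is an illustrative assumption, not the metric the linter described here actually uses, and the helper name `line_complexity` is hypothetical.

```python
import ast


def line_complexity(source: str) -> int:
    """Count AST nodes in a snippet as a crude complexity score."""
    tree = ast.parse(source)
    # Skip the Module wrapper so only the statement itself is counted.
    return sum(1 for node in ast.walk(tree) if not isinstance(node, ast.Module))


simple = "x = 1"
dense = (
    "x = Math.median(forecast_data) if forecast_data "
    "else compute_probability(default_model)"
)

# The snippets are only parsed, never executed, so undefined names are fine.
simple_score = line_complexity(simple)
dense_score = line_complexity(dense)
```

    Under this sketch the second line scores several times higher than the first, and a rule could reject any line whose score exceeds a chosen threshold.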

    As a result, we are building something that prohibits doing many things. Because management is about sensible prohibitions.

    - Have you come across anything funny in other people's code?

    Of course. I even have a repository where I collect such code samples.

    The scariest example I have seen showed me that a function can be defined inside a loop, once on each of a hundred iterations. To be honest, when I looked at it, the interpreter in my head crashed. I had suspected it was possible, but did not know for sure.
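
    For reference, the construct is legal Python: `def` is an ordinary statement that executes each time it is reached. A minimal sketch, with hypothetical names and not the code from the audit:

```python
def build_handlers(count: int) -> list:
    handlers = []
    for index in range(count):
        # `def` runs on every iteration and creates a new function object each time.
        def handler(value: int, offset: int = index) -> int:
            # `offset` is bound at definition time via the default argument.
            return value + offset

        handlers.append(handler)
    return handlers


handlers = build_handlers(100)
# Every iteration produced its own function object.
distinct = len({id(h) for h in handlers})
```

    The loop yields a hundred separate function objects, which is exactly what makes such code hard to follow.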

    There was a case when we found a lot of funny comments in the code. Some complained about life or about work; others wrote: "I know I am writing nonsense, but the customer is making me." However, customers usually do not force you to write bad code. They ask you to solve their problem, and they could not care less what code you write to do it.

    - Linters, code review: don't they help?

    I have two answers: yes, they help, and no, they do not. They help if you strictly follow the rules that linters give you (the tools that do a lot of the grunt work for you: checking functions for complexity, code semantics, and so on). This check must be blocking. You cannot just run a linter from time to time to look at the result. If the code fails these rules, it should not be released to production at all.

    But in practice they do not help, because projects that use them this way are rare.

    By the way, I am often asked how to introduce this. And I answer: very simply, you add a line to CI that checks the code, and if it fails, that's it, you have introduced it. All that remains is to refactor. Fortunately, there are now autoformatters and the ability to refactor code file by file. The next question is traditionally: how do you explain to the business that this is important?

    - Is there any general answer to this question?

    The answers differ from case to case, so it is hard to give a general one (you have to consider this and that...). But usually the companies that raise this problem come to it from the technical side. That is, the engineers ask us, as people who can speak about both business and technology, to explain it to the business in their particular case. Framed this way, it works very simply. By the time you arrive, everything is already bad, and everyone understands that. The conversation with the business starts like this: "You probably think your programmers sit around doing nothing?" And the business nods. And you say that this is not the case. The programmers are great people trying to solve your problems. But without a systematic approach to project management, everything slides into chaos, and that is normal.

    And we propose coming up with rules to avoid certain problems. We calculate the cost of introducing various practices, and then estimate the real (already incurred) losses from not having them. For example, programmers spent a month fixing a bug that would either not exist or be found in 30 seconds with a certain approach and tool. The numbers are quite convincing.

    - So in the end, this is a management problem?

    Of course. I am convinced that programmers want to write good code. But various obstacles get in the way. Some do not know how, due to inexperience. Some have lost motivation because nobody around them cares. Some do not know what good code actually is because of, let's say, creative wandering. Some are under pressure: they want to and can write well, but are told it must be done by tomorrow. And instead of building a partnership with the business and explaining why it will not be ready tomorrow (or that if it is, it will take another three days of fixes), they do it any old way. Yet such a partnership is in the interest of the business itself: it also needs the product to work for a long time and be cheap to maintain.

    That is, all these issues can be solved: there are no irreconcilable contradictions here.

    - There is a code style, PEP 8. Doesn't it help to quickly understand what is good?

    In terms of commas - it helps. But what's the point if you put the commas correctly and everything else is bad?

    - Is some well-known higher-level guidance missing?

    In theory, there are established engineering best practices. But they are either unknown or ignored. When you ask a developer why he did not follow a practice, he says he has heard it is a good thing, but the code works as it is. When the code stops working, you ask whether he now understands where that best practice came from and why it should be followed. No, he does not. He thinks he just made a mistake.

    It is quite difficult to explain to a person that making mistakes is normal. Everyone makes mistakes; we are all human. But best practices were invented precisely to save you from a mistake or protect you from its consequences. That is, they are a kind of safety regulations, as in industry. Only they are written not in blood, but in wasted time and money.

    In general, our unattainable global goal is to automate code review so that Python itself (in our case) knows how code should be written. It should be a tool that gives developers not only capabilities but also constraints.

    - Why develop your own linter at all? Couldn't you use (or extend) existing ones?

    In fact, that is what we do. Our linter is actually a plugin for Flake8. We simply position it as a full-fledged tool rather than just a plugin.
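
    To give a sense of what a Flake8 plugin involves, here is a minimal sketch of an AST-based checker. It assumes Flake8's plugin convention of a class that receives the parsed `tree` and whose `run()` yields `(line, column, message, type)` tuples. The rule itself, the `XYZ100` code, and the class and plugin names are hypothetical and are not taken from the actual linter.

```python
import ast
from typing import Iterator, Tuple


class LoopFunctionChecker:
    """Hypothetical rule: flag functions defined inside loops."""

    name = "example-loop-def-checker"  # hypothetical plugin name
    version = "0.0.1"

    def __init__(self, tree: ast.AST) -> None:
        # Flake8 passes the parsed module to plugins that accept `tree`.
        self.tree = tree

    def run(self) -> Iterator[Tuple[int, int, str, type]]:
        seen = set()  # avoid reporting the same def twice in nested loops
        for loop in ast.walk(self.tree):
            if not isinstance(loop, (ast.For, ast.While)):
                continue
            for node in ast.walk(loop):
                if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    continue
                position = (node.lineno, node.col_offset)
                if position in seen:
                    continue
                seen.add(position)
                yield (*position, "XYZ100 function defined inside a loop", type(self))


# The checker can be exercised without Flake8 by parsing a snippet directly.
sample = "for i in range(3):\n    def f():\n        return i\n"
violations = list(LoopFunctionChecker(ast.parse(sample)).run())
```

    For Flake8 to load such a class, it would be registered under the `flake8.extension` entry point group in the package metadata.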

    Why Flake8 and not Pylint? Pylint does a lot that a linter should not do. For example, it implements a very large number of type checks, although types should be handled by a type checker. It also reports a very large number of errors that are not actually errors. I do not like its documentation, and I am wary of its own AST implementation. It is also hard to configure. By allowing configuration, you allow people to make the wrong choice. So our goal is to make a tool that cannot be configured: you install it, and that is all.

    - What guides formed the basis of this linter? Or is it just your own experience?

    Right now it is based on the rules we have been formulating for code review over many years. Some rules are ported from other linters: ESLint, Pylint, SonarQube, Credo. A lot was taken from the excellent work on Cognitive Complexity. We always kept Miller's law in mind. Some rules are my own view, formed after reviewing a large amount of other people's code. So at this stage it is a hodgepodge.

    - What are you going to talk about at Moscow Python Conf ++?

    First of all, about complexity management. This topic is close and familiar to every developer. We will look at different metrics and at how complexity carries over from the simplest unit of code, the line, to the most complex, the module. Then I will move on to the holy-war part, where I will lay out my view of how one should and should not write Python, and ask the audience to vote on what they like and what they do not. Many developers see restrictions (do A, but do not do B) as an encroachment on their creative space, so they react very strongly. And that is exactly where an interesting discussion can unfold.

    - Who is the talk aimed at?

    I think it is mostly for experienced developers, because novice programmers have not yet formed a firm opinion. Although it will be interesting for them to listen and speak up too. They are definitely our users.

    Friends, a reminder that less than a month is left before Moscow Python Conf ++. This year it will feature more than three dozen talks and a series of meetups. The final program will be announced any day now, but for now you can see the general list of talks.
