Life before runtime. A Yandex talk

    In a large project, you may need to work out what has changed for the end user from a diff of the application's front-end code. Nikita Sidorov (@nickshevr), a developer at Yandex.Market, explained how we solved this problem with the Diffector library, how to build and analyze the module graph in Node.js applications, and how to find defects in code before it runs.

    - Today I will try to be as frank with you as possible. I have been working at Yandex.Market for a little over a year and a half, and doing web development for about as long. I have started to notice changes in myself, and you can notice them too: my average hair length has increased and a beard has started to appear. And you know, today I looked at my colleagues, at Sergei Berezhnoy veged and at Vova Grinenko tadatuta, and I realized this is a good sign that I have almost matured into a real front-end developer.

    Having come to this realization, I decided to talk with you about life, the one we all take part in. Mostly about life before runtime. Let me explain what this is all about.

    Whose life? The life of code, of course. Code is what we do. Remember, I decided to be sincere with you, so the first slide is as simple as possible. I took a truth: the first stage is acceptance. You see, no one will argue with this axiom.

    Then I realized I would have to adapt it so that it made sense. Let it be the acceptance of requirements. Any code begins with you looking at the task and trying to accept the requirements set for you.

    After that, of course, comes the writing stage: we write our code. Then we cover it with tests and check that it works ourselves. Then we check that the application as a whole works with our code. Then we hand it to a tester to check. What do you think comes next? Remember, this is life before runtime. Do you think runtime follows? In fact, it turns out like this, and this is not a mistake in the slides. Very often, at any stage of the checks (and there may be many more than I listed), you hit a goto back to the writing stage. You will agree this can be a pretty big problem. It can slow down the delivery of features to production and slow you down as a developer, because the ticket stays assigned to you. So the code goes round and round: M more passes through N checks, and only then does it reach the user's browser. But that is our goal: to write code that actually reaches the user and actually works for their benefit.

    Today we'll talk about the first part: what happens before runtime, though not really about tests.

    By the way, it looks something like this. I took our queue in the tracker, collected my own tickets and calculated the median. It turns out my tickets spend much less time in development than in verification. And as you know, the longer a ticket stays in verification, the higher the chance of a goto, either back to the beginning or at the very end, and I really don't want that.

    Also, if you look closely, there are two words on the slide: development (what we developers do) and verification (what we do too, but so do testers). So the problem is just as relevant for testers.

    The goal is fairly prosaic. I like to say that life should be made simpler: we all work a lot as it is. The goal looks something like this, but you must admit it is rather vague, so let's pick out some basic criteria it depends on.

    Of course, the less code there is, the easier it is for us. The faster our CI checks run, the sooner we find out whether we got it right; locally they can take forever to run. Verification speed concerns the tester directly: if the application is large and has to be checked in its entirety, that takes a huge amount of time. Release speed depends on all of this, since we cannot release until every check has passed and we are sure the code is exactly what we want.

    To solve some of these problems, let's analyze the dependency graph of the modules in our programming language. And first, let's describe it.

    The graph is directed: its edges have a direction. The nodes of the graph are the modules of the language we are talking about. An edge is a relationship of a specific type, and there are several types of relationships.

    Let's look at a trivial example. There is file A, which imports something from file B. That is one such relationship between nodes.

    The same thing happens if you replace import with require. In fact, things are not quite that simple.

    Since we are talking about dependency types, I suggest tracking at least two, which speeds up your pipeline and the graph traversal. You need to look not only at the module that is depended on, but also at the module that depends on it. I propose calling module A the parent and B the child, and I advise you to always keep the links in both directions, like a doubly linked list. This will make your life easier, I promise you in advance.
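    The parent/child idea can be sketched as a small data structure. This is only an illustration, not diffector's actual code: the class name ModuleGraph and its methods are made up for the example.

```javascript
// Minimal module graph that stores every edge in both directions.
// ModuleGraph and its method names are hypothetical, for illustration only.
class ModuleGraph {
  constructor() {
    // file name -> { parents: importers, children: imported modules }
    this.nodes = new Map();
  }

  node(name) {
    if (!this.nodes.has(name)) {
      this.nodes.set(name, { parents: new Set(), children: new Set() });
    }
    return this.nodes.get(name);
  }

  // Record "parent imports child" on both ends of the edge.
  addImport(parent, child) {
    this.node(parent).children.add(child);
    this.node(child).parents.add(parent);
  }

  parentsOf(name) {
    return [...this.node(name).parents];
  }

  childrenOf(name) {
    return [...this.node(name).children];
  }
}

const graph = new ModuleGraph();
graph.addImport('a.js', 'b.js'); // file A imports something from file B
console.log(graph.parentsOf('b.js'));
```

    Keeping both sets means that "who imports this file?" and "what does this file import?" are both answered without scanning the whole graph.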

    Now that we have described the graph, let's agree on how we will build it.

    There are two ways. The first is your favorite tool in your favorite programming language, using AST (abstract syntax trees) or regular expressions. What do you gain? You are not tied to anyone, but you have to implement everything yourself. You will have to describe every type of connection for every technology you use, be it a separate CSS bundler or something similar. But you have complete freedom, so to speak.

    The second option, which I will promote a little, suits most people who already have a build system configured. The point is that a build system builds a dependency graph by design, by default.

    Let's look at one of the most popular build systems at Yandex, webpack. Here is an example of how to dump the entire webpack result into a separate file, which can then be fed to our analyzer or any other. webpack builds the graph using an AST, parsed with the acorn library. You may have noticed it when something crashed. I have.

    What are the advantages? When you configured your build system, you honestly specified entry. These are the files your dependencies unwind from, the starting points of the traversal. This is good, because you do not have to list them again. In addition, webpack, babel, acorn and the rest are not yours to maintain. So new language features, bug fixes and everything else arrive faster than if you did it yourself, especially if your team is not very large. And even if it is large, it is not as large as open source.

    There is one thing that is both a plus and a minus, a double-edged sword: the graph is built during the build. That is good in a way, because we can build the project and immediately reuse the result. But what if we don't want to build the project and just want the graph?

    And there is one major minus. If you have custom connections in play (we'll talk about connections later), the build system will not handle them for you. Or you will have to integrate them yourself, for example as a webpack plugin.

    Consider a specific example. I ran the command on a project of mine that has only three files and got this output. And I am showing only one key here, called modules. We are talking about the module dependency graph, so we look at modules; it is all logical.

    That is quite a lot of information, and we do not need all of it. Let's keep a few fields and go over them. Take the first module. It has a name and it has reasons. The reasons are the links to the "dependent" modules, that is, the ones that import this module. This is the basic data for building the graph.
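    As a rough sketch of how name and reasons turn into a graph: the function below reads only modules[].name and modules[].reasons[].moduleName from a stats object. The function name graphFromStats is made up, and the sample object is hand-written in the shape of the output, not real webpack output.

```javascript
// Build parent/child maps from the "modules" array of a webpack stats JSON.
// Only `name` and `reasons[].moduleName` are read; everything else is ignored.
function graphFromStats(stats) {
  const parents = new Map();  // module -> Set of modules that import it
  const children = new Map(); // module -> Set of modules it imports

  const ensure = (name) => {
    if (!parents.has(name)) {
      parents.set(name, new Set());
      children.set(name, new Set());
    }
  };

  for (const mod of stats.modules) {
    ensure(mod.name);
    for (const reason of mod.reasons || []) {
      // Some reasons carry no importing module (for example, an entry point).
      if (!reason.moduleName) continue;
      ensure(reason.moduleName);
      parents.get(mod.name).add(reason.moduleName);
      children.get(reason.moduleName).add(mod.name);
    }
  }
  return { parents, children };
}

// Hand-written sample for a three-file project (an assumption, not real output).
const sampleStats = {
  modules: [
    { name: './src/index.js', reasons: [{ moduleName: null }] },
    { name: './src/a.js', reasons: [{ moduleName: './src/index.js' }] },
    { name: './src/b.js', reasons: [{ moduleName: './src/a.js' }] },
  ],
};

const { parents } = graphFromStats(sampleStats);
console.log([...parents.get('./src/b.js')]);
```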

    Also, please note usedExports and providedExports. We will talk about them a little later, but they are very important too.

    If you write your own solution, you need to think about the types of connections between modules. First, of course, there is the module system of the language itself, be it CJS or ESM modules. Second, you will agree that files can be connected at the level of the file system itself. Some frameworks, for example, assemble things depending on how the folders are laid out.

    Here is a trivial example. If you have written server-side code for Node, you have probably come across the popular npm package config. It lets you define your configurations quite conveniently.

    To use it, you create a config folder where your NODE_PATH is and put several JavaScript files in it, one config per environment. As an example, I created the folder and added default, development and production.

    And the whole thing works roughly like this: when you write require('config'), it reads the right file itself, taking the file name from an environment variable. As you can see, nothing in the code shows that these files are used: there is no direct import or require, so webpack would not even notice them.
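    One way to make such a hidden link visible to your own analyzer is to add the edges by hand. The sketch below is hypothetical: implicitConfigEdges is a made-up helper, it assumes the environment name comes from NODE_ENV, and it models only the default file plus one per-environment file, while the real package supports more file names and formats.

```javascript
// Return the implicit edges "importer -> config file" that a plain
// require('config') hides from the build system.
// Simplification: only default.js and <env>.js are considered.
function implicitConfigEdges(importer, configDir, env) {
  const files = [`${configDir}/default.js`, `${configDir}/${env}.js`];
  return files.map((file) => [importer, file]);
}

// Assumption: the environment name is taken from NODE_ENV.
const env = process.env.NODE_ENV || 'development';
console.log(implicitConfigEdges('server.js', 'config', env));
```

    Edges produced this way can be merged into the graph next to the ones taken from import and require.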


    Dependency Injection also came up today. I can't say I was inspired by it, but as a follow-up I looked at one of the libraries. It is called InversifyJS. As you can see, it offers a rather custom syntax: lazyInject, nameProvider and so on. And you must admit it is not clear what provider this is or which module is actually injected here. But we need to know, we have to understand it. So again, the build system will not solve this, and we will have to do it ourselves.

    Suppose we have built the graph. I suggest you start by storing it somewhere. What does that give us? It lets us do some heuristic analysis, play a bit of Data Science, and do so over a time slice.

    What is the idea? This is our actual data. We recently introduced a design system at Yandex.Market and, as part of it, a component library. Here you can see that we count the imports of React components from our library, the common components, and break them down by directory. Our repository is not quite a monorepo, so we have platform.desktop, platform.touch and src.

    What might we think when we see these numbers? We might hypothesize that the touch team is not increasing its use of common components. That would mean either the components are bad for mobile, poorly made, or the touch team is lazy. But is that really so?

    If we look at a longer period, which is exactly what storing the graph after each release allows, we see that everything is in fact fine for touch: the figure is growing. For src it is even better, and it is desktop that turns out not to be growing.
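    A minimal sketch of this kind of counting over stored graphs, under a few assumptions: each stored graph is a list of [importer, imported] pairs, the library lives under a hypothetical uikit/ prefix, and the file names and release names in the example are invented.

```javascript
// Count edges that point into the component library, grouped by the
// top-level directory of the importing file.
function countLibraryImports(edges, libraryPrefix) {
  const counts = {};
  for (const [importer, imported] of edges) {
    if (!imported.startsWith(libraryPrefix)) continue;
    const dir = importer.split('/')[0];
    counts[dir] = (counts[dir] || 0) + 1;
  }
  return counts;
}

// Apply the same count to every stored snapshot, one per release.
function importTrend(snapshots, libraryPrefix) {
  return Object.entries(snapshots).map(([release, edges]) => ({
    release,
    counts: countLibraryImports(edges, libraryPrefix),
  }));
}

// Invented data: two releases of a tiny project.
const snapshots = {
  'release-1': [['platform.touch/Cart.js', 'uikit/Text.js']],
  'release-2': [
    ['platform.touch/Cart.js', 'uikit/Text.js'],
    ['platform.touch/Cart.js', 'uikit/Button.js'],
    ['src/Header.js', 'uikit/Text.js'],
  ],
};
console.log(JSON.stringify(importTrend(snapshots, 'uikit/')));
```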

    Someone in the audience also asked how to explain the value of this to managers. Here is the total number of library imports over time. What manager doesn't like charts? You can build a chart like this and see that use of the library is growing, which means it is at least a useful thing.

    Now one of my favorite parts, which I will cover fairly briefly: searching for defects in the graph. Today I want to talk about two kinds of defects: cyclic dependencies between modules and unused modules, that is, the dead code elimination problem.

    Let's start with circular dependency.

    Everything seems quite simple here. You already have a directed graph; you just need to find a cycle in it. Let me explain why I am bringing this up. I used to write mostly server-side code on Node.js, and we used no webpack, no babel, nothing at all. We ran the code as is, and there was require. Who remembers how import differs from require? That's right. If you wrote your code badly, and I really did, you could discover on your server that a module sits in a cyclic dependency only when a user request arrives or some other event fires. That is a rather global problem: you will not find out until runtime. With import things are much better, and this problem does not arise.

    Then just take any algorithm you like. Here I took a fairly simple one. We look for a vertex that has edges of only one kind, either only inbound or only outbound. If there is such a vertex, we delete it together with its edges and repeat the process. In the end we find, and prove, that this graph contained a cycle of length five.
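    The pruning described above can be sketched like this. It is an illustration, not diffector's implementation; the function name is made up. Whatever survives the pruning is a set of vertices that each still have both inbound and outbound edges, so it must contain at least one cycle, although not every surviving vertex is necessarily on one.

```javascript
// Repeatedly remove vertices that have no inbound or no outbound edges.
// edges: array of [from, to]. Returns the vertices that cannot be removed.
function findCycleCandidates(edges) {
  const outgoing = new Map();
  const incoming = new Map();
  const ensure = (name) => {
    if (!outgoing.has(name)) {
      outgoing.set(name, new Set());
      incoming.set(name, new Set());
    }
  };
  for (const [from, to] of edges) {
    ensure(from);
    ensure(to);
    outgoing.get(from).add(to);
    incoming.get(to).add(from);
  }

  const removable = (name) =>
    outgoing.get(name).size === 0 || incoming.get(name).size === 0;
  const queue = [...outgoing.keys()].filter(removable);
  const removed = new Set();

  while (queue.length > 0) {
    const name = queue.pop();
    if (removed.has(name)) continue;
    removed.add(name);
    // Deleting the vertex deletes its edges, which may free its neighbours.
    for (const child of outgoing.get(name)) {
      incoming.get(child).delete(name);
      if (!removed.has(child) && removable(child)) queue.push(child);
    }
    for (const parent of incoming.get(name)) {
      outgoing.get(parent).delete(name);
      if (!removed.has(parent) && removable(parent)) queue.push(parent);
    }
  }
  return [...outgoing.keys()].filter((name) => !removed.has(name)).sort();
}

// a -> b -> c -> a is a cycle; d and e only hang off it.
console.log(findCycleCandidates([
  ['a', 'b'], ['b', 'c'], ['c', 'a'], ['c', 'd'], ['e', 'a'],
]));
```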

    You will agree that by reading the code you might still spot a cycle of length two or three, but a longer one is practically impossible to find. And our project really did have a cycle of length seven, though not in production.

    Now about unused modules. The algorithm here is also fairly trivial. We find the connected components of our graph and look for the components that contain none of the entry nodes. In this case it is this component: both of its vertices, both nodes. The entry here is called entry.js. It does not actually matter what it is called; it is whatever you listed under entry in your build config.
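    A sketch of the connected-components approach, with made-up names. Edge direction is ignored, as it is when talking about connected components: everything in the same component as an entry counts as reachable, and the rest is reported as unused.

```javascript
// Report modules whose connected component contains no entry node.
// allModules: every file in the graph; edges: [from, to]; entries: entry files.
function findUnusedModules(allModules, edges, entries) {
  const neighbours = new Map();
  const ensure = (name) => {
    if (!neighbours.has(name)) neighbours.set(name, new Set());
    return neighbours.get(name);
  };
  allModules.forEach((name) => ensure(name));
  for (const [from, to] of edges) {
    ensure(from).add(to);
    ensure(to).add(from); // direction is ignored on purpose
  }

  // Flood fill from every entry.
  const reached = new Set();
  const stack = [...entries];
  while (stack.length > 0) {
    const current = stack.pop();
    if (reached.has(current)) continue;
    reached.add(current);
    for (const next of ensure(current)) stack.push(next);
  }
  return [...neighbours.keys()].filter((name) => !reached.has(name));
}

console.log(findUnusedModules(
  ['entry.js', 'a.js', 'x.js', 'y.js'],
  [['entry.js', 'a.js'], ['x.js', 'y.js']],
  ['entry.js']
));
```

    One caveat of ignoring direction: a dead file that imports a live one lands in the same component as the entry and is not reported. Following edges only from parent to child, starting at the entries, would be stricter.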

    But there is another approach. If you have not built the graph and only have a build system, what is the cheapest way to do this? Just mark every file that gets into the build while it runs, and collect them into a set. Then take the set of all files in the project and subtract the first set from it. It is a very simple operation.
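    The subtraction itself is a one-liner. In this sketch both lists are passed in as plain arrays; how you collect them (a build plugin, a directory walk) is left out, and the file names are invented.

```javascript
// Files in the project that never made it into the build.
function unusedFiles(allFiles, bundledFiles) {
  const bundled = new Set(bundledFiles);
  return allFiles.filter((file) => !bundled.has(file));
}

console.log(unusedFiles(
  ['src/index.js', 'src/a.js', 'src/old.js'],
  ['src/index.js', 'src/a.js']
));
```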

    And this is not just theory. I got inspired, went to my project and did exactly this. And note: I did not even touch node_modules. I left that as a growth point for the next review. In short, I was so pleased with myself that I decided to redo this slide. Let it look like this, because it's really cool!

    Good numbers. Can you imagine how much better everything became? And then I got so carried away that I felt like a designer and thought this was an achievement worth framing. And, you know, I stepped back, looked at it and realized I am probably not a designer after all but a web developer. But I'm no fool: I took the frame and put it on my site as an SEO charm.

    You can use it, there is even a link. And so you don't think I am deceiving you (we are being frank today), I really did look at the reviews. I think you can trust them.

    Well, to be honest, it went something like this. I saw a new hyped library, thanos-js, took it and created a pull request. Between us, I have administrator rights in our repository. So I went ahead and merged it into master. How do you like that? Well, we are being frank with each other, and this is how it all looked. If anyone doesn't know, thanos-js is a library that simply deletes a random 50% of your code.

    In reality I did use a library there, but a different one. It is called diffector, and we will talk about it now. I would like to note that the pull request was quite substantial, minus 44 thousand lines of code, and, imagine, it passed testing on the first try. So what I am talking about really can work.

    Diffector. In fact, it handles not only removing unused modules and searching for defects in the graph, but also a more important task, the one I stated at the start: helping the developer and the tester. Let's talk about that now. It works roughly like this.

    We get the list of modified files from the version control system. The graph is already built, diffector builds it. For each modified file we look for a path to an entry and mark that entry as modified. And entries correspond to the application pages the user will see, which is fairly logical.
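    The walk from a modified file up to its entries can be sketched as a breadth-first search over parent links. This is an illustration of the idea, not diffector's code: the function name, the shape of the parents map and the file names are assumptions.

```javascript
// parents: Map of file -> list of files that import it.
// Returns the entries reachable by walking parent links from changed files.
function affectedEntries(parents, entries, changedFiles) {
  const entrySet = new Set(entries);
  const affected = new Set();
  const seen = new Set();
  const queue = [...changedFiles];

  while (queue.length > 0) {
    const file = queue.shift();
    if (seen.has(file)) continue;
    seen.add(file);
    if (entrySet.has(file)) affected.add(file);
    for (const parent of parents.get(file) || []) queue.push(parent);
  }
  return [...affected].sort();
}

// Invented layout: two pages share one component.
const parentLinks = new Map([
  ['uikit/Text.js', ['pages/A/view.js', 'pages/B/view.js']],
  ['pages/A/view.js', ['pages/A/index.js']],
  ['pages/B/view.js', ['pages/B/index.js']],
]);
const pageEntries = ['pages/A/index.js', 'pages/B/index.js'];

console.log(affectedEntries(parentLinks, pageEntries, ['pages/A/view.js']));
console.log(affectedEntries(parentLinks, pageEntries, ['uikit/Text.js']));
```

    The second call shows the weak spot of this scheme: a change in a widely shared file marks every page that uses it.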

    What does this give us? For testing, we know which pages of the application have changed. We can tell the tester that only they need testing. We can also tell the CI job that runs autotests that only these pages need testing. And for developers everything gets much simpler, because testers no longer write to you asking "What needs to be tested?"

    Let's look at an example of how diffector works. We have a directory, pages.desktop/*, which holds the pages themselves. Each page is described by several files. The controller is the server side of the page. View is the React part. And deps comes from another build system: we have not only webpack but also ENB.

    I made a change in the project, in a file from the structure you just saw. Here is what diffector gives me. I simply ran it; diffector is a command-line application. It tells me that I changed one page, called BindBonusPage.

    I can also run it in verbose mode to get a more detailed report and see that it does work, at least in this simple case. As we can see, in our BindBonusPage the index file and the controller have changed.

    But let's see what happens if we change something else.

    I changed something else, and diffector told me I had changed nine pages. That no longer makes me happy; it does not seem to be helping me much.

    Let's see why. It now shows the reasons each page was considered modified. And as we can see, the reason is the same everywhere: a Text component from uikit.

    Now let's look at the diff. I just changed a comment in the types. You must admit that in this case diffector got it wrong. There is really no need to run tests on, and add to the regression, all nine pages that were flagged only because of Text.

    And this is a real problem. If a file is used widely across the project, any change to it marks all your entries as changed, which means every page of the application falls into the test scope and the benefit drops to zero. Here is how this should be solved.

    Tree shaking. I hope most of you are familiar with the term and with what it involves.

    First, let me restate the issue. We have a file that is used in many places. Take, for example, a homemade i18n module that simply stores keys. By changing it you have, in terms of graph edges, changed the entire application. But in reality you changed only the places where a particular key is used.

    How do we handle this? Previously a module was one file in the file system. Now a module is really the set of exports in that file.

    Like this. Where before we had a single node, we now split file B into its exports, and it turns out that export-2 has no edge at all. That is, we do not even need it. This is how tree shaking works in your build system, if you use ESM.
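    The split into exports can be sketched by attaching edges to export names instead of files. The record shape { importer, file, names } and the function name are assumptions made for this example.

```javascript
// imports: records meaning "importer imports `names` from `file`".
// Returns the importers that actually use one particular export.
function importersOfExport(imports, file, exportName) {
  return imports
    .filter((record) => record.file === file && record.names.includes(exportName))
    .map((record) => record.importer);
}

// File A uses only export1 of file B, so export2 has no incoming edge.
const importRecords = [{ importer: 'a.js', file: 'b.js', names: ['export1'] }];

console.log(importersOfExport(importRecords, 'b.js', 'export1'));
console.log(importersOfExport(importRecords, 'b.js', 'export2'));
```

    With edges at this level, a change that touches only export2 affects no one, even though the file b.js as a whole is imported.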

    But here, not everything is so rosy.

    Consider this code.

    Imagine we change the value here. In fact we change not only this constant. We also change the static class method, and the class method that calls the static method. So the task has a second part: resolving dependencies inside the file itself.
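    The code from the slide is not reproduced in this text. A hypothetical module of the shape described (a constant, a static method that reads it, and an instance method that calls the static one) could look like this; all names are invented.

```javascript
// Hypothetical illustration, not the code from the slide.
const RETRY_LIMIT = 3;

class Fetcher {
  // Depends on the constant above.
  static maxAttempts() {
    return RETRY_LIMIT + 1;
  }

  // Depends on the static method, and through it on the constant.
  describe() {
    return `up to ${Fetcher.maxAttempts()} attempts`;
  }
}

console.log(new Fetcher().describe());
```

    Editing only the line with RETRY_LIMIT changes the behavior of both methods, although neither method's own text appears in the diff.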

    Yes, you can try to do this with an AST, but I can tell you that this case alone takes about 250 lines of code, with no guarantee that it will all work. And dependencies can be chained in other ways too, so this is a rather difficult task.

    Also imagine you have some GlobalContext and a function that modifies it. When you change the modify function, how can you tell what has actually changed? Something in this module, or something that depends on GlobalContext? That is very hard, so we did not implement full-fledged tree shaking. By the way, such things are called side effects. You may have seen the corresponding flag when configuring webpack. If you mark a module with sideEffects: true in webpack, it simply will not be cut out, and that is guaranteed.

    To deal with this problem, we decided to lean on our developers a little and introduced an interactive mode. Let me explain what it is and why. Instead of a one-off run, diffector can now be run as a full console application. Here, as you can see, I launched it interactively. It showed me the changes, everything that was changed in my code, and the pages those changes affect.

    As we can see, there is a menu here: diff, expand, log. And we also see the change I already told you about, the change to Text.

    Here it is again. We can press D to see the diff. The developer can then decide that Text, for example, does not need to be included: the change to the Text component will not actually affect what the user sees. So we can safely exclude it from the list of changes, reducing the number of pages we need to test. That is, we just removed eight pages.

    There can actually be several approaches here. You could review the Text change for each of those eight pages separately. But that is a huge amount of work, and eight is not even an honest figure: there may be many more such pages, since Text is used a lot. Forcing the developer to go through every use and decide whether it is affected brings us back to the problem I described at the beginning of the talk. So we decided to do it this way: only the second line ends up in the changes, because we kept it after our expert review.

    And now the main point: what will we use diffector for? Not just so that the developer can play around in the console, but to make life better.

    Here we have an entity, an abstraction over an application page that the user can access.

    We have just discussed how an application page relates to entry. One application page can have several entries. Diffector finds this link.

    There is one more link to consider: a page has test cases. This link lives in the test case storage and can be fetched from there.

    Thus, from the changed entries we can get the test cases we have affected.
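    The chain entry, page, test cases can be sketched as two lookups. Everything here is hypothetical: the function name, the two lookup tables and the case identifiers are invented, and in practice the second table would come from your test case storage.

```javascript
// changedEntries: entries marked as modified.
// pageByEntry: entry file -> page name; casesByPage: page name -> case ids.
function testCasesForEntries(changedEntries, pageByEntry, casesByPage) {
  const pages = new Set();
  for (const entry of changedEntries) {
    if (pageByEntry[entry]) pages.add(pageByEntry[entry]);
  }
  const cases = new Set();
  for (const page of pages) {
    for (const id of casesByPage[page] || []) cases.add(id);
  }
  return { pages: [...pages].sort(), cases: [...cases].sort() };
}

// Invented lookup tables for one page with two entries.
const pageByEntry = {
  'pages.desktop/BindBonusPage/index.js': 'BindBonusPage',
  'pages.desktop/BindBonusPage/controller.js': 'BindBonusPage',
};
const casesByPage = { BindBonusPage: ['case-1', 'case-2'] };

console.log(testCasesForEntries(
  ['pages.desktop/BindBonusPage/index.js'],
  pageByEntry,
  casesByPage
));
```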

    It works like this. We run diffector. It has several report types. Besides telling us that some pages have changed, it also generates a query to our test case storage, or to yours, as you wish. Here it says: look, BindBonusPage has changed, give me all the test cases associated with this page. That is for manual regression. For autotests we change the query slightly. It looks something like this.

    Now the most important thing is to bring all of this into CI. This is a screenshot of a comment from a ticket. It is the same thing: a query is posted that the tester will see when they open your ticket and start checking it.

    Also, it used to be like this: 43 minutes for testing alone, and it runs on every commit to the Market repository.

    And this is what it became. Not always, of course, but it is one of the good results.

    In general, I would say that in an ideal world this task should hardly arise. The point is that you should organize your bundles so that they contain only the files the user can actually reach. I understand this is a very difficult task, hardly feasible at all, but if it were done, you could simply take the list of modules from the bundles. If those modules change, that practically guarantees that something changes for the user.

    To sum up. Everything we talked about has practical benefits for large teams. If an application is developed by two or three people, they can easily follow each other's pull requests, see how the code changes and understand how the application fits together. But on a large project that is hard for a single developer to do.

    Also, if you want to try static analysis and the module dependency graph, it is best to start with the output of your build system, just to find out whether you need it at all. If you collect it, store it. If you store it, analyze it and make your developers' lives better. Thanks!
