Node.js without node_modules

Last week, the developers of Yarn (a package manager for JavaScript) announced a new feature: Plug'n'Play installation. It lets you run Node.js projects without the node_modules folder, into which project dependencies are usually installed before launch. According to the feature description, node_modules will no longer be needed: modules will be loaded from the package manager's shared cache.

Around the same time, the NPM developers announced their own solution to the same problem.

Let's take a closer look at these solutions and try to test them in real projects.

History of the problem

Initially, the Node.js module system was based entirely on the file system. Every require() call maps onto the file system. To organize third-party modules, the node_modules folder was invented, into which reusable modules and libraries are downloaded and installed. As a result, each project gets its own separate set of dependencies, which wastes disk space.
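To make the file system mapping concrete, here is a minimal sketch of the directories Node.js checks when a bare module name is required. The function name nodeModulesCandidates is made up for illustration, and the sketch deliberately ignores NODE_PATH and the global folders that the real algorithm also consults.

```javascript
// POSIX flavour of the path module, so the result is the same on every OS.
const path = require('path').posix;

// Returns every node_modules directory that would be tried for a require()
// issued from a file located in `fromDir`, nearest directory first.
function nodeModulesCandidates(fromDir) {
  const candidates = [];
  let current = path.resolve(fromDir);
  for (;;) {
    // A directory that is itself called node_modules gets no nested lookup.
    if (path.basename(current) !== 'node_modules') {
      candidates.push(path.join(current, 'node_modules'));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the file system root
    current = parent;
  }
  return candidates;
}

console.log(nodeModulesCandidates('/home/user/app/src'));
```

Every directory in that list may hold its own copy of a dependency, which is why the same library often ends up on disk many times.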

Installing dependencies takes most of the build time in CI systems, so speeding up this step will have a positive effect on build time as a whole.

Simplified, installing modules consists of the following steps:

A specific version of each module is calculated from the allowed range.
All modules of the required versions are downloaded from the repository and stored in the local cache.
Modules from the local cache are copied to the project's node_modules folder.
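The first step can be sketched roughly as follows. This is a simplified illustration, not the algorithm npm or Yarn actually use: it only understands caret ranges such as ^1.3.0 and ignores the special rules for 0.x versions and pre-release tags. All function names are hypothetical.

```javascript
// Parses "1.2.3" into [1, 2, 3].
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Comparator: negative, zero or positive, as expected by Array.prototype.sort.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Simplified caret rule: same major version and not lower than the base version.
function satisfiesCaret(version, range) {
  const base = range.slice(1); // drop the leading "^"
  return (
    parseVersion(version)[0] === parseVersion(base)[0] &&
    compareVersions(version, base) >= 0
  );
}

// Picks the highest published version inside the range, or null if none fits.
function pickVersion(published, range) {
  const matching = published.filter((v) => satisfiesCaret(v, range));
  if (matching.length === 0) return null;
  return matching.sort(compareVersions)[matching.length - 1];
}

console.log(pickVersion(['1.2.0', '1.4.1', '1.10.0', '2.0.0'], '^1.3.0'));
```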

While the first two steps are already well optimized and run quickly when the modules are cached, the third step has remained almost unchanged since the first versions of node and npm.

The new approach proposes to get rid of the third step and replace the actual copying of files with the creation of a table that maps the requested modules onto their copies in the local cache.
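The idea of such a table can be illustrated with a small sketch. The cache layout (one folder per name and version) and both function names are assumptions made for the example; the real package managers use their own formats.

```javascript
// POSIX flavour of the path module, so the result is the same on every OS.
const path = require('path').posix;

// Builds a table "package name -> folder in the cache".
// Assumed (hypothetical) cache layout: <cacheRoot>/<name>-<version>
function buildResolutionTable(cacheRoot, lockedVersions) {
  const table = new Map();
  for (const [name, version] of Object.entries(lockedVersions)) {
    table.set(name, path.join(cacheRoot, `${name}-${version}`));
  }
  return table;
}

// Resolves a request such as "lodash/fp" or "@babel/core" through the table
// instead of searching node_modules folders.
function resolveFromTable(table, request) {
  const parts = request.split('/');
  const nameLength = request.startsWith('@') ? 2 : 1; // scoped names have two segments
  const name = parts.slice(0, nameLength).join('/');
  const subpath = parts.slice(nameLength);
  if (!table.has(name)) {
    throw new Error(`No entry for package "${name}" in the resolution table`);
  }
  return path.join(table.get(name), ...subpath);
}

const table = buildResolutionTable('/home/user/.cache/demo', {
  lodash: '4.17.10',
  '@babel/core': '7.0.0',
});
console.log(resolveFromTable(table, 'lodash/fp'));
```

Building the table only touches memory and one output file, so no per-module file operations are needed in the project folder.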

Using symlinks

Instead of actually copying modules, you can add a symlink to their location in the cache. This approach is implemented in PNPM, another alternative package manager. The approach may well work, but symlinks bring many problems related to the dual location of a file, the search for adjacent modules, and so on. In addition, creating symlinks is still a file operation, which ideally I would like to avoid altogether.
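The dual location problem is easy to reproduce. The sketch below is not PNPM's actual layout; it builds a throwaway cache and project in the temp directory (the package name left-pad is just a placeholder) and shows that a module loaded through the link sees itself at its cache location.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Throwaway "cache" and "project" folders inside the temp directory.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'symlink-demo-')));
const cachedPkg = path.join(root, 'cache', 'left-pad-1.3.0');
const projectModules = path.join(root, 'project', 'node_modules');
fs.mkdirSync(cachedPkg, { recursive: true });
fs.mkdirSync(projectModules, { recursive: true });

// The fake package simply reports the directory it believes it lives in.
fs.writeFileSync(path.join(cachedPkg, 'index.js'), 'module.exports = __dirname;\n');

// Instead of copying, link the project's dependency to the cached copy.
// The 'junction' type only matters on Windows and is ignored elsewhere.
const linkedPkg = path.join(projectModules, 'left-pad');
fs.symlinkSync(cachedPkg, linkedPkg, 'junction');

const realLocation = fs.realpathSync(linkedPkg);
const seenByModule = require(linkedPkg);

console.log('requested through:', linkedPkg);
console.log('real location:    ', realLocation);
console.log('module sees:      ', seenByModule);
```

Because Node.js resolves symlinks to real paths by default, any further require() made by the package starts its search from the cache folder, not from the project's node_modules.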

Trying Yarn PNP

More information about this feature can be found in the official description. This section is a brief retelling of it.

The PNP version of Yarn currently lives in the yarn-pnp feature branch.

Clone the repository locally with the desired branch:

git clone git@github.com:yarnpkg/yarn.git --branch yarn-pnp

The build instructions for Yarn are here; the steps are straightforward.

After the build completes, add an alias for the custom version of yarn, and you can start working with it:

alias yarn-local="node $PWD/lib/cli/index.js"

Plug'n'Play can be enabled in two ways: either with a flag, yarn --pnp, or with additional configuration in package.json: "installConfig": {"pnp": true}.

As an example, the Yarn developers have prepared a demo project. It uses Webpack, Babel and other tools typical of a modern frontend. Installing its dependencies in different ways gave the following results:

Regular installation with yarn: 19s
Installation via yarn --pnp: 3s

Before the measurement, one cold installation was carried out so that all the necessary modules were already in the cache.

Let's now figure out how this works. After a pnp installation, an additional file, .pnp.js, is created in the project root. It overrides the native logic of the Module class built into Node.js. By loading this file into our code, we give require() the ability to fetch modules from the global cache instead of looking in node_modules. All built-in yarn commands, such as yarn start or yarn test, preload this file by default, so no changes to your code are required if you already use Yarn.
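The general technique of overriding module resolution can be sketched as follows. This is not the code Yarn generates: it is a minimal illustration that patches Module._resolveFilename, an internal and undocumented Node.js function, and the package name greeter and the helper installResolutionHook are invented for the example.

```javascript
const Module = require('module');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Redirects bare requests listed in `table` (a Map of name -> absolute path)
// and leaves every other request to the original resolver.
function installResolutionHook(table) {
  const originalResolve = Module._resolveFilename;
  Module._resolveFilename = function (request, ...rest) {
    const target = table.has(request) ? table.get(request) : request;
    return originalResolve.call(this, target, ...rest);
  };
  // Returns a function that restores the original resolver.
  return function uninstall() {
    Module._resolveFilename = originalResolve;
  };
}

// Demo: a fake cache entry that lives outside of any node_modules folder.
const cacheDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pnp-sketch-'));
const cachedFile = path.join(cacheDir, 'greeter-1.0.0.js');
fs.writeFileSync(cachedFile, "module.exports = () => 'hello from the cache';\n");

const uninstall = installResolutionHook(new Map([['greeter', cachedFile]]));
const greeter = require('greeter'); // resolved through the table
uninstall();

console.log(greeter());
```

A file like this can be preloaded before the application starts, which is why the application code itself does not have to change.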

In addition to mapping modules, .pnp.js performs extra dependency validation. If you call require('test') without declaring the dependency in package.json, you get the following error: Error: You cannot require a package ("test") that is not declared in your dependencies. This check should make the code more reliable and predictable.
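A check of this kind could look roughly like the sketch below. It only illustrates the idea and is not Yarn's implementation: the function names are hypothetical, and Node.js built-in modules such as fs are not handled.

```javascript
// "lodash/fp" -> "lodash", "@scope/pkg/file" -> "@scope/pkg"
function packageNameOf(request) {
  const parts = request.split('/');
  return request.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

// Throws if `request` points to a package that the manifest
// (the parsed package.json) does not declare.
function assertDeclared(request, manifest) {
  // Relative and absolute paths are files of the project itself.
  if (request.startsWith('.') || request.startsWith('/')) return;
  const declared = { ...manifest.dependencies, ...manifest.devDependencies };
  const name = packageNameOf(request);
  if (!Object.prototype.hasOwnProperty.call(declared, name)) {
    throw new Error(
      `You cannot require a package ("${name}") that is not declared in your dependencies`
    );
  }
}

const manifest = { dependencies: { lodash: '^4.17.10' } };
assertDeclared('lodash/fp', manifest); // passes silently
try {
  assertDeclared('left-pad', manifest);
} catch (error) {
  console.log(error.message);
}
```

With node_modules, an undeclared package can work by accident because some other dependency pulled it in; a check against the manifest removes that accident.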

Among the shortcomings of the new approach, it is worth noting that extra integration is required for tools that work with the node_modules directory directly, bypassing the built-in Node mechanisms. For example, Webpack and other frontend bundlers will need additional plug-ins to find the files they bundle.

The demo project contains draft resolvers for ESLint, Jest, Rollup and Webpack.

In my experiment, there were still problems with TypeScript, which depends heavily on the presence of node_modules and offers no simple way to override the module search strategy.

There will also be problems with postinstall scripts. Since the module stays in the cache, postinstall scripts that change its state (for example, by downloading additional files) can damage the cache and break other projects that depend on it. The Yarn developers recommend disabling script execution with the --ignore-scripts flag. They have already experimented with enabling this flag by default for all projects inside Facebook and found no serious problems. In the long term, abandoning postinstall scripts looks like a good step in view of the known security problems.

Trying NPM tink

The NPM team also announced an alternative solution. Their new tool, tink, is shipped as a separate module, independent of NPM. As input, tink takes the package-lock.json file that is generated automatically when npm install runs. Based on the lock file, tink generates a node_modules/.package-map.json file, which stores the mapping from local modules to their real location in the cache.
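To illustrate the idea of deriving such a map from a lock file, here is a rough sketch. The output format is invented for the example and is not the real .package-map.json format; the input follows the nested dependencies tree of a lockfileVersion 1 package-lock.json, and the integrity values are placeholders.

```javascript
// Flattens the nested "dependencies" tree of a package-lock.json into a map
// keyed by the node_modules path each package would normally occupy.
// The integrity hash is kept because npm's cache is addressed by content.
function buildPackageMap(lock) {
  const map = {};
  function walk(dependencies, prefix) {
    for (const [name, info] of Object.entries(dependencies || {})) {
      const location = `${prefix}node_modules/${name}`;
      map[location] = { version: info.version, integrity: info.integrity };
      walk(info.dependencies, `${location}/`); // nested, non-hoisted packages
    }
  }
  walk(lock.dependencies, '');
  return map;
}

// Hypothetical lock file content with placeholder hashes.
const lock = {
  lockfileVersion: 1,
  dependencies: {
    a: {
      version: '1.0.0',
      integrity: 'sha512-placeholder-a',
      dependencies: {
        b: { version: '2.0.0', integrity: 'sha512-placeholder-b2' },
      },
    },
    b: { version: '1.5.0', integrity: 'sha512-placeholder-b1' },
  },
};

console.log(buildPackageMap(lock));
```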

Unlike Yarn, there is no hook file that can be preloaded into your project to patch require. Instead, you are expected to run the tink command instead of node to get the right environment. This approach is less ergonomic, because it requires changes to your project to make it work. However, it will do as a proof of concept.

I tried to compare the module installation speed of npm ci and tink, but tink turned out to be even slower, so I will not give any results. Clearly, this project is much less mature than Yarn's and is not optimized at all. Well, we will wait for new releases.

Conclusion

Abandoning the node_modules directory is a natural step, considering the experience of other languages, which never had such an approach in the first place. It will favorably affect build speed in CI systems where the package cache can be kept between builds. In addition, if you transfer the package cache and the .pnp.js file from one computer to another, you can reproduce the environment without even running Yarn. This can be useful in container-based build systems: mount the directory with the cache, add the .pnp.js file, and you can run the tests immediately.

The new approach looks unusual and breaks some established practices that rely on all modules always being available in node_modules. But the .pnp.js file offers an API that lets you abstract away from the real location of the files and work with a virtual tree. In addition, as a last resort, there is the yarn unplug --persist command, which extracts a module from the cache and places it locally in node_modules.

In any case, nothing has been finalized yet. Even the pull request in Yarn has not been merged, so we should expect changes. But it was interesting to try the alpha version of the feature, test it on a couple of my personal projects and make sure that this approach really works and makes installation faster.
