Node.js without node_modules
Last week, the developers of Yarn (a package manager for JavaScript) announced a new feature: Plug'n'Play installation. This feature lets you run Node.js projects without the node_modules folder, into which project dependencies are usually installed before launch. According to the feature description, node_modules will no longer be needed, because modules will be loaded from the package manager's shared cache.
At the same time, the npm developers announced their own solution to the same problem.
Let's take a closer look at these solutions and try to test them in real projects.
History of the problem
From the beginning, the Node.js module system was based entirely on the file system: every require() call maps onto a path on disk. To organize third-party modules, the node_modules folder was introduced, into which reusable modules and libraries are downloaded and installed. As a result, each project gets its own separate set of dependencies, which uses disk space inefficiently.
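To make the file-system mapping concrete, here is a minimal sketch of the directory walk that Node.js performs for a bare module name. The function name nodeModulesPaths is hypothetical and the sketch only handles POSIX-style paths; the real algorithm has more cases (core modules, global folders, package.json fields).

```javascript
const path = require('path');

// Lists the node_modules directories that would be searched, nearest first,
// for a require() issued from a file located in `fromDir`.
function nodeModulesPaths(fromDir) {
  const dirs = [];
  let current = path.posix.resolve(fromDir);
  while (true) {
    // A directory that is itself named node_modules is skipped.
    if (path.posix.basename(current) !== 'node_modules') {
      dirs.push(path.posix.join(current, 'node_modules'));
    }
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the file-system root
    current = parent;
  }
  return dirs;
}

console.log(nodeModulesPaths('/home/user/project/src'));
// [ '/home/user/project/src/node_modules',
//   '/home/user/project/node_modules',
//   '/home/user/node_modules',
//   '/home/node_modules',
//   '/node_modules' ]
```

Every entry in this list is a potential file-system lookup, which is why resolution depends on the modules physically being present in one of these folders.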
Installing dependencies takes most of the build time in CI systems, so speeding up this step will have a positive effect on build time as a whole.
In simplified form, installing modules consists of the following steps:
- A specific version of each module is resolved from the allowed range.
- All modules of the required versions are downloaded from the registry and stored in the local cache.
- Modules from the local cache are copied to the project's node_modules folder.
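The first step can be illustrated with a deliberately simplified sketch. It assumes ranges of the form ^x.y.z with x >= 1 and plain numeric versions; the helper names are hypothetical, and real package managers use a complete semver implementation rather than anything like this.

```javascript
// Turns "1.2.3" into [1, 2, 3].
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Orders two parsed versions: negative if a < b, positive if a > b.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Picks the highest available version allowed by a caret range,
// i.e. same major version and not lower than the base version.
function resolveCaret(range, available) {
  if (!range.startsWith('^')) {
    throw new Error('This sketch only supports caret ranges');
  }
  const base = parseVersion(range.slice(1));
  const candidates = available
    .map(parseVersion)
    .filter((v) => v[0] === base[0] && compareVersions(v, base) >= 0)
    .sort(compareVersions);
  return candidates.length > 0 ? candidates.pop().join('.') : null;
}

console.log(resolveCaret('^1.2.0', ['1.1.9', '1.2.3', '1.10.0', '2.0.0']));
// 1.10.0
```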
While the first two steps are already well optimized and run quickly when the modules are cached, the third step has remained almost unchanged since the first versions of Node.js and npm.
The new approach proposes to get rid of the third step and replace the actual copying of files with the creation of a table that maps the requested modules onto their copies in the local cache.
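As a rough sketch of the idea, the table can be thought of as a map from package names to directories in the cache. The cache path, the lockfile shape, and the function names below are assumptions made for illustration; they do not reflect the actual formats used by Yarn or npm.

```javascript
const path = require('path');

// Hypothetical cache location and resolved versions, for illustration only.
const CACHE_DIR = '/home/user/.cache/pm';
const lockfile = {
  'left-pad': '1.3.0',
  lodash: '4.17.10',
};

// Built once at install time: package name -> directory inside the cache.
function buildPackageMap(lock, cacheDir) {
  const map = new Map();
  for (const [name, version] of Object.entries(lock)) {
    map.set(name, path.posix.join(cacheDir, `${name}-${version}`));
  }
  return map;
}

// Splits "lodash/fp" into ["lodash", "fp"] and "@scope/pkg/x" into ["@scope/pkg", "x"].
function splitRequest(request) {
  const parts = request.split('/');
  const nameLength = request.startsWith('@') ? 2 : 1;
  return [parts.slice(0, nameLength).join('/'), parts.slice(nameLength).join('/')];
}

// At run time, resolution becomes a table lookup instead of a directory walk.
function resolveFromMap(map, request) {
  const [name, subpath] = splitRequest(request);
  const base = map.get(name);
  if (base === undefined) {
    throw new Error(`Package "${name}" is not in the table`);
  }
  return subpath ? path.posix.join(base, subpath) : base;
}

const packageMap = buildPackageMap(lockfile, CACHE_DIR);
console.log(resolveFromMap(packageMap, 'lodash/fp'));
// /home/user/.cache/pm/lodash-4.17.10/fp
```

Nothing is copied in this scheme: installation only has to write the table, which is why the third step becomes so much cheaper.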
Using symlinks
Instead of actually copying modules, you can create a symlink to their location in the cache. This approach is implemented in pnpm, another alternative package manager. It works reasonably well, but symlinks bring their own problems: a file effectively lives in two locations, lookups of sibling modules can break, and so on. In addition, creating a symlink is still a file operation, which ideally we would like to avoid altogether.
Trying Yarn PnP
More information about this feature can be found in the official description. This section gives a brief summary of it.
The PnP version of Yarn currently lives in the feature branch yarn-pnp.
Clone the repository locally with the desired branch:
git clone git@github.com:yarnpkg/yarn.git --branch yarn-pnp
The build instructions for Yarn are here; the steps are straightforward.
After the build is complete, add an alias for the custom version of Yarn and you can start working with it:
alias yarn-local="node $PWD/lib/cli/index.js"
Plug'n'Play can be enabled in two ways: either with the flag yarn --pnp, or with an additional setting in package.json: "installConfig": {"pnp": true}.
As an example, the Yarn developers have already prepared a demo project. It uses Webpack, Babel and other tools typical of a modern frontend. Installing its dependencies in the two different ways gave the following results:
- Typical installation with yarn: 19s
- Installation with yarn --pnp: 3s
Before the measurement, one cold installation was carried out so that all the necessary modules were already in the cache.
Let's now figure out how this works. After a PnP installation, an additional file, .pnp.js, is created in the project root. It overrides the native logic of the Module class built into Node.js. By loading this file into our code, we let require() fetch modules from the global cache instead of looking in node_modules. All built-in Yarn commands, such as yarn start or yarn test, preload this file by default, so no changes to your code are required if you already use Yarn.
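The following is a minimal sketch of this kind of override, not the actual contents of .pnp.js. It assumes that patching Module._resolveFilename, an undocumented Node.js internal, is acceptable for demonstration purposes; the package demo-pkg and the temporary "cache" directory are hypothetical and are created by the script itself.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const Module = require('module');

// Create a throwaway "cache" containing one hypothetical package.
const cacheDir = fs.mkdtempSync(path.join(os.tmpdir(), 'pnp-sketch-'));
const pkgDir = path.join(cacheDir, 'demo-pkg-1.0.0');
fs.mkdirSync(pkgDir);
fs.writeFileSync(
  path.join(pkgDir, 'index.js'),
  "module.exports = () => 'loaded from cache';\n"
);

// The table an installer would generate: package name -> cache location.
const packageMap = new Map([['demo-pkg', pkgDir]]);

// Assumption: Module._resolveFilename is internal and may change between
// Node.js versions. Mapped requests are rewritten to absolute cache paths;
// everything else falls through to the default resolution.
const originalResolveFilename = Module._resolveFilename;
Module._resolveFilename = function (request, ...args) {
  const mapped = packageMap.get(request);
  return originalResolveFilename.call(this, mapped || request, ...args);
};

// No node_modules folder exists anywhere for demo-pkg, yet this works.
const demo = require('demo-pkg');
console.log(demo());
// loaded from cache
```

A hook like this has to run before any application code, which is what preloading achieves (for example with node's --require flag).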
In addition to mapping modules, .pnp.js performs additional dependency validation. If you call require('test') without declaring that dependency in package.json, you get the following error: Error: You cannot require a package ("test") that is not declared in your dependencies. This check should make the code more reliable and predictable.
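A check of this kind could look roughly like the sketch below. The functions packageNameOf and assertDeclared are hypothetical, and the sketch assumes that only dependencies and devDependencies count as declarations; the real validation in .pnp.js is more involved.

```javascript
const { builtinModules } = require('module');

// Extracts the package name from a request such as "lodash/fp" or "@scope/pkg/x".
function packageNameOf(request) {
  const parts = request.split('/');
  return request.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

// Throws if `request` refers to a package that the manifest does not declare.
function assertDeclared(request, manifest) {
  // Relative paths, absolute paths and Node.js core modules are not packages.
  if (
    request.startsWith('.') ||
    request.startsWith('/') ||
    builtinModules.includes(request)
  ) {
    return;
  }
  const name = packageNameOf(request);
  const declared = { ...manifest.dependencies, ...manifest.devDependencies };
  if (!Object.prototype.hasOwnProperty.call(declared, name)) {
    throw new Error(
      `You cannot require a package ("${name}") that is not declared in your dependencies`
    );
  }
}

const manifest = { dependencies: { lodash: '^4.17.10' } };
assertDeclared('lodash/fp', manifest); // passes silently
assertDeclared('./local-file', manifest); // passes silently
```

With node_modules, an undeclared package often resolves by accident because some other dependency pulled it in; a table-based resolver knows exactly what each package declared and can refuse such requests.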
Among the shortcomings of the new approach, it is worth noting that tools which work with the node_modules directory directly, bypassing Node's built-in mechanisms, need additional integration. For example, Webpack and other frontend bundlers will need extra plugins so that they can find the files they need for bundling.
The demo project contains draft resolvers for ESLint, Jest, Rollup and Webpack.
In my experiments there were still problems with TypeScript, which is tightly coupled to the presence of node_modules and offers no simple way to override its module lookup strategy.
There will also be problems with postinstall scripts. Since the module stays in the cache, postinstall scripts that change its state (for example, by downloading additional files) can corrupt the cache and break other projects that depend on it. The Yarn developers recommend disabling script execution with the flag --ignore-scripts. They have already experimented with enabling this flag by default for all projects inside Facebook and found no serious problems. In the long term, abandoning postinstall scripts looks like a good step in view of their known security problems.
Trying npm tink
The npm team also announced its own alternative solution. Their new tool, tink, ships as a separate module, independent of npm. As input, tink takes the package-lock.json file that is generated automatically when you run npm install. Based on the lock file, tink generates the file node_modules/.package-map.json, which stores the mapping of local modules to their real locations in the cache.
Unlike Yarn, there is no hook file that can be preloaded into your project to patch require. Instead, you are expected to run the tink command in place of node to get the right environment. This approach is less ergonomic, because it requires changing how you launch your code. As a proof of concept, however, it will do.
I tried to compare the installation speed of npm ci and tink, but tink turned out to be even slower, so I will not give any numbers. Clearly, this project is much less mature than Yarn's and is not optimized at all. We will wait for new releases.
Conclusion
Abandoning the node_modules directory is a natural step, given the experience of other languages, where such an approach never existed in the first place. It will speed up builds in CI systems where the package cache can be preserved between builds. In addition, if you transfer the package cache and the .pnp.js file from one computer to another, you can reproduce the environment without even running Yarn. This can be useful for container-based build systems: mount the directory with the cache, add the .pnp.js file, and you can run the tests right away.
The new approach looks unusual and breaks some established practices that rely on all modules always being available in node_modules. However, the .pnp.js file offers an API that lets you abstract away from the real location of the files and work with a virtual tree. In addition, as a last resort, there is the command yarn unplug --persist, which extracts a module from the cache and places it locally in node_modules.
In any case, nothing has been finalized yet. Even the pull request in Yarn has not been merged, so we should expect changes. Still, it was interesting to try the alpha version of the feature on a couple of my personal projects and confirm that the approach really works and makes installation faster.