How to make it simple and straightforward to launch Java processes in Linux / Docker
As a DevOps engineer, I often automate the installation and configuration of various IT systems in environments ranging from containers to the cloud. I have worked with many systems built on the Java stack, from small ones (like Tomcat) to large-scale ones (Hadoop, Cassandra, etc.).
What is more, almost every such system, even the simplest, for some reason came with a complex and unique launch mechanism. At a minimum these were multi-line shell scripts, as in Tomcat, or even whole frameworks, as in Hadoop. My current “patient” in this series, the one that inspired this article, is the Nexus OSS 3 artifact repository, whose startup script runs to about 400 lines of code.
The opacity, redundancy, and complexity of startup scripts cause problems even when you install a single component by hand on a local system. Now imagine that a set of such components and services has to be packaged into a Docker container, with yet another layer of abstraction written along the way for more or less adequate orchestration, then deployed to a Kubernetes cluster, with the whole process implemented as a CI/CD pipeline...
In short, let's take the Nexus 3 mentioned above as an example and figure out how to get out of the labyrinth of shell scripts and back to something closer to java -jar <program.jar>, given the convenient modern DevOps tools now available.
Where does this complexity come from?
In a nutshell: in ancient times, when a mention of UNIX did not yet prompt the question “you mean Linux?”, and there was no Systemd, no Docker, and so on, processes were managed with portable shell scripts (init scripts) and PID files. Init scripts set up the necessary environment, which differed from one UNIX system to another, and, depending on their arguments, started the process or restarted/stopped it using the ID from the PID file. The approach is simple and straightforward, but these scripts broke in every nonstandard situation and required manual intervention, did not allow running several copies of the process, and so on. But that is beside the point.
So, if you look closely at the startup scripts of the Java projects mentioned above, you can see obvious signs of this prehistoric approach, down to references to SunOS, HP-UX, and other UNIX systems. As a rule, such scripts do something like the following:
- use POSIX shell syntax, with all its crutches, for UNIX/Linux portability
- determine the OS version and release through uname, /etc/*release, etc.
- look for a JRE/JDK in secluded corners of the file system and pick the most "suitable" version by tricky rules, sometimes specific to each OS
- calculate numeric JVM parameters, for example memory size (-Xms, -Xmx), number of GC threads, etc.
- tune the JVM through -XX parameters, taking into account the specifics of the selected JRE/JDK version
- locate their own components, libraries, and the paths to them in the surrounding directories, configuration files, etc.
- customize the environment: ulimits, environment variables, etc.
- build the CLASSPATH with a loop such as:
for f in $path/*.jar; do CLASSPATH="${CLASSPATH}:$f"; done
- parse command line arguments: start|stop|restart|reload|status|...
- assemble the Java command to be executed from all of the above
- and finally execute this Java command, often relying, explicitly or implicitly, on the same notorious PID files, &, nohup, special TCP ports, and other tricks of the last century (see the Karaf example)
The Nexus 3 startup script mentioned above is a fitting example of such a script.
In effect, all of this scripting logic tries to stand in for the system administrator, who would otherwise install and configure everything by hand for a specific system from start to finish. But it is impossible to anticipate every requirement of the whole variety of systems out there. So the result is the opposite: a headache both for the developers who have to maintain these scripts and for the system engineers who later have to make sense of them. From my point of view, it is much easier for a system engineer to learn the JVM parameters once and configure the JVM properly than to work through the intricacies of a new set of startup scripts every time a new system is installed.
What to do?
Simplify! KISS and YAGNI are ours to use. Besides, it is 2018, which means that:
- with very few exceptions, UNIX == Linux
- the task of managing processes is solved both for a single server (Systemd, Docker) and for clusters (Kubernetes, etc.)
- a whole set of convenient configuration management tools has appeared (Ansible, etc.)
- total automation has arrived in system administration and is firmly entrenched: instead of manually setting up fragile, unique “snowflake servers”, you can now automatically build uniform, reproducible virtual machines and containers using a variety of convenient tools, including the Ansible and Docker mentioned above
- tools for collecting runtime statistics are used everywhere, both for the JVM itself and for the Java application
- and, most importantly, specialists have appeared: system and DevOps engineers who know how to use the technologies listed above and understand how to install the JVM correctly on a specific system and then fine-tune it based on the collected runtime statistics
So let's go through the functionality of startup scripts again, keeping these points in mind and not trying to do the system engineer's job, and strip out everything that is "extra".
- POSIX shell syntax ⇒ /bin/bash
- OS version detection ⇒ UNIX == Linux; if there are OS-specific parameters, they can be described in the documentation
- JRE/JDK search ⇒ we have a single version, and it is OpenJDK (or Oracle JDK, if you really need it); java and company are in the standard system path
- calculating numeric JVM parameters, JVM tuning ⇒ this can be described in the application's scaling documentation
- searching for the application's own components and libraries ⇒ describe the application's structure and how to configure it in the documentation
- environment setup ⇒ describe the requirements and specifics in the documentation
- CLASSPATH generation ⇒ -cp path/to/my/jars/* or even an Uber-JAR
- parsing command line arguments ⇒ there will be no arguments, since the process manager takes care of everything except the launch
- assembling the Java command
- executing the Java command
As a result, we just need to assemble and execute a Java command of the form java <opts> -jar <program.jar> using the chosen process manager (Systemd, Docker, etc.). All parameters and options (<opts>) are left to the discretion of the system engineer, who adjusts them to the specific environment. If the <opts> list is quite long, you can return to the idea of a startup script, but this time one that is as compact and declarative as possible, i.e. containing no programming logic.
Example
As an example, let's see how to simplify the Nexus 3 startup script.
The easiest option is to stay out of the jungle of this script: just run it under real conditions (./nexus start) and look at the result. For example, you can find the complete argument list of the running application in the process table (via ps -ef), or run the script in debug mode (bash -x ./nexus start) to watch its whole execution and, at the very end, the launch command.
/usr/java/jdk1.8.0_171-amd64/bin/java -server -Dinstall4j.jvmDir=/usr/java/jdk1.8.0_171-amd64 -Dexe4j.moduleName=/home/nexus/nexus-3.12.1-01/bin/nexus -XX:+UnlockDiagnosticVMOptions -Dinstall4j.launcherId=245 -Dinstall4j.swt=false -Di4jv=0 -Di4jv=0 -Di4jv=0 -Di4jv=0 -Di4jv=0 -Xms1200M -Xmx1200M -XX:MaxDirectMemorySize=2G -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass -XX:+LogVMOutput -XX:LogFile=../sonatype-work/nexus3/log/jvm.log -XX:-OmitStackTraceInFastThrow -Djava.net.preferIPv4Stack=true -Dkaraf.home=. -Dkaraf.base=. -Dkaraf.etc=etc/karaf -Djava.util.logging.config.file=etc/karaf/java.util.logging.properties -Dkaraf.data=../sonatype-work/nexus3 -Djava.io.tmpdir=../sonatype-work/nexus3/tmp -Dkaraf.startLocalConsole=false -Di4j.vpt=true -classpath /home/nexus/nexus-3.12.1-01/.install4j/i4jruntime.jar:/home/nexus/nexus-3.12.1-01/lib/boot/nexus-main.jar:/home/nexus/nexus-3.12.1-01/lib/boot/org.apache.karaf.main-4.0.9.jar:/home/nexus/nexus-3.12.1-01/lib/boot/org.osgi.core-6.0.0.jar:/home/nexus/nexus-3.12.1-01/lib/boot/org.apache.karaf.diagnostic.boot-4.0.9.jar:/home/nexus/nexus-3.12.1-01/lib/boot/org.apache.karaf.jaas.boot-4.0.9.jar com.install4j.runtime.launcher.UnixLauncher start 9d17dc87 '''' org.sonatype.nexus.karaf.NexusMain
First, let's apply a couple of simple tricks to it:
- change /the/long/and/winding/road/to/my/java to java, since it is in the system path
- put the list of Java parameters into a separate array, sort it, and remove duplicates
# JVM options from the captured command line, sorted, duplicates removed
JAVA_OPTS=(
    '-server'
    '-Dexe4j.moduleName=/home/nexus/nexus-3.12.1-01/bin/nexus'
    '-Di4j.vpt=true'
    '-Di4jv=0'
    '-Dinstall4j.jvmDir=/usr/java/jdk1.8.0_171-amd64'
    '-Dinstall4j.launcherId=245'
    '-Dinstall4j.swt=false'
    '-Djava.io.tmpdir=../sonatype-work/nexus3/tmp'
    '-Djava.net.preferIPv4Stack=true'
    '-Djava.util.logging.config.file=etc/karaf/java.util.logging.properties'
    '-Dkaraf.base=.'
    '-Dkaraf.data=../sonatype-work/nexus3'
    '-Dkaraf.etc=etc/karaf'
    '-Dkaraf.home=.'
    '-Dkaraf.startLocalConsole=false'
    '-XX:+LogVMOutput'
    '-XX:+UnlockDiagnosticVMOptions'
    '-XX:+UnsyncloadClass'
    '-XX:-OmitStackTraceInFastThrow'
    '-XX:LogFile=../sonatype-work/nexus3/log/jvm.log'
    '-XX:MaxDirectMemorySize=2G'
    '-Xms1200M'
    '-Xmx1200M'
    '-classpath /home/nexus/nexus-3.12.1-01/.install4j/i4jruntime.jar:/home/nexus/nexus-3.12.1-01/lib/boot/nexus-main.jar:/home/nexus/nexus-3.12.1-01/lib/boot/org.apache.karaf.main-4.0.9.jar:/home/nexus/nexus-3.12.1-01/lib/boot/org.osgi.core-6.0.0.jar:/home/nexus/nexus-3.12.1-01/lib/boot/org.apache.karaf.diagnostic.boot-4.0.9.jar:/home/nexus/nexus-3.12.1-01/lib/boot/'
)

# The array is expanded unquoted on purpose, so that '-classpath <path>' is split into two words
java ${JAVA_OPTS[*]} com.install4j.runtime.launcher.UnixLauncher start 9d17dc87 '''' org.sonatype.nexus.karaf.NexusMain
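The sorting and de-duplication does not have to be done by hand. Below is a minimal sketch of one way to do it, assuming that no option contains spaces (which holds for the command line above). The function name sorted_unique_opts is made up for this illustration. Words that do not start with a dash, including the value that follows -classpath, are dropped and have to be added back manually.

```shell
#!/bin/bash

# Hypothetical helper: print the JVM options found in a captured command
# line, one per line, sorted and without duplicates.
# Assumption: no option contains a space. Words that do not start with '-'
# (the java binary, the main class, the value after -classpath) are dropped.
sorted_unique_opts() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep '^-' | LC_ALL=C sort -u
}

sorted_unique_opts "/usr/bin/java -server -Di4jv=0 -Di4jv=0 -Xms1200M Main start"
```

On a live system the argument could come from ps -o args= -p followed by the PID of the running JVM. LC_ALL=C is set only to make the sort order independent of the locale.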
Now we can dig deeper.
Install4j is a graphical Java installer. It appears to be used for the initial installation of the system. We do not need it on the server, so we remove it.
Let's agree on where the Nexus components and data live on the file system:
- put the application itself in /opt/nexus-<version>
- for convenience, create a symbolic link /opt/nexus -> /opt/nexus-<version>
- place our script at /opt/nexus/bin/nexus, replacing the original
- keep all Nexus data on a separate file system mounted at /data/nexus
Creating directories and links is a job for configuration management systems (the whole thing takes 5-10 lines in Ansible), so we will leave this task to the system engineers.
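For reference, here is the same layout expressed in plain shell, roughly what a configuration management role would do declaratively. This is only a sketch: the helper name prepare_layout and its root parameter are assumptions introduced so that it can be tried in a scratch directory instead of on a real server.

```shell
#!/bin/bash

# Hypothetical helper: create the agreed layout under a given root.
# Pass / as the root on a real server, or a scratch directory to try it out.
prepare_layout() {
    local root="${1%/}" version="$2"   # "${1%/}" turns "/" into ""
    mkdir -p "$root/opt/nexus-$version" "$root/data/nexus"
    # -n replaces an existing /opt/nexus link instead of descending into it
    ln -sfn "nexus-$version" "$root/opt/nexus"
}

scratch=$(mktemp -d)
prepare_layout "$scratch" 3.12.1-01
ls -l "$scratch/opt"
rm -rf "$scratch"
```

The link target is relative, so /opt/nexus keeps pointing at the right directory even if the tree is mounted somewhere else, for example inside a container image build.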
Let our script change the working directory to /opt/nexus at startup; then we can change the paths to the Nexus components to relative ones.
Options of the form -Dkaraf.* are settings for Apache Karaf, the OSGi container in which our Nexus is evidently packaged. We change karaf.home, karaf.base, karaf.etc, and karaf.data to match the placement of the components, using relative paths wherever possible.
Since the CLASSPATH consists of a list of jar files that all sit in the same directory, lib/boot/, we replace the whole list with lib/boot/* (we will also have to turn off shell wildcard expansion with set -o noglob).
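Why noglob matters: the option list is later expanded unquoted, so with globbing on, the shell itself would turn the * into a space-separated list of files, and java would take everything after the first file as the main class and its arguments. With globbing off, the literal * reaches java, which expands a class path entry ending in /* to all jar files in that directory. The sketch below shows the difference by counting words. The count_words helper and the scratch directory are made up for the demonstration, and a plain string stands in for the array, since both are split the same way when expanded unquoted.

```shell
#!/bin/bash

# Hypothetical helper: report how many arguments it received.
count_words() { echo "$#"; }

demo_dir=$(mktemp -d)              # scratch directory standing in for lib/boot
touch "$demo_dir/a.jar" "$demo_dir/b.jar"
OPTS="-cp $demo_dir/*"

set +o noglob
with_glob=$(count_words $OPTS)     # the shell expands the *: -cp a.jar b.jar
set -o noglob
without_glob=$(count_words $OPTS)  # the * stays literal: -cp <dir>/*
set +o noglob

echo "globbing on: $with_glob words, globbing off: $without_glob words"
rm -rf "$demo_dir"
```

Assuming the temporary directory path contains no whitespace, this should report 3 words with globbing on and 2 with it off.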
We change java to exec java so that our script does not start java as a child process (the process manager simply would not see that child process) but instead "replaces" itself with java (see the description of exec).
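The effect of exec can be observed with PIDs alone. In the sketch below, both function names are made up. Each function starts a wrapper shell that prints its own PID and then runs a second shell that prints its PID. The trailing : in the first function is there only so that the wrapper cannot optimise its last command into an implicit exec.

```shell
#!/bin/bash

# Without exec: the inner command runs as a child and gets a new PID.
without_exec() { sh -c 'echo $$; sh -c "echo \$\$"; :'; }

# With exec: the inner command takes over the wrapper's PID.
with_exec() { sh -c 'echo $$; exec sh -c "echo \$\$"'; }

echo "without exec:"; without_exec
echo "with exec:";    with_exec
```

The first call should print two different PIDs, the second the same PID twice. For a launcher script this means that the PID the process manager started is the PID the JVM ends up with, so signals sent by the manager reach the JVM directly.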
Let's see what we ended up with:
#!/bin/bash

JAVA_OPTS=(
    '-Xms1200M'
    '-Xmx1200M'
    '-XX:+UnlockDiagnosticVMOptions'
    '-XX:+LogVMOutput'
    '-XX:+UnsyncloadClass'
    '-XX:LogFile=/data/nexus/log/jvm.log'
    '-XX:MaxDirectMemorySize=2G'
    '-XX:-OmitStackTraceInFastThrow'
    '-Djava.io.tmpdir=/data/nexus/tmp'
    '-Djava.net.preferIPv4Stack=true'
    '-Djava.util.logging.config.file=etc/karaf/java.util.logging.properties'
    '-Dkaraf.home=.'
    '-Dkaraf.base=.'
    '-Dkaraf.etc=etc/karaf'
    '-Dkaraf.data=/data/nexus/data'
    '-Dkaraf.startLocalConsole=false'
    '-server'
    '-cp lib/boot/*'
)

set -o noglob  # keep the * in -cp literal so that java, not the shell, expands it

cd /opt/nexus \
    && exec java ${JAVA_OPTS[*]} org.sonatype.nexus.karaf.NexusMain
That is 27 lines instead of more than 400: transparent, understandable, declarative, with no unnecessary logic. If necessary, this script can easily be turned into an Ansible/Puppet/Chef template, adding only the logic needed for a specific situation.
You can use this script as the ENTRYPOINT in a Dockerfile or call it from a Systemd unit file, adjusting ulimits and other system parameters there as well, for example:
[Unit]
Description=Nexus
After=network.target
[Service]
Type=simple
LimitNOFILE=1048576
ExecStart=/opt/nexus/bin/nexus
User=nexus
Restart=on-abort
[Install]
WantedBy=multi-user.target
Conclusion
What conclusions can be drawn from this article? In principle, it all comes down to a couple of points:
- Every system has its own purpose; there is no need to hammer nails with a microscope.
- Simplicity (KISS, YAGNI) rules: implement only what is needed for the particular situation.
- And most importantly, it's great that there are IT specialists with different profiles. Let's work together and make our IT systems simpler, clearer, and better! :)
Thanks for your attention! I would appreciate feedback and constructive discussion in the comments.