nekoval February 4, 2011 at 12:58

Java Logging: A Nightmare Story

Introduction

The thorny and tortuous path of the Java platform to the correct way of writing lines to log files. The history of logging in Java is quite informative in terms of studying the features of Open Source, including its interaction with corporations and individual programmers. I’m going to talk as much as possible about the history of Java logging, as well as where it all came to and how to live on. My analysis of the situation will be quite subjective about the reason that logging is always a matter of taste, and my tastes have formed my own, adults. I think that it will be cognitive not so much in terms of some technical features of the entire zoo logging frameworks, but in terms of the policies and psychology of developers in the Open Source model.

Start

It is clear that any logging library should allow at least printing a line to the console / to the log file.

In the beginning was, of course System.err.println. In addition, the first version of the Servlet API included a function log(however, quite primitive).

One of the options for more advanced solutions in 1999 was the Avalon project (and subprojects called Excalibur and Fortress), which in addition to DI services offered the LogEnabled interface. In the component that LogEnabled declared itself, an object of type Logger was injected (I use this word instead of “injected” to emphasize its connection with DI), where it was possible to write: a) lines b) exceptions. This approach at that time seemed fresh and innovative, but from the modern point of view it is pure idiocy and over-engineering. There is no point in using DI for logging, and a static instance of this Logger itself would suit everyone. In Avalon, I had to think about where to save this damned Logger and what to do if the class does not use DI (i.e. it is not controlled by the container), and I really want to log in it.

Around 1999, a new generation library appears - log4j. The prototype of the library was developed by IBM (back in the era when the blue giant was trying to squeeze Java into OS / 2), then ASF picked up the baton. The product was already much more thought out and tested on real needs. In general, I must say that server applications in Java by that time were only about a year old, and logging was always in demand on the server. During this time, the Java community began to gradually understand what and how they needed.

log4j divided the concept of a logger or category (i.e., the area of the application that wants to write to the log), the actual log entries that are carried out by the so-called appenders , and the formatting of entries ( layout) The log4j configuration determines which appenders are attached to which categories and the log level messages fall into each appender.

The cornerstone of log4j is the hierarchy of categories. For example, you can log all messages from org.hibernateand mute all of org.hibernate.type. After some time, de facto, the practice of matching hierarchy of categories and package hierarchy in the application was established.

The hierarchy of categories allows you to quite effectively cut off unnecessary messages, so log4j worked extremely smartly. By the way, the principle thing for loggers is not so much the recording speed as the filtering speed of the unnecessary (and usually unnecessary more than 90%) and formatting.

The principles laid down in log4j have been quite successfully ported to other languages: log4cxx, log4net (and fresh cub - log4php). The standard logging package in Python 2.x is a redesigned log4j (with a small addition of other libraries).

So, we summarize. Successful architecture, clear configuration scheme, fail-safe principle - why not include such a wonderful library in the platform?

Java Logging API

In fact, everything turned out strange. IBM, in the depths of which log4j emerged, turned out to be pretty quick in questions of forming the new JSR47 (Java Logging API). In particular, Comrade Graham Hamilton, who is responsible for JSR47, decided not to take log4j as the basis , but the original IBM logging toolkit . Moreover, the logging toolkit was used to its fullest: not only the names of all the main classes coincided, but also their implementation; They tried to finish the code as little as possible, apparently in order to catch the next release of the platform. However, conceptually it was very similar to log4j, only instead of appenders it was called handlers, and instead of layout there was formatter.

Since the main purpose of JSR47 is to define an API, not an implementationThere were only 4 available (by default in the platform) output tools (in log4j more than 10), and the formatters were so poor that they had to make their formatter almost immediately, since there weren’t enough ready-made ones. JSR47 suggested using the configuration in the form .properties, and it was noted in brackets that not everything can be described in the file. Thus, when the configuration was complicated, the programmer unexpectedly found out that it was again necessary to write code, because in the form, .propertiesits configuration is unrealizable.

This is not to say that JSR47 loses in performance. In some places, he overtook log4j by maintaining in memory a special representation of his configuration (which, incidentally, at the same time complicated this very configuration). However, as it turned out, JSR47 compulsorily collected the so-called Caller Information, that is, "where does this message come from." Obtaining Caller Information is a rather expensive operation; it proceeds using the Native code. Experienced uncles from log4j knew this, so they provided this option with the proviso “better not turn it on.”

Log4j developers make an open petition, where they demanded to “remove the JSR47 from the assembly line”, while it was not yet part of the platform. The petition was signed by more than 100 people ... However, it was already too late. The next release of JDK was approved and the platform swept into the future with a rudimentary java.util.logging, or abbreviated JUL. The new logging was so undeveloped and inconvenient that they decided to use it only in a few application servers (among them Resin and Jetty). Sun, however, responded to the petition and most of the major problems of the original JSR47 were gradually eliminated. Nevertheless, these manipulations were more like installing supports to a wooden bridge, which by no means would make this bridge reinforced concrete. Log4j developers made a curtsytowards Sun, noting, however, that the degree of curvature of the JUL is still quite high. Among other things, the JDK 1.4 license did not allow the use of log4j as an implementation of JUL. The last train for log4j is gone.

Unable to support a large number of log writers (i.e. handlers), JUL showed off by defining an incredible number of logging levels. For example, for debugging messages there were already 3 levels - FINE, FINER and FINEST. Seeing all this, the developers often did not understand at all which of the three levels, damn it, should be used.

The Java community was completely disoriented by the emergence of “standard” logging in parallel with the popular, stable and developing log4j. No one understood which of the two were tenants. There were often situations when several libraries were assembled in the project, each of which used its own logging and its own settings, recording their log files completely differently.

Of course, the community tried to fix this problem. The wrapping epidemic has begun . Or, I would even say, a pandemic.

Wrapper hell

When you connect several libraries and try to combine their logs into a single whole (and the code cannot be modified), this will be called Adapter. JUL adapters were written in log4j and vice versa. Unfortunately, functional adapters are the “least common multiple”. Even when context support (NDC and MDC) appeared in log4j, it was lost during transfusion in JUL. Worse, JUL only worked since JDK 1.4, while an incredible number of enterprise applications still sat at 1.3. As a result, the community became obsessed with the idea of creating a “de facto common standard” that everyone would use together and that would work always and everywhere.

Around 2002, a project called commons-logging (JCL = Jakarta Commons Logging) emerged from the Jakarta group. In fact, it was a wrapper for all the logging tools that existed at that time. It was proposed to write applications in such a way that they turned to a wrapper (an interface called Log), which selected a “suitable” logging system and connected to it itself. The wrapper was functionally poor and didn’t make any additions to the existing logging tools.

How was the appropriate logging system automatically selected? But this is the most interesting. Firstly, one could explicitly set it by placing a specialcommons-logging.propertiesfile somewhere in CLASSPATH. Secondly, through a system property (which, obviously, no one will do). Thirdly, if log4j was found somewhere in CLASSPATH, then it was automatically activated. Implementations of all other libraries were searched in the same way, the first one was always connected.

Handsomely! Well, that is, it would be nice if all the software in the world uses commons-logging. Then you could safely collect the JARs, put them in the application server, and there JCL will pick up the logging of this application server and voila!

In fact, as it turned out, a bunch of software usually uses "their developer’s favorite logging". This means that a completely arbitrary library can, in the form of a dependency, pull up, for example, log4j, which thus falls into CLASSPATH and unexpectedly switches JCL to use log4j. Even worse with commons-logging.properties. If any activist thought of pushing it into his JAR, then when you connect this JAR, you yourself understand, write is lost. A particular piquancy of the situation was given by the fact that it was completely incomprehensible from which JAR the infection came. Sometimes it helped to sort through all the JARs in alphabetical order. Sometimes a tambourine.

The complete unpredictability of the choice of logging turned out to be the main and very fun feature of JCL. The log4j group burst into an angry articleThink again before adopting the commons-logging API , where it suggested stopping the epidemic and focusing on refining the existing solution - log4j.

Unfortunately, it was too late. Hundreds and then thousands of libraries were transferred from Jakarta to commons-logging. Among them were Hibernate, Spring, Tomcat. After that, numerous users of these libraries were swept by a wave of problems, generally described as ClassLoader hell . Application servers use a rather complex hierarchy of ClassLoaders, often with serious deviations from the J2EE standard. Under these conditions, sometimes the JCL is initialized twice, and incorrectly, leading to completely mystical stack traces, which do not even allow us to suspect that the problem is in the log wrapper.

Why, in fact, did Open Source work in such a strange way, giving birth to this perversion? Why developers did not dare to just take and use another mature and popular Open Source product - log4j? The point here is, perhaps, in some inertia of the community, which is used to following either ASF (and the Jakarta group that generated this nightmare is part of ASF) or Sun. As soon as a critical mass of projects using JCL is formed, everyone else (and not the dumbest people, right, Gavin King?) Starts using JCL (because Apache is awesome!). This is generally reminiscent of the Brownian movement, where brands such as Apache or Sun are able to create low-pressure areas where millions of developers are rushing. For JCL, the “success story” is described on Rod Waldhoff’s blog. (one of the developers of the so-called Jakarta Commons) in 2003.

A new round of progress

So, somewhere in 2004 we have in the kit:

Stable and functionally developed log4j
Dull java.util.logging
Problem commons-logging
A few small loggers not worth mentioning

Note that conservative moods prevailed in the log4j project at this time. Particular attention was paid to compatibility with older JDKs. It seems that the development of a new branch log4j - 1.3.x. This version is a kind of compromise solution: yes, I want new functionality, yes, I want to maintain backward compatibility, yes, we will try to please both ours and yours. Meanwhile, on the way JDK 1.5 with varargs, JMX extensions and a bunch of other gifts. The log4j team started a brainwave. Branch 2.x is incompatible - incompatible with the main branch 1.2.x and created specifically for JDK 1.5. The Java community is looking impatient. It happens like as something. But what exactly is not to understand - log4j 2.0 is still an unattainable alpha, log4j 1.3 is wildly buggy and does not provide the promised drop-in compatibility. And only branch 1.2 is still stable and alive and healthy, jumping in a few years - attention! - from version 1.2.6 to 1.2.12.

Somewhere in 2006, one of the founding fathers of log4j - Ceki Gülcü - decides to leave the rapidly decaying team. So the next “wrapper of everything” called SLF4J (Simple Logging Facade for Java) is born. Now this is a wrapper around: log4j, JUL, commons-logging, and a new logger called logback. As you can see, progress quickly reached the stage of "wrapper around the wrapper." It is easy to predict that, according to the same scheme, the number of wrapped libraries will grow as a factorial.However, SLF4J also offers other other twists. These are special binary adapters: from log4j to SLF4J, from commons-logging to SLF4J and the like. Such adapters are made for code whose sources are not available; at the same time, they should replace the original JARs of the log libraries. I can’t imagine what kind of porridge this will form, but if you really want to, then you can.

For all my hatred of wrappers, honestly, SLF4J is a well-made product. All the shortcomings of the predecessors were taken into account. For example, instead of shamanistic dances with class searches in CLASSPATH, a more reliable scheme was invented. Now the whole wrapper is divided into two parts - the API (which is used by applications) and the Implementation, which is represented by separate JAR files for each type of logging (for example slf4j-log4j12.jar,slf4j-jdk14.jaretc.). Now it’s enough just to connect the necessary Implementation file to the project, after which - oops! all project code and all libraries used (provided that they access the SLF4J API) will log in the right direction.

Functionally, SLF4J supported all modern bells and whistles such as NDC and MDC. In addition to the actual call wrapping, SLF4J offered a small but useful bonus when formatting strings. The bonus here is as follows. In code, you often have to print constructions of the form

log.debug("User " + user + " connected from " + request.getRemoteAddr());

In addition to actually printing a string, there will be an implicit conversion user.toString()with subsequent string concatenation. All would be nothing. In debug mode, the execution speed does not really bother us. However, even if we set the level, say, in INFO, it turns out that the construction of the string will still occur! No miracles: the line is constructed before the call log.debug, so log4j has no way to control this somehow. If you imagine that this is log.debugplaced in some kind of critical inner cycle ... in general, you cannot live like that. The developers of log4j suggested framing the debugging code like this:

if (log.isDebugEnabled()) {
    log.debug("User " + user + " connected from " + request.getRemoteAddr());
}

It’s not good. In theory, the logging library itself should take care of all these problems. This problem was simply the Achilles heel of log4j. The developers reacted sluggishly to kicks, saying that now you can add an object (exactly one!) To logging calls, and even describe how this object will be logged using the interface ObjectRenderer. By and large, all these were excuses and half measures.

SLF4J was not squeezed by the framework of compatibility with older versions of the JDK and API, so I immediately proposed a more elegant solution:

    log.debug("User {} connected from {}", user, request.getRemoteAddr());

In general, everything is simple. In this line {}, these are links to parameters that are passed separately. Converting parameters to a string and final formatting of the log record occurs only when the DEBUG level is set. You can pass a lot of parameters. Works! No need to write framing ifand other stupidity!

In parentheses, it should be noted that this feature was also completely unexpectedly implemented by the Groovy language, where there is the concept of GString, i.e. view string

"User ${user} connected from ${request.getRemoteAddr()}"

, which is implicitly associated with several context variables (here it is user, request), and the calculation of the string occurs in a deferred manner . This is very convenient for such log libraries as log4j - you can get it into the GString input, and then either throw it away without calculating it or convert it to a normal (static) string - String.

In short, the SLF4J was made competently, with a reserve for the future. This caused a serious increase in its popularity among the community: now SLF4J is used by such significant projects as Jetty, Hibernate, Mina, Geronimo, Mule, Wicket, Nexus ... in general, almost all the losers who once hung on commons-logging switched to SLF4J . I wonder what prevented the improvement of commons-logging to the desired state many years ago? But these are the realities of Open Source - the development of software in it is more revolutionary than evolutionary.

Simultaneously with the SLF4J, a completely new logger was brought to the table - Logback. It was made by a man who ate a dog on logging, and indeed turned out to be a really good product. Logback was originally sharpened under JDK 1.5+, in one fell swoop getting rid of all the senile backward compatibility diseases inherent in the log4j project. And that means varargs, java.util.concurrentand other delights . For example, due to the built-in runtime-filtering system, you can change the logging level depending on the user session, scatter users into different log files and so on and so forth.

I will throw the mustard in the idyll painted by the author. Most of these features can be implemented as additional appenders for log4j. It’s more difficult to bend and file the configuration, but it’s a fact that it’s not necessary to switch to a new logger for this. Thus, all Logback advertised chips are convenient, but not unique.

As for the community, it treats Logback with caution. Firstly, in a few years he got to version 0.9.x, and this scares some programmers. Secondly, Logback is neither under the Apache umbrella, nor in the scope of the Sun. It bothers scrupulous people. Thirdly, the author needs to eat, so for some add-ons to Logback and support, it requires money. This sometimes scares students off. Among other things, Logback has a rather complex dual license (LGPL / EPL), while log4j has a universal Apache license. For libraries and generally redistributable software, licensing is a very subtle point.

By and large, Logback today is the pinnacle of evolution. In addition to Logback, a dozen new logging libraries have appeared, but with high probability none of them will survive. To summarize, the situation at the moment is as follows:

log4j - use hooked on it initially and not seeing the need for transition.
JUL is a quietly dying standard. All who initially tried to use it, move to Logback.
commons-logging - usually involved in legacy libraries that are very afraid of causing inconvenience to the user by moving to something better.
SLF4J - very popular in libraries. Many moved to it, unable to bear the horrors of commons-logging
Logbacks are usually modern high-performance servers that log4j does not suit.

I have already said that the Open Source community tends to flock to "centers of gravity." Now such a center of gravity is rather SLF4J due to its "universality". The relative popularity of SLF4J to some extent guarantees the emergence of new wrappers. The number of projects using SLF4J is already sufficient to accumulate a “critical mass”. Logback (the same author, mind you) has no such critical mass. (By the way, log4j still promises us golden mountains and version 2.0, but things are still there.) I think if Logback pacifies his pride and moves to Apache, his position will improve greatly.

Conclusion

It is interesting to look at the history of the issue from the angle of psychology of programmers. Indeed, in principle, all this is spiral (and it seems to be progressive!) movement - endless "reinvent the wheel". That is, of the two options "modify the existing" and "make your own" always chose the second. Therefore, not one of the projects mentioned has broken out into unconditional leaders (into the very de facto standards). Instead, the developers were “chopped up” at different times at different projects and acted separately, instead of acting together. Although it is not a fact that all authors could work in one team. There were political moments (remember how Graham Hamilton loved IBM), and just trivial quarrels in the team. The desire of Jakarta Commons participants to provide the community with “freedom of choice” generally turned into a long “epidemic of wrappers” for the community.

In general, all these vices are typical of an open community. This more than 10-year history also shows how mistakenly the opinion is now that Sun seems to be deciding something in the Java community. We see that many things happened contrary to Sun and independently of Sun. In a word, I wonder how it will go further. I’m sure of one thing - projects come and go, people don’t change :)

Tags: