obie February 19, 2013 at 12:06

Soft references on guard of available memory, or how to save memory correctly

Every Java developer, sooner or later, runs into the notorious OutOfMemoryError.

After that encounter we start treating memory more carefully and try to save it. Since version 1.2, Java has had the java.lang.ref package with the classes SoftReference, WeakReference and PhantomReference. Below I will explain how these classes help in the fight against OutOfMemoryError and, more interestingly, give real examples of their use. Let's get started.

General description

First, a little theory. Recall in general terms how the garbage collector (hereinafter GC) works. Without going into details, the algorithm is simple: when the collector runs, the virtual machine starts from the roots (for example, the stacks of all threads), recursively finds every object reachable from them, and marks it. In the next step, the GC removes all unmarked objects from memory. After the cleanup, only objects that may still be useful to the program remain in memory. Let's move on.
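The mark-and-remove idea described above can be sketched as a toy model. This is only an illustration of the principle, not how any real JVM collector is implemented; Node and ToyHeap are hypothetical names invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy object: a name plus outgoing references to other objects.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class ToyHeap {
    final List<Node> objects = new ArrayList<>(); // everything that was allocated
    final List<Node> roots = new ArrayList<>();   // stands in for thread stacks and other roots

    /** Mark phase: collect every node reachable from the roots. */
    Set<Node> mark() {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {        // first visit only, so cycles terminate
                pending.addAll(node.refs);
            }
        }
        return marked;
    }

    /** Sweep phase: drop unmarked nodes and return how many were removed. */
    int collect() {
        Set<Node> marked = mark();
        int before = objects.size();
        objects.retainAll(marked);
        return before - objects.size();
    }
}
```

Note that two objects referring only to each other are still removed, because neither is reachable from a root.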
There are several kinds of references in Java. The strong reference is the most ordinary kind, the one we create every day:

	StringBuilder builder = new StringBuilder();

Here builder is a strong reference to the StringBuilder object.
There are also 3 "special" reference types: SoftReference, WeakReference and PhantomReference. In fact, there is only one difference between all these types: how the GC treats the objects they refer to. We will discuss the features of each type in more detail later, but for now the following is enough:

SoftReference - if the GC sees that an object is reachable only through a chain of soft references, it will remove the object from memory. Later. Maybe.
WeakReference - if the GC sees that an object is reachable only through a chain of weak references, it will remove the object from memory.
PhantomReference - if the GC sees that an object is reachable only through a chain of phantom references, it will remove the object from memory. After several GC runs.

If the difference is not clear yet, don't worry, everything will soon fall into place. The devil is in the details, and the details come next.
All 3 reference types inherit from one parent, Reference, from which they take all their public methods and constructors.

StringBuilder builder = new StringBuilder();
SoftReference<StringBuilder> softBuilder = new SoftReference<StringBuilder>(builder);

After these two lines execute, we have 2 kinds of references to 1 StringBuilder object:

builder - a strong reference
softBuilder - a soft reference (formally it is a strong reference to a soft reference, but for simplicity I will call it a soft reference)

If at run time the builder variable becomes unreachable while the object referenced by softBuilder is still reachable, and the GC then starts, the StringBuilder object will be marked as reachable only through a chain of soft references.
Consider the available methods:
softBuilder.get() - returns a strong reference to the StringBuilder object if the GC has not removed it from memory. Otherwise it returns null.
softBuilder.clear() - clears the reference to the StringBuilder object (that is, this soft reference no longer points to the object).
All of this works the same way for WeakReference and PhantomReference. True, PhantomReference.get() always returns null, but more on that later.
There is also the ReferenceQueue class. It lets you track the moment when the GC decides that an object is no longer needed and can be removed. The Reference object ends up in this queue after the GC has dealt with the object it refers to. When creating a Reference, we can pass a ReferenceQueue to the constructor, and the reference will be placed in that queue at that point.
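A minimal sketch of the methods just described. It does not depend on a real GC run: the call to enqueue() is made by hand here purely to simulate what the collector would normally do, and the class name ReferenceApiDemo is made up for the example.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

class ReferenceApiDemo {
    /** Returns the state observed at each step so it can be inspected. */
    static boolean[] run() {
        ReferenceQueue<StringBuilder> queue = new ReferenceQueue<>();
        StringBuilder builder = new StringBuilder("data");
        SoftReference<StringBuilder> softBuilder = new SoftReference<>(builder, queue);

        boolean presentBefore = softBuilder.get() == builder; // referent is still there
        boolean queueEmptyBefore = queue.poll() == null;      // nothing has been enqueued yet

        softBuilder.clear();                                  // drop the soft reference by hand
        boolean clearedAfter = softBuilder.get() == null;

        // Normally the collector enqueues the reference; here we simulate that step.
        boolean enqueued = softBuilder.enqueue();
        Reference<? extends StringBuilder> polled = queue.poll();
        boolean sameReference = polled == softBuilder;

        return new boolean[] {presentBefore, queueEmptyBefore, clearedAfter, enqueued, sameReference};
    }

    public static void main(String[] args) {
        for (boolean step : run()) {
            System.out.println(step);
        }
    }
}
```

Every step is expected to report true: the queue hands back the Reference object itself, never the object it used to point to.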

SoftReference Details

GC Features

So how does the GC behave when it sees that an object is reachable only through a chain of soft references? Let's look at its work in more detail.
The GC starts and walks through all objects on the heap. If an object on the heap is a Reference, the GC places it in a special queue that holds all Reference objects. After visiting every object, the GC takes this queue and decides for each Reference whether its object should be removed from memory. How exactly that decision is made depends on the JVM, but the general contract is this: the GC is guaranteed to remove from the heap all objects reachable only through soft references before an OutOfMemoryError is thrown.
SoftReference is our mechanism for caching objects in memory: in a critical situation, when available memory runs out, the GC removes the unused objects and thereby tries to save the JVM from shutting down. Isn't it wonderful?
Here is how HotSpot decides whether to clear a SoftReference. If you look at the implementation of SoftReference, you can see 2 variables in the class: private static long clock and private long timestamp. On every GC run, the collector writes the current time into clock. When a SoftReference is created, the current clock value is written to timestamp. timestamp is also updated on every call to get() (that is, every time we create a strong reference to the object). This makes it possible to calculate how long the soft reference has existed since it was last accessed. Call this interval I. Let F be the amount of free heap space in MB (megabytes), and let the constant MSPerMB be the number of milliseconds a soft reference is kept for each free megabyte of heap.
Then everything is simple: if I <= F * MSPerMB, the object is kept. If I is larger, the object is removed.
To change MSPerMB, use the -XX:SoftRefLRUPolicyMSPerMB switch. The default value is 1000 ms, which means that a soft reference will live (after the strong reference is gone) 1 second for every megabyte of free heap memory. Keep in mind that these are all approximate calculations, since the soft reference is actually cleared only when the GC runs.
Note that for an object to be removed, I must be strictly greater than F * MSPerMB. It follows that a newly created SoftReference survives at least 1 GC run. (If it is not clear why, consider it your homework.)
In IBM's VM, the lifetime of a soft reference is tied not to time but to the number of GC runs it has survived.
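The retention rule above is easy to try with concrete numbers. The following is a sketch of the inequality only, not HotSpot's actual code; the class and method names are made up, while the parameters correspond to I, F and MSPerMB from the text.

```java
class SoftRefPolicySketch {
    /**
     * Rule from the text: the object is kept while I <= F * MSPerMB.
     * idleMs = I, freeHeapMb = F, msPerMb = value of -XX:SoftRefLRUPolicyMSPerMB.
     */
    static boolean shouldClear(long idleMs, long freeHeapMb, long msPerMb) {
        return idleMs > freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        long msPerMb = 1000; // the default mentioned above

        // 100 MB free: a reference idle for 60 s is kept, since 60000 <= 100 * 1000.
        System.out.println(shouldClear(60_000, 100, msPerMb)); // false

        // 10 MB free: the same idle time now exceeds 10 * 1000 ms.
        System.out.println(shouldClear(60_000, 10, msPerMb));  // true

        // I = 0 right after creation or get(): never cleared by this rule, even with no free heap.
        System.out.println(shouldClear(0, 0, msPerMb));        // false
    }
}
```

The last case is the "homework" in miniature: because the comparison is strict, an interval of zero can never trigger removal.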

Application

The main benefit of SoftReference is that the JVM itself keeps track of whether an object should be removed from memory. If little memory is left, the object is removed. This is exactly what we need for caching. Caching with SoftReference can be useful in systems that are sensitive to the amount of available memory, for example image processing. The first example is a little contrived, but illustrative.
Our system processes images. Suppose we have a huge image that lives somewhere in the file system and never changes. Sometimes the user wants to combine this image with another one. Here is our first implementation of such a concatenation:

public class ImageProcessor {
	private static final String IMAGE_NAME = "bigImage.jpg";
	public InputStream concatenateImageWithDefaultVersion(InputStream userImageAsStream) {
		InputStream defaultImage = this.getClass().getResourceAsStream(IMAGE_NAME);
		// calculate and return concatenated image
	}
}

This approach has many shortcomings, and one of them is that we have to load the image from the file system every time, which is not the fastest procedure. So let's cache the loaded image. Here is the second version:

public class CachedImageProcessor {
	private static final String IMAGE_NAME = "bigImage.jpg";
	private InputStream defaultImage;
	public InputStream concatenateImageWithDefaultVersion(InputStream userImageAsStream) {
		if (defaultImage == null) {
			defaultImage = this.getClass().getResourceAsStream(IMAGE_NAME);
		}
		// calculate and return concatenated image
	}
}

This version is better, but there is still a problem. The image is large and takes up a lot of memory. Our application works with many images, and the next time the user processes an image we can easily get an OutOfMemoryError. What can be done about it? It looks as if we have to choose between speed and stability. But we know about SoftReference. It lets us keep caching and, in critical situations, drop the data from the cache to free memory. We do not even have to detect the critical situation ourselves. This is what our third implementation looks like:

public class SoftCachedImageProcessor {
	private static final String IMAGE_NAME = "bigImage.jpg";
	private SoftReference<InputStream> defaultImageRef = new SoftReference<InputStream>(loadImage());
	public InputStream concatenateImageWithDefaultVersion(InputStream userImageAsStream) {
		InputStream defaultImage;
		if (defaultImageRef.get() == null) {        //  1
			defaultImage = loadImage();
			defaultImageRef = new SoftReference<InputStream>(defaultImage);
		}
		defaultImage = defaultImageRef.get();        //  2
		// calculate and return concatenated image
	}
	// Loads the image from the classpath
	private InputStream loadImage() {
		return this.getClass().getResourceAsStream(IMAGE_NAME);
	}
}

This version is not perfect, but it shows how easily we can control the size of the cache, or rather hand that control over to the virtual machine. The danger of this implementation is as follows. On line 1 we do a null check, which really means checking whether the GC has removed the data from memory. Suppose it has not. But before line 2 executes, the GC can start and remove the data. In that case line 2 results in defaultImage = null. To check safely whether the object is still in memory, we must first create a strong reference, defaultImage = defaultImageRef.get(), and only then test it. Here is what the final implementation looks like:

public class SoftCachedImageProcessor {
	private static final String IMAGE_NAME = "bigImage.jpg";
	private SoftReference<InputStream> defaultImageRef = new SoftReference<InputStream>(loadImage());
	public InputStream concatenateImageWithDefaultVersion(InputStream userImageAsStream) {
		// Take a strong reference first, so the GC cannot remove the data after the check
		InputStream defaultImage = defaultImageRef.get();
		if (defaultImage == null) {
			defaultImage = loadImage();
			defaultImageRef = new SoftReference<InputStream>(defaultImage);
		}
		// calculate and return concatenated image
	}
	// Loads the image from the classpath
	private InputStream loadImage() {
		return this.getClass().getResourceAsStream(IMAGE_NAME);
	}
}

Let's go further. java.lang.Class also uses SoftReference for caching: it caches data about the constructors, methods and fields of the class. It is interesting to look at what exactly is cached there. Once you have decided to use SoftReference for caching, you need to decide what to wrap. Say we need to cache a List. We can use either a List<SoftReference<T>> or a SoftReference<List<T>>. The second option is preferable. Remember that the GC applies special logic when processing Reference objects, and memory is freed faster if we have 1 SoftReference rather than a list of them. This is what we see in the implementation of Class: the developers created a soft reference to an array of constructors, fields and methods. Speaking of performance, it is worth noting that people often mistakenly use WeakReference to build a cache where SoftReference should be used. The result is a poorly performing cache. In practice, weakly referenced objects are removed from memory soon after the strong references to them disappear, and when we actually need to pull an object from the cache, we find it is no longer there.
Here is another example of a cache based on SoftReference. Google Guava has a MapMaker class. It helps us build a ConcurrentMap with the following feature: the keys and values in the Map can be wrapped in WeakReference or SoftReference. Suppose our application has data that the user can request and that is obtained from the database by a very complex query, for example the user's shopping list for the past year. We can create a cache in which the values (shopping lists) are held through soft references, and if a value is not in the cache, it is pulled from the database. The key is the user. Here is what the implementation might look like:

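// The element type Product is assumed here; the generic parameters were lost in the original listing.
// Note: MapMaker.makeComputingMap was deprecated in later Guava releases in favor of CacheBuilder.
ConcurrentMap<User, List<Product>> oldProductsCache = new MapMaker().softValues()
           .makeComputingMap(new Function<User, List<Product>>() {
                   @Override
                   public List<Product> apply(User user) {
                     return loadProductsFromDb(user);
                   }
             });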
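For readers without Guava at hand, the same idea can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not a replacement for Guava's map: SoftValueCache is a made-up name, the loader may run more than once for the same key under contention, and entries whose values were cleared are only overwritten on the next lookup, never purged.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class SoftValueCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. the expensive database query

    SoftValueCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        // Take a strong reference before the null check, so the GC cannot
        // clear the value between the check and the return.
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

A call such as new SoftValueCache<Integer, String>(id -> "products-" + id).get(7) loads the value on the first request and serves later requests from memory for as long as the GC keeps the soft reference.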

WeakReference

GC Features

Now let's look at WeakReference in more detail. When the GC determines that an object is reachable only through weak references, the object is removed from memory "immediately". Here it is worth recalling ReferenceQueue and tracing how an object is removed from memory. Let me remind you that for WeakReference and SoftReference the way a reference gets into the ReferenceQueue is the same. So, the GC has started and determined that the object is reachable only through weak references. The object was created like this:

StringBuilder builder = new StringBuilder();
ReferenceQueue<StringBuilder> queue = new ReferenceQueue<StringBuilder>();
WeakReference<StringBuilder> weakRef = new WeakReference<StringBuilder>(builder, queue);

First, the GC clears the weak reference, that is, weakRef.get() starts returning null. Then weakRef is added to the queue, and accordingly queue.poll() returns weakRef. That is all I wanted to say about how the GC treats WeakReference. Now let's see how it can be used.
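The sequence can be observed with a small experiment. Treat it as a sketch: System.gc() is only a hint, so the outcome after the strong reference is dropped depends on the JVM and its settings, and the helper awaitEnqueued is a made-up name.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class WeakQueueDemo {
    /** Requests a collection a few times and waits for a reference to appear in the queue. */
    static Reference<?> awaitEnqueued(ReferenceQueue<?> queue, int attempts) {
        try {
            for (int i = 0; i < attempts; i++) {
                System.gc();                          // a hint only; the JVM may ignore it
                Reference<?> ref = queue.remove(100); // wait up to 100 ms
                if (ref != null) {
                    return ref;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();       // preserve the interrupt flag
        }
        return null;
    }

    public static void main(String[] args) {
        ReferenceQueue<StringBuilder> queue = new ReferenceQueue<>();
        StringBuilder builder = new StringBuilder("payload");
        WeakReference<StringBuilder> weakRef = new WeakReference<>(builder, queue);

        System.out.println("before: referent present = " + (weakRef.get() != null));
        builder = null;                               // drop the only strong reference

        Reference<?> enqueued = awaitEnqueued(queue, 20);
        System.out.println("enqueued is weakRef = " + (enqueued == weakRef));
        System.out.println("after: referent present = " + (weakRef.get() != null));
    }
}
```

While a strong reference to the object exists, the helper always comes back empty-handed, however many collections are requested.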

Application

WeakHashMap, of course. This is a Map implementation that holds its keys through weak references. When the GC removes a key from memory, the whole entry is removed from the Map. I think it is not hard to see how this works. When a new <key, value> pair is added, a WeakReference is created for the key and a ReferenceQueue is passed to its constructor. When the GC removes the key from memory, the ReferenceQueue returns the corresponding WeakReference for that key, and the corresponding Entry is then removed from the Map. Everything is pretty simple. But I want to draw attention to some details.

WeakHashMap is not intended to be used as a cache. The WeakReference is created for the key, not for the value, so the data is removed only when the program has no strong references left to the key, regardless of the value. In most cases this is not what you want from a cache.
Data is not removed from a WeakHashMap immediately after the GC detects that the key is reachable only through weak references. The actual cleanup happens the next time the WeakHashMap is used.
WeakHashMap is intended primarily for keys whose equals method tests object identity (uses the == operator). Once access to such a key is lost, it can never be recreated.

So when is WeakHashMap convenient? Suppose we need to create an XML document for a user. The document is assembled by several services, each of which receives an org.w3c.dom.Node and adds the necessary elements to it. The services also need a lot of information about the user from the database. We will keep this data in the UserInfo class. UserInfo takes up a lot of memory and is relevant only while a specific XML document is being built, so caching UserInfo makes no sense. We only need to associate it with the document, and preferably have it removed from memory when the document is no longer used by the program. All we need is:

private static final Map<Node, UserInfo> NODE_TO_USER_MAP = new WeakHashMap<Node, UserInfo>();

Creating an XML document will look something like this:

Node mainDocument = createBaseNode();
NODE_TO_USER_MAP.put(mainDocument, loadUserInfo());

And here is how it is read:

UserInfo userInfo = NODE_TO_USER_MAP.get(mainDocument);
if (userInfo != null) {
	// …
}

UserInfo stays in the WeakHashMap until the GC notices that only weak references to mainDocument remain.
Another example of using WeakHashMap. Many people know about the String.intern() method. With WeakReference you can build something similar. (Let us not discuss the merits of this solution within this article, and simply accept that it has some advantages over intern().) So, we have a great many strings, and we know that they repeat. To save memory, we want to reuse existing objects rather than create new objects for equal strings. Here is how WeakHashMap can help:

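private static Map<String, WeakReference<String>> stringPool = new WeakHashMap<String, WeakReference<String>>();
public String getFromPool(String value) {
	WeakReference<String> stringRef = stringPool.get(value);
	// Take a strong reference first, so the GC cannot clear it between the check and the return
	String pooled = (stringRef == null) ? null : stringRef.get();
	if (pooled == null) {
		stringPool.put(value, new WeakReference<String>(value));
		pooled = value;
	}
	return pooled;
}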

Finally, I will add that WeakReference is used in many classes: Thread, ThreadLocal, ObjectOutputStream, Proxy, LogManager. You can look at their implementations to understand in which cases WeakReference can help you.

PhantomReference

GC Features

This type of reference has two special features. The first is that get() always returns null, which is why PhantomReference only makes sense together with a ReferenceQueue. The second is that, unlike SoftReference and WeakReference, a phantom reference is added to the ReferenceQueue only after the finalize() method has been executed. In other words, unlike with SoftReference and WeakReference, the object is in fact still in memory at that point.

Practice

At first glance it is not clear how to use this type of reference. To explain it, let us first look at the problems with the finalize() method. Overriding this method lets us release the resources associated with an object: when the GC determines that the object is no longer reachable, it executes this method before removing the object from memory. Here are the problems that come with it:

The GC starts unpredictably, so we cannot know when finalize() will be executed.
The finalize() methods are run in a single thread, one after another, and until the method has run, the object cannot be removed from memory.
There is no guarantee that the method will be called at all. The JVM may finish its work before the object ever becomes unreachable.
While finalize() is running, a strong reference to the object can be created, and the object will not be removed. But the next time the GC sees that the object is unreachable, finalize() will not be executed again.

Back to PhantomReference. In combination with a ReferenceQueue, this type of reference lets us find out when an object is no longer reachable and no other references to it remain. This allows us to release the resources used by the object at the application level. Unlike with finalize(), we control the cleanup process ourselves. In addition, we can control the creation of new objects. Say we have a factory that returns HdImage objects. We can control how many such objects are loaded into memory:

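import java.lang.ref.*;
import java.util.HashSet;
import java.util.Set;

public class HdImageFabric {
	public static final int IMAGE_LIMIT = 10;
	public static int count = 0;
	public static ReferenceQueue<HdImage> queue = new ReferenceQueue<HdImage>();
	// Phantom references must stay strongly reachable themselves, otherwise they are never enqueued
	private static final Set<Reference<? extends HdImage>> refs = new HashSet<Reference<? extends HdImage>>();
	public HdImage loadHdImage(String imageName) throws InterruptedException {
		while (true) {
			if (count < IMAGE_LIMIT) {
				return wrapImage(loadImage(imageName));
			} else {
				// Wait up to 500 ms for the GC to report an image that is no longer used
				Reference<? extends HdImage> ref = queue.remove(500);
				if (ref != null) {
					refs.remove(ref);
					count--;
					System.out.println("remove old image");
				}
			}
		}
	}
	// Registers a phantom reference for the image and returns the image itself
	private HdImage wrapImage(HdImage image) {
		refs.add(new PhantomReference<HdImage>(image, queue));
		count++;
		return image;
	}
}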

This example is not thread-safe and has other drawbacks, but it shows how PhantomReference can be used in practice.
Since get() always returns null, it is not obvious how to tell which object was removed. To do that, you need to create your own class that extends PhantomReference and holds some descriptor that later helps determine which resources should be cleaned up.
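A minimal sketch of such a subclass follows. The names ResourceRef, ResourceTracker and resourceId are hypothetical; the descriptor here is just a string, whereas a real application would store whatever is needed to release the resource (a file handle, a native pointer and so on).

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ResourceRef extends PhantomReference<Object> {
    final String resourceId; // descriptor: enough to release the resource without the object

    ResourceRef(Object owner, String resourceId, ReferenceQueue<Object> queue) {
        super(owner, queue);
        this.resourceId = resourceId;
    }
}

class ResourceTracker {
    final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the phantom references themselves strongly reachable until they are processed.
    private final Set<ResourceRef> pending = Collections.synchronizedSet(new HashSet<ResourceRef>());

    ResourceRef track(Object owner, String resourceId) {
        ResourceRef ref = new ResourceRef(owner, resourceId, queue);
        pending.add(ref);
        return ref;
    }

    /** Drains the queue and returns the ids of the resources that should now be released. */
    List<String> drainReleased() {
        List<String> released = new ArrayList<>();
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ResourceRef resourceRef = (ResourceRef) ref;
            pending.remove(resourceRef);
            released.add(resourceRef.resourceId);
        }
        return released;
    }
}
```

The application would call drainReleased() periodically, or from a dedicated thread, and release each resource by its id. Note that the descriptor must not hold a strong reference to the owner object itself, otherwise the owner never becomes phantom reachable.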
When using PhantomReference, keep the following in mind:

The contract guarantees that the reference appears in the queue after the GC notices that the object is reachable only through phantom references and before the object is removed from memory. The contract does not guarantee that these events follow one another immediately. In reality, any amount of time can pass between them. Therefore, do not rely on PhantomReference to clean up critical resources.
Executing finalize() and adding the phantom reference to the ReferenceQueue happen on different GC runs. Therefore, if finalize() is overridden in an object, at least 3 GC runs are required to remove it, and if the method is not overridden, at least 2 GC runs.

In conclusion, I want to say that java.lang.ref gives us good tools for working with JVM memory, and these classes should not be ignored: they can help us a lot. Their use is prone to many errors, and you need to be extremely careful to get the desired result. But have such difficulties ever stopped us? That's all. Thanks to everyone who read to the end. I will try to answer in the comments any questions this article left open.
