3 billion entries in Java Map on 16 GB RAM

One rainy evening I was thinking about memory management in Java and how efficiently Java collections use it. So I ran a simple experiment: how many records can I insert into a map with 16 GB of RAM?

The purpose of this experiment is to investigate the internal memory overhead of collections, so I used small keys and small values. All tests were run on 64-bit Linux Kubuntu 12.04, with a 64-bit Oracle JVM (Java 1.7.0_09-b05, HotSpot 23.5-b02). Compressed pointers (-XX:+UseCompressedOops) are enabled by default on this JVM.

The first test uses java.util.TreeMap. It inserts numbers into the map until memory runs out. The JVM parameter for this test is -Xmx15G:

import java.util.*;

Map<Long, String> m = new TreeMap<>();
for (long counter = 0; ; counter++) {
    m.put(counter, "");
    if (counter % 1000000 == 0) System.out.println(counter);
}

This example stopped at 172 million entries. Towards the end, the process slowed down due to aggressive garbage-collector activity. On a second run I replaced the TreeMap with a HashMap; it stopped at 182 million.
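A rough back-of-the-envelope calculation suggests why the heap runs out in this range. The per-object sizes below are approximations for a 64-bit JVM with compressed oops, not measurements from the experiment; they give only a theoretical upper bound, and GC headroom plus hash-table resizing push the real limit lower.

```java
// Approximate per-entry cost of HashMap<Long, String> on a 64-bit
// JVM with compressed oops. All sizes are estimates, not measurements.
public class HeapEstimate {
    public static void main(String[] args) {
        long boxedLong = 24;  // Long key: 12-byte header + 8-byte value, padded
        long node      = 32;  // HashMap.Node: header, int hash, 3 refs, padded
        long tableSlot = 4;   // compressed reference in the bucket array
        long perEntry  = boxedLong + node + tableSlot;  // ~60 bytes

        long heap = 15L * 1024 * 1024 * 1024;           // -Xmx15G
        long maxEntries = heap / perEntry;
        // Theoretical ceiling; in practice the GC needs slack to run.
        System.out.println(maxEntries / 1_000_000 + " million entries (upper bound)");
    }
}
```

The empty-string value costs nothing extra here because `""` is a shared interned constant; with distinct value objects the per-entry cost would be higher still.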

Java collections are not super memory-efficient by default, so let's try something more optimized. I chose LongHashMap from MapDB, which uses primitive long keys and is designed for a small memory footprint. JVM settings are again -Xmx15G:

import org.mapdb.*;

LongMap m = new LongHashMap();
for (long counter = 0; ; counter++) {
    m.put(counter, "");
    if (counter % 1000000 == 0) System.out.println(counter);
}
This time the counter stopped at 276 million records. Again, the process slowed down towards the end due to aggressive garbage-collector activity.

This seems to be the limit for heap-based collections; garbage collection imposes overhead of its own.

It is time to roll out the real weapons :-). We can always move away from heap memory, where the garbage collector will not see our data. Let me introduce MapDB: it provides TreeMap and HashMap backed by database storage. It supports various storage modes, including an off-heap option.

So let's run the previous example, but now with the map off-heap. First, a few lines configure and open the database: direct-memory storage with transactions disabled. The next line creates a new map in that database.

import org.mapdb.*;

DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .make();

Map m = db.getTreeMap("test");
for (long counter = 0; ; counter++) {
    m.put(counter, "");
    if (counter % 1000000 == 0) System.out.println(counter);
}

This map is not on the heap, so we need different JVM settings: -XX:MaxDirectMemorySize=15G -Xmx128M. Memory ran out at 980 million entries.
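The principle behind this direct-memory store can be illustrated with plain NIO: a direct ByteBuffer lives outside the Java heap, so the garbage collector never scans its contents. This is only a minimal sketch of the off-heap idea, not MapDB's actual storage code:

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    public static void main(String[] args) {
        // Allocated outside the Java heap; counted against
        // -XX:MaxDirectMemorySize rather than -Xmx.
        ByteBuffer buf = ByteBuffer.allocateDirect(8 * 1000);

        // Store a thousand longs off-heap...
        for (long i = 0; i < 1000; i++) {
            buf.putLong(i);
        }

        // ...and read them back; the GC never visited this data.
        buf.flip();
        long sum = 0;
        while (buf.hasRemaining()) sum += buf.getLong();
        System.out.println(sum);  // 0 + 1 + ... + 999 = 499500
    }
}
```

Because only the small ByteBuffer handle lives on the heap, the GC workload stays tiny no matter how much data the buffer holds, which is why the heap can shrink to -Xmx128M.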

But MapDB can do better. The problem in the previous example is fragmentation: a B-Tree node is resized on every insert. The solution is to briefly cache tree nodes before they are written, so that repeated updates coalesce and store fragmentation is reduced to a minimum. Change the DB configuration:

DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .asyncFlushDelay(100)
    .make();

Map m = db.getTreeMap("test");

Memory ran out at 1,738 million records (about 1.7 billion), after 31 minutes.

MapDB can do even better: increase the B-Tree node size from 32 to 120 entries and enable transparent compression:

DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .asyncFlushDelay(100)
    .compressionEnable()
    .make();

Map m = db.createTreeMap("test", 120, false, null, null, null);

This example ran out of memory at 3,315 million records (about 3.3 billion). It is slower due to compression, but it still completes within hours. With further optimizations (custom serializers and so on), I could probably push the number of entries to somewhere around 4 billion.

You may ask how all these records fit there. The answer is delta-key compression. Also, inserting ordered keys into a B-Tree is a best-case scenario, and MapDB is slightly optimized for it. The worst case is inserting keys in random order.

Delta-key compression is active by default in all examples; Zlib-style compression was activated only in the previous example. This next example inserts keys in random order:

import java.util.*;

DB db = DBMaker
    .newDirectMemoryDB()
    .transactionDisable()
    .asyncFlushDelay(100)
    .make();

Map m = db.getTreeMap("test");

Random r = new Random();
for (long counter = 0; ; counter++) {
    m.put(r.nextLong(), "");
    if (counter % 1000000 == 0) System.out.println(counter);
}

But even with random insertion order, MapDB stored 651 million records, almost four times more than the regular heap-based collections.

This little exercise doesn't have much practical purpose; it is just one way to push MapDB. Most surprisingly, the insertion speed was excellent, and MapDB can compete with in-memory collections.

github.com/jankotek/jdbm3
