C# object pool implementation

Good day!
In this article I will show how to improve the performance of a multi-threaded (and not only multi-threaded) C# application that frequently creates objects for “one-time” work.
Along the way: a little about multithreading and non-blocking synchronization, the built-in profiler in VS2012, and a small benchmark.

Introduction


An object pool is a creational design pattern: a set of initialized, ready-to-use objects.
Why is it needed? In a nutshell, to improve performance when initializing a new object is expensive. It is important to understand, though, that the built-in .NET garbage collector copes well with the destruction of light, short-lived objects, so a pool is only worthwhile in the following cases:
  • objects that are expensive to create and / or destroy (examples: sockets, threads, unmanaged resources);
  • objects that are cheaper to clean up for reuse than to create anew (or that need no cleanup at all);
  • very large objects.

Let me explain the last point. If an object occupies 85,000 bytes or more in memory, it is placed in the large object heap, which is collected together with generation 2 of the garbage collector, so the object automatically becomes "long-lived". Add fragmentation (by default this heap is not compacted) and constant allocation / destruction can potentially end in an out-of-memory condition.
The idea of the pool is to organize the reuse of “expensive” objects using the following scenario:

var obj = pool.Take(); // we need a working object; instead of creating one, we request it from the pool
obj.DoSomething();
pool.Release(obj); // return ("release") the object to the pool once it is no longer needed

Problems with this approach:
  • after working with an object, it may be necessary to reset it to its initial state so that previous use does not affect subsequent ones;
  • the pool should provide thread safety, because it is used, as a rule, in multithreaded systems;
  • the pool should handle the situation where it has no free objects to hand out.

Given these problems, the requirements for a new class were compiled:
  1. Type safety of the pool at the compilation stage.
  2. The pool works with any classes, including third-party ones.
  3. Simple use in code.
  4. Automatic allocation of new objects when the pool runs short, with user-defined initialization.
  5. A limit on the total number of allocated objects.
  6. Automatic cleanup of an object when it returns to the pool.
  7. Thread safety (preferably with minimal synchronization costs).
  8. Support for multiple pool instances (which calls for at least a basic check that objects return to their own pools).


Solving Usage Issues


Some implementations require a pooled object to implement an IPoolable interface or similar, but my goal was a pool that works with any class, even a sealed one. To that end I created the generic wrapper PoolSlot<T>, which holds the object itself and a reference to its pool. The pool itself is an abstract generic class that stores these slots and exposes two overridable methods: one creates a new object, the other cleans up a used one.

public abstract class Pool<T>
{
	public PoolSlot<T> TakeSlot() {...} // the "take from the pool" operation
	public void Release(PoolSlot<T> slot) {...} // the "return to the pool" operation
	/* ... */
	// methods to override:
	protected abstract T ObjectConstructor(); // creates a new, ready-to-use object
	protected virtual void CleanUp(T item) {} // cleans up a used object; called automatically
}
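
A minimal sketch of what such a wrapper might look like. The names ISlotOwner, SlotSketch and CountingOwner are hypothetical and are not taken from the article's source; the real PoolSlot presumably carries more bookkeeping than this.

```csharp
using System;

// Hypothetical minimal pool contract: just enough for a slot to call back into its pool.
public interface ISlotOwner<T>
{
    void Release(SlotSketch<T> slot);
}

// Sketch of the wrapper: it holds the pooled object and a reference to the owning pool,
// so the pooled class itself needs no base class or interface.
public sealed class SlotSketch<T> : IDisposable
{
    private readonly ISlotOwner<T> _owner;

    public SlotSketch(T obj, ISlotOwner<T> owner)
    {
        Object = obj;
        _owner = owner;
    }

    public T Object { get; }

    // Dispose does not destroy the object; it hands the slot back to its pool.
    public void Dispose() => _owner.Release(this);
}

// Toy owner used only to demonstrate the round trip; it merely counts returns.
public sealed class CountingOwner<T> : ISlotOwner<T>
{
    public int Released { get; private set; }

    public void Release(SlotSketch<T> slot) => Released++;
}
```

Because the wrapper implements IDisposable, a using block is enough to guarantee that the slot goes back to its pool even if the code inside throws.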

Example using the SocketAsyncEventArgs class

Pool definition
public class SocketClientPool : Pool<SocketAsyncEventArgs>
{
	private readonly int _bufferSize;
	public SocketClientPool(int bufferSize, int initialCount, int maxCapacity)
		: base(maxCapacity)
	{
		if (initialCount > maxCapacity)
			throw new ArgumentOutOfRangeException("initialCount");
		_bufferSize = bufferSize;
		TryAllocatePush(initialCount); // the base class declares several protected methods; this one creates the given number of new objects and puts them into the pool
	}
	protected override SocketAsyncEventArgs ObjectConstructor()
	{
		var args = new SocketAsyncEventArgs();
		args.SetBuffer(new byte[_bufferSize], 0, _bufferSize);
		return args;
	}
	protected override void CleanUp(SocketAsyncEventArgs @object)
	{
		Array.Clear(@object.Buffer, 0, _bufferSize);
	}
}

Use in code:

var pool = new SocketClientPool(1024, 5, 10); // at server startup, for example
/* ...somewhere in the code... */
var slot = pool.TakeSlot(); // take a slot with an object
var args = slot.Object; // the original object, to work with as needed
pool.Release(slot); // return it to the pool

Or even like this:

using(var slot = pool.TakeSlot()) // the PoolSlot<T> class implements IDisposable
{
	var args = slot.Object;
}

Those familiar with the asynchronous .NET model and / or the asynchronous methods of the Socket class know that such an implementation is awkward to use, because the Socket.XxxAsync methods accept a SocketAsyncEventArgs, not some PoolSlot<T>. That does not matter for the call itself, but where do we get the slot in the completion handler?
One option is to store the slot in the SocketAsyncEventArgs.UserToken property when the object is created; the pool has an overridable method, HoldSlotInObject, for exactly this.
Override for our example
protected override void HoldSlotInObject(SocketAsyncEventArgs @object, PoolSlot<SocketAsyncEventArgs> slot)
{
	@object.UserToken = slot;
}
/* ...somewhere in the code... */
pool.Release(args.UserToken as PoolSlot<SocketAsyncEventArgs>);

Of course, not every class offers such a property. If your class is not sealed, there is a special IPoolSlotHolder interface with a single property for storing the slot. And if I know that my object is guaranteed to contain its slot, it is logical to add TakeObject / Release methods that return / accept the objects themselves (and obtain the slot internally), which is what the pool's descendant does.
Simplified implementation of the improved pool (for objects that implement IPoolSlotHolder)
public abstract class PoolEx<T> : Pool<T>
	where T : IPoolSlotHolder
{
	public T TakeObject() { ... }
	public void Release(T @object) { ... }
	protected sealed override void HoldSlotInObject(T @object, PoolSlot<T> slot) { ... } // nothing left to override here
}
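
To make the idea concrete, here is a rough sketch of how a class could carry its own slot. The article only says that the interface has a single property for storing the slot; the generic shape, the property name Slot, and the types SlotStub, ISlotHolderSketch, Connection and HolderDemo are all assumptions made for illustration.

```csharp
// Stand-in for the article's slot type; only what this sketch needs.
public sealed class SlotStub<T>
{
    public SlotStub(T obj) { Object = obj; }

    public T Object { get; }
}

// Assumed shape of the holder interface: one property that stores the slot.
public interface ISlotHolderSketch<T>
{
    SlotStub<T> Slot { get; set; }
}

// A pooled class that opts in by implementing the interface.
public sealed class Connection : ISlotHolderSketch<Connection>
{
    public SlotStub<Connection> Slot { get; set; }
}

public static class HolderDemo
{
    // Roughly what a pool could do on allocation: wrap the object and store the slot inside it.
    public static T Attach<T>(T obj) where T : ISlotHolderSketch<T>
    {
        obj.Slot = new SlotStub<T>(obj);
        return obj;
    }

    // Roughly what Release(T) could do first: recover the slot from the object itself.
    public static SlotStub<T> SlotOf<T>(T obj) where T : ISlotHolderSketch<T>
    {
        return obj.Slot;
    }
}
```

With the slot stored inside the object, calling code never has to see the wrapper at all: it takes and returns plain objects.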

Next, let's look at how the internals were developed.

Storage


Objects “in the pool” are stored in a ConcurrentStack collection. Since several pool instances may be in use at once, the pool had to keep track of which objects it had created.
So a “registry” based on ConcurrentDictionary was introduced; it holds the IDs of all slots ever created by the pool and an availability flag for each (true - “in the pool”, false - “not in the pool”).
This killed two birds with one stone: it prevents the same object from being returned more than once by mistake (the stack does not guarantee the uniqueness of the objects it stores), and it prevents the return of objects created by another pool. This approach was a temporary solution, and I later got rid of it.
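
The two checks can be sketched as follows. This is an illustration rather than the article's code: the class SlotRegistrySketch and its method names are hypothetical, and using ConcurrentDictionary.TryUpdate for the check is my assumption about one reasonable way to do it.

```csharp
using System.Collections.Concurrent;

public sealed class SlotRegistrySketch
{
    // Key: slot ID. Value: true = "in the pool", false = "not in the pool".
    private readonly ConcurrentDictionary<int, bool> _registry =
        new ConcurrentDictionary<int, bool>();

    // Called when the pool creates a slot and hands it out right away.
    public void RegisterTaken(int slotId)
    {
        _registry[slotId] = false;
    }

    // Atomically flips false -> true. Fails for IDs this pool never created
    // (a foreign slot) and for slots that are already in the pool (a double return).
    public bool TryMarkReturned(int slotId)
    {
        return _registry.TryUpdate(slotId, true, false);
    }

    // Atomically flips true -> false when a slot is taken from the pool.
    public bool TryMarkTaken(int slotId)
    {
        return _registry.TryUpdate(slotId, false, true);
    }
}
```

TryUpdate replaces the value only if the key exists and the current value equals the expected one, so a single call covers both the unknown-slot case and the double-return case.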

Multithreading


The classic pool implementation uses a semaphore (Semaphore and SemaphoreSlim in .NET) or other synchronization primitives combined with a counter to track the number of objects. ConcurrentStack and ConcurrentDictionary, however, are thread-safe collections, so taking and returning objects needs no extra synchronization. I will only note that the ConcurrentStack.Count property enumerates all elements, which takes considerable time, so I added my own element counter. The result is two “atomic” pool operations, Push and TryPop, on which all the others are built.
Implementation of the basic operations
private void Push(PoolSlot<T> item)
{
	_registry[item.Id] = true; // registry: the object is "in the pool"
	_storage.Push(item); // return the object to the storage
	Interlocked.Increment(ref _currentCount);
}
private bool TryPop(out PoolSlot<T> item)
{
	if (_storage.TryPop(out item)) // try to take an object from the storage
	{
		Interlocked.Decrement(ref _currentCount);
		_registry[item.Id] = false; // registry: the object is "not in the pool"
		return true;
	}
	item = default(PoolSlot<T>);
	return false;
}

Besides taking and returning existing objects, the allocation of new ones up to the specified upper limit also has to be synchronized.
A semaphore fits here: it is initialized with the maximum number of elements in the pool (the upper limit) and decremented by one each time a new object is created. The problem is that on reaching zero it simply blocks the thread. One way out would be to call SemaphoreSlim.Wait(0), which returns false almost immediately when the semaphore is at zero, but I decided to write a lightweight analogue of this functionality. That is how the LockFreeSemaphore class appeared: on reaching zero it returns false without any delay. For internal synchronization it uses the fast CAS operation Interlocked.CompareExchange.
An example of using a CAS operation in a semaphore
public bool TryTake() // returns true if the semaphore was entered successfully, otherwise false (all resources are taken)
{
	int oldValue, newValue;
	do
	{
		oldValue = _currentCount; // remember the old value
		newValue = oldValue - 1; // compute the new value
		if (newValue < 0) return false; // the semaphore is already at 0: return false without waiting
	} while (Interlocked.CompareExchange(ref _currentCount, newValue, oldValue) != oldValue); // if another thread has not changed the old value, it is replaced by the new one and the loop ends
	return true;
}
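
For completeness, here is how the whole class might look with a matching release operation built on the same CAS loop. The article only shows TryTake; the class name LockFreeSemaphoreSketch, the constructor, the CurrentCount property and TryRelease are my assumptions, not the author's implementation.

```csharp
using System;
using System.Threading;

public sealed class LockFreeSemaphoreSketch
{
    private readonly int _maxCount;
    private int _currentCount;

    public LockFreeSemaphoreSketch(int initialCount, int maxCount)
    {
        if (maxCount < 1)
            throw new ArgumentOutOfRangeException(nameof(maxCount));
        if (initialCount < 0 || initialCount > maxCount)
            throw new ArgumentOutOfRangeException(nameof(initialCount));
        _currentCount = initialCount;
        _maxCount = maxCount;
    }

    public int CurrentCount => Volatile.Read(ref _currentCount);

    // Decrements the counter unless it is already zero; never blocks.
    public bool TryTake()
    {
        int oldValue, newValue;
        do
        {
            oldValue = Volatile.Read(ref _currentCount);
            newValue = oldValue - 1;
            if (newValue < 0) return false; // no resources left
        } while (Interlocked.CompareExchange(ref _currentCount, newValue, oldValue) != oldValue);
        return true;
    }

    // Increments the counter unless it is already at the maximum; never blocks.
    public bool TryRelease()
    {
        int oldValue, newValue;
        do
        {
            oldValue = Volatile.Read(ref _currentCount);
            newValue = oldValue + 1;
            if (newValue > _maxCount) return false; // released more often than taken
        } while (Interlocked.CompareExchange(ref _currentCount, newValue, oldValue) != oldValue);
        return true;
    }
}
```

Both methods retry only when another thread changed the counter between the read and the CompareExchange, so under low contention each call costs a single CAS.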

Thus, the “take an object” pool operation follows this algorithm:
  1. Try to take an object from the storage; if there is none, go to step 2.
  2. Try to create a new object; if the semaphore is at zero (the upper limit is reached), go to step 3.
  3. In the worst case, wait until an object is returned to the pool.
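
The three steps above can be sketched like this. It is a simplified illustration, not the article's code: TakeSketch stores bare objects instead of slots, takes a factory delegate instead of an abstract method, and waits in step 3 with a SpinWait loop, which is only my assumption about how the waiting could be done.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class TakeSketch<T>
{
    private readonly ConcurrentStack<T> _storage = new ConcurrentStack<T>();
    private readonly Func<T> _factory;
    private int _allocationsLeft; // how many more objects may still be created

    public TakeSketch(Func<T> factory, int maxCapacity)
    {
        if (factory == null) throw new ArgumentNullException(nameof(factory));
        if (maxCapacity < 1) throw new ArgumentOutOfRangeException(nameof(maxCapacity));
        _factory = factory;
        _allocationsLeft = maxCapacity;
    }

    public T Take()
    {
        T item;
        // Step 1: try to take an existing object from the storage.
        if (_storage.TryPop(out item)) return item;
        // Step 2: try to create a new one while the upper limit is not reached.
        if (TryReserveAllocation()) return _factory();
        // Step 3: worst case, wait until some thread returns an object.
        var spinner = new SpinWait();
        while (!_storage.TryPop(out item)) spinner.SpinOnce();
        return item;
    }

    public void Release(T item)
    {
        _storage.Push(item);
    }

    // Non-blocking "semaphore": decrements the allocation budget unless it is zero.
    private bool TryReserveAllocation()
    {
        int oldValue, newValue;
        do
        {
            oldValue = Volatile.Read(ref _allocationsLeft);
            newValue = oldValue - 1;
            if (newValue < 0) return false;
        } while (Interlocked.CompareExchange(ref _allocationsLeft, newValue, oldValue) != oldValue);
        return true;
    }
}
```

Note that step 3 never returns if no other thread ever releases an object, so a production pool would probably want a timeout or a cancellation token there.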


First results, optimization and refactoring


Do you really need a pool of objects? Depends on the situation. Here are the results of a little testing using a “typical server object”, SocketAsyncEventArgs with a buffer of 1024 bytes (time in seconds, pooling enabled):
Requests for a new object | Single thread, no pool | Single thread, with pool | 25 tasks *, without pool | 25 tasks *, with pool
1,000                     | 0.002                  | 0.003                    | 0.027                    | 0.009
10,000                    | 0.010                  | 0.001                    | 0.272                    | 0.039
25,000                    | 0.030                  | 0.003                    | 0.609                    | 0.189
50,000                    | 0.048                  | 0.006                    | 1.285                    | 0.287
1,000,000                 | 0.959                  | 0.125                    | 27.965                   | 8.345

* A task here is the System.Threading.Tasks.Task class from the TPL, available since .NET 4.0.
VS2012 profiler results for the multi-threaded test with a pool:


As you can see, everything comes down to the ConcurrentStack.TryPop method, which (we assume) cannot be made any faster. In second place is access to the “registry”, which takes approximately 14% in each of the two operations.
Maintaining a second collection inside the pool felt like a crutch anyway, so the “in the pool / not in the pool” flag was moved into the slot itself and the registry was removed. Test results after refactoring (an improvement of 30-40%, as expected):
Requests for a new object | 25 tasks, with pool
25,000                    | 0.098
1,000,000                 | 5.751
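
One way the flag could live in the slot is shown below. This is a guess at the shape of the refactoring, not the author's code: FlaggedSlotSketch, the Owner property and both TryMark methods are hypothetical names.

```csharp
using System.Threading;

public sealed class FlaggedSlotSketch<T>
{
    private int _inPool; // 0 = "not in the pool", 1 = "in the pool"

    public FlaggedSlotSketch(T obj, object owner)
    {
        Object = obj;
        Owner = owner;
    }

    public T Object { get; }

    // The pool that created this slot; comparing it with the receiving pool
    // rejects slots that belong to another pool.
    public object Owner { get; }

    // Flips 0 -> 1; returns false if the slot is already in the pool (a double return).
    public bool TryMarkInPool()
    {
        return Interlocked.CompareExchange(ref _inPool, 1, 0) == 0;
    }

    // Flips 1 -> 0; returns false if the slot was not in the pool.
    public bool TryMarkTaken()
    {
        return Interlocked.CompareExchange(ref _inPool, 0, 1) == 1;
    }
}
```

Compared with the registry, each check touches only the slot being returned, so there is no shared dictionary for the threads to contend on.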

I think you can stop here.

Conclusion


To recap briefly how each requirement was met:
  1. Type safety at compile time: generic classes.
  2. A pool that works with any class: a generic wrapper, with no inheritance required.
  3. Ease of use: the using construct (the wrapper implements IDisposable).
  4. Automatic allocation of new objects: the abstract Pool.ObjectConstructor method, where the object is initialized however you like.
  5. A limit on the number of objects: a lightweight version of a semaphore.
  6. Automatic cleanup of an object on return: the virtual Pool.CleanUp method, which the pool calls automatically on return.
  7. Thread safety: the ConcurrentStack collection and CAS operations (methods of the Interlocked class).
  8. Support for multiple pool instances: the Pool class is neither static nor a singleton, and it validates its operations.


Source code with unit tests and a test application: Github
If there is interest, I can continue with an article on implementing asynchronous TCP and UDP socket servers, for which this pool was originally written.
