Siemargl July 19, 2019 at 17:52

Ownership and borrowing in D

Transfer

Almost all non-trivial programs allocate and use dynamic memory. Doing it correctly is becoming increasingly important as programs become more complex and errors are even more expensive.

Typical problems are:

memory leaks (not freeing up used up memory)
double release (memory release more than once)
use after release (use of a pointer to a memory previously freed)

The task is to track the pointers responsible for freeing memory (i.e., those who own the memory), and to distinguish pointers that simply point to a piece of memory, control where they are located, and which of them are active (in scope).

Typical solutions are as follows:

Garbage Collection (GC) - GC owns memory blocks and periodically scans them for pointers to these blocks. If no pointers are found, memory is freed. This scheme is reliable and is used in languages such as Go and Java. But GC has a tendency to use much more memory than necessary, has pauses and slows down code due to repackaging (orig.inserted write gates).
Reference Counting (RC) - An RC object owns memory and stores a counter of pointers to itself. When this counter decreases to zero, memory is freed. It is also a reliable mechanism and is accepted in languages like C ++ and ObjectiveC. RC is memory efficient, additionally requiring only space under the counter. The negative aspects of RC are the overhead of maintaining the counter, embedding an exception handler to guarantee its reduction, and blocking necessary for objects shared between program flows. To improve performance, programmers sometimes tricked by temporarily referring to an RC object bypassing the counter, creating the risk of doing it incorrectly.
Manual control - Manual memory management is Sysalny malloc and free. It is fast and efficient in terms of memory usage, but the language doesn’t help to do everything correctly, completely relying on the experience and zeal of the programmer. I have been using malloc and free for 35 years, and with the help of a bitter and endless experience I rarely make mistakes. But this is not the way that programming technology can rely on, and note that I said "rarely" and not "never."

Solutions 2 and 3 to one degree or another rely on faith in the programmer to do everything correctly. Systems based on faith do not scale well, and memory management errors are proven to be very difficult to recheck (so bad that some coding standards prohibit the use of dynamic memory).

But there is also a fourth way - Ownership and Borrowing, OB. It is memory efficient, as fast as manual operation, and is subject to automated rechecking. The method has recently been popularized by the Rust programming language. It also has its drawbacks, in particular the need to rethink the planning of algorithms and data structures.

You can deal with negative aspects and the rest of this article is a schematic description of how the OB system works and how we propose to write it into the D language. I initially considered this impossible, but having spent some time thinking it over, I found a way. It is similar to what we did with functional programming - with transitive immutability and "pure" functions.

Possession

The decision of who owns the object in memory is ridiculously simple - there is a single pointer to the object and it is the owner. He is also responsible for the release of memory, after which it becomes invalid. Due to the fact that the pointer to the object in memory is the owner, there are no other pointers inside this data structure, and therefore the data structure forms a tree.

The second consequence is that pointers use the semantics of moving rather than copying:

T* f();
void g(T*);
T* p = f();
T* q = p; // значение p перемещается в q, а не копируется
g(p);     // ошибка, p более недействительно

Removing a pointer from inside a data structure is prohibited:

struct S { T* p; }
S* f();
S* s = f();
T* q = s.p; // ошибка, недопустимо создание двух указателей на s.p

Why not just mark sp as invalid? The problem is that this will require setting the label in runtime, but should be solved at the compilation stage, because it is simply considered a compilation error.

The exit of the own pointer out of scope is also an error:

void h() {
  T* p = f();
} // ошибка, забыли освободить p?

You must move the pointer value differently:

void g(T*);
void h() {
  T* p = f();
  g(p);  // переместили в g(), это теперь проблема g()
}

This perfectly solves the problems of memory leaks and use after freeing (Hint: for clarity, replace f () with malloc (), and g () with free ().)

All this can be checked at the compilation stage using the Data Flow Analysis technique ( Data Flow Analysis (DFA) , something like it is used to remove common sub-expressions , DFA can unwind any rat tangle from program transitions that may occur.

Borrowing

The tenure system described above is reliable, but too restrictive.
Consider:

struct S { void car(); void bar(); }
struct S* f();
S* s = f();
s.car();  // s перемещен в car()
s.bar();  // ошибка, s недействителен

For this to work, s.car () must have a way to get the pointer back on exit.

This is how borrowing works. s.car () takes a copy of s for the duration of s.car (). s is invalid at runtime, and becomes valid again when s.car () exits.

In D, struct member functions get the this pointer by reference, so that we can adapt the borrowing with a small extension: getting the argument by reference takes it.

D also supports scope for pointers, so borrowing is natural:

void g(scope T*);
T* f();
T* p = f();
g(p);      // g() заимствует p
g(p);      // мы может использовать p снова по возврату из g()

(When functions receive arguments by reference or pointers with a scope are used, they are forbidden to extend beyond the boundaries of a function or scope. This corresponds to the semantics of borrowing.)

Borrowing in this way guarantees the uniqueness of a pointer to an object in memory at any given moment.

Borrowing can be expanded further with the understanding that the ownership system is also reliable, even if an object is additionally indicated by several constant pointers (but only one mutable one). A constant pointer cannot change memory or free it. This means that several constant pointers can be borrowed from the mutable owner, but he has no right to be used while these constant pointers are alive.

For instance:

T* f();
void g(T*);
T* p = f();  // p стал владельцем
{
  scope const T* q = p; // заимствованный константный указатель
  scope const T* r = p; // займем еще один
  g(p); // ошибка, p недействителен пока q и r в области видимости
}
g(p); // ok

Principles

The foregoing can be reduced to the following understanding that an object in memory behaves as if it is in one of two states:

there is exactly one mutable pointer to it
one or more additional constant pointers

An attentive reader will notice something strange in what I wrote: “as if”. What did I want to hint at? What is going on? Yes, there is one. Computer programming languages are full of such “as if” under the hood, something like the money in your bank account is actually not there (I apologize if this was a gross shock to someone), and this is no different from that. Read on!

But first, a little deeper into the topic.

Integrating Ownership / Borrowing Techniques in D

Aren't these techniques incompatible with the way people usually write in D, and won't almost all existing D programs break? And it’s not so easy to fix, but so much so that you have to redesign all the algorithms from scratch?

Yes indeed. Unless D has an (almost) secret weapon: attributes of functions. It turns out that the semantics of ownership / borrowing (OB) can be implemented for each function separately after the usual semantic analysis. An attentive reader could notice that no new syntax has been added, only restrictions have been imposed on existing code. D already has a history of using function attributes to change their semantics, for example, the pure attribute to create “pure” functions. To enable OB semantics, the @ live attribute is added .

This means that the OB can be added to the code on D gradually, as needed and free resources. This makes it possible to add OBs, and this is critical, constantly supporting the project in a fully working, tested and ready-to-release state. It also allows you to automate the process of monitoring what percentage of the project has already been transferred to the OB. This technique is added to the list of other D language guarantees regarding the reliability of working with memory (such as controlling the non-distribution of pointers to temporary variables on the stack).

As if

Some necessary things cannot be realized with strict adherence to OBs, such as reference counting objects. After all, RC objects are designed to have many pointers to them. Since RC objects are safe when working with memory (if implemented correctly), they can be used together with OBs without negatively affecting reliability. They just cannot be created using the OB technique. The solution is that there are other function attributes in D, like @ system . @ system are features where many reliability checks are disabled. Naturally, the OB will also be disabled in the code with @ system . This is where the implementation of RC technology is hiding from OB control.

But in the code with OB, RC the object looks as if it follows all the rules, so no problem!

It will take a number of similar library types to work successfully with OB.

Conclusion

This article is a basic overview of OB technology. I am working on a much more detailed specification. It is possible that I missed something and somewhere a hole below the waterline, but so far everything looks good. This is a very exciting development for D and I am looking forward to implementing it.

For further discussions and comments from Walter, refer to the topics on / r / programming subreddit and on Hacker News .

Tags:

dlang