
Java may get a new serialization mechanism
An exploratory document has appeared on the OpenJDK website describing an idea for a new, improved serialization mechanism in the language to replace the existing one.
Serialization has existed in Java since version 1.1, that is, almost from the platform's birth. On the one hand, it is a very convenient mechanism: any class can be made serializable quickly and easily by having it implement the java.io.Serializable interface. This simplicity may even have been one of the key reasons Java became so popular, because it made it possible to write networked applications quickly and efficiently.
On the other hand, the way serialization is implemented in Java causes a great many problems that increase the cost of maintaining applications, reduce their security, and slow down the evolution of the platform.
What is wrong with serialization in Java? Here are the most serious problems:
- Serialization (and deserialization) bypasses language mechanisms. It ignores field access modifiers (private, protected) and creates objects without calling constructors, which means it also skips any invariants those constructors enforce. An attacker can exploit this by replacing the serialized data with invalid data, and deserialization will accept it without complaint.
- When you write serializable classes, the compiler does not help in any way and does not detect errors. For example, you cannot statically guarantee that all fields of a serializable class are themselves serializable. You can also make a typo in the name of a method such as readObject, writeObject, or readResolve, and that method will then silently not be used during serialization.
- Serialization has no proper versioning mechanism, so it is very difficult to modify serializable classes in a way that keeps them compatible with their old versions.
- Serialization is tightly coupled to stream encoding and decoding, which makes it very difficult to replace the standard encoding format with a different one. In addition, the standard format is neither compact, nor efficient, nor human-readable.
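The first problem in the list can be reproduced with a few lines of today's Java. The sketch below is only an illustration: LegacyRange and InvariantBypassDemo are hypothetical names, and the byte patching relies on the default stream layout, in which the primitive field values come last, ordered by field name (hi before lo).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration: the constructor enforces lo <= hi.
class LegacyRange implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int lo;
    private final int hi;

    LegacyRange(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
        }
        this.lo = lo;
        this.hi = hi;
    }

    int lo() { return lo; }
    int hi() { return hi; }
}

class InvariantBypassDemo {
    // Serializes a valid range, tampers with the bytes, and deserializes the result.
    static LegacyRange deserializeTampered() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(new LegacyRange(1, 2));
            }
            byte[] bytes = buffer.toByteArray();

            // Assumption: the stream ends with hi (4 bytes) followed by lo (4 bytes).
            // Overwrite the low-order byte of hi, changing it from 2 to 0.
            bytes[bytes.length - 5] = 0;

            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                // The constructor is not called here, so its check never runs.
                return (LegacyRange) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LegacyRange broken = deserializeTampered();
        System.out.println("lo=" + broken.lo() + ", hi=" + broken.hi());
    }
}
```

The deserialized object has lo greater than hi even though no code path through the constructor allows that state.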
The fundamental mistake of the existing serialization in Java is that it tries to be too "invisible" to the programmer. A class simply implements java.io.Serializable and receives some implicit magic performed by the virtual machine.
Instead, the programmer should explicitly write the constructs responsible for constructing and deconstructing objects. These constructs should exist at the language level and should use ordinary static field access, not reflection.
Another mistake is that serialization tries to do too much. It sets itself the task of serializing any arbitrary graph of objects (which may contain cycles) and deserializing it back without breaking its state.
This mistake can be corrected by simplifying the task: serialize not a graph of objects but a tree of data with no notion of identity (as in JSON).
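What "no notion of identity" means in practice can be shown with a small sketch in plain Java. The names Item and TreeCopyDemo are hypothetical, and the "encoding" is just a list of strings standing in for a JSON-like format: values survive the round trip, shared references do not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value holder used only for this illustration.
final class Item {
    final String label;

    Item(String label) {
        this.label = label;
    }
}

final class TreeCopyDemo {
    // Encodes each element by value, the way a tree-shaped format would.
    static List<String> encode(List<Item> items) {
        List<String> labels = new ArrayList<>();
        for (Item item : items) {
            labels.add(item.label);
        }
        return labels;
    }

    // Rebuilds one fresh object per encoded element.
    static List<Item> decode(List<String> labels) {
        List<Item> items = new ArrayList<>();
        for (String label : labels) {
            items.add(new Item(label));
        }
        return items;
    }

    public static void main(String[] args) {
        Item shared = new Item("a");
        List<Item> original = List.of(shared, shared);
        List<Item> copy = decode(encode(original));
        System.out.println("same object before: " + (original.get(0) == original.get(1)));
        System.out.println("same object after:  " + (copy.get(0) == copy.get(1)));
    }
}
```

The price of the simplification is that an object referenced twice comes back as two equal but distinct objects; in exchange, cycles and back-references stop being the format's problem.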
How can we design serialization that fits naturally with the object model, uses constructors for deserialization, is decoupled from the encoding format, and supports versioning? Annotations come to the rescue here, together with a language feature that is not yet part of Java: pattern matching. For instance:
public class Range {
    int lo;
    int hi;

    private Range(int lo, int hi) {
        if (lo > hi)
            throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
        this.lo = lo;
        this.hi = hi;
    }

    @Serializer
    public pattern Range(int lo, int hi) {
        lo = this.lo;
        hi = this.hi;
    }

    @Deserializer
    public static Range make(int lo, int hi) {
        return new Range(lo, hi);
    }
}
This example declares a Range class that is ready for serialization through two special members: a serializer and a deserializer, marked with the @Serializer and @Deserializer annotations. The serializer is implemented as a deconstruction pattern, and the deserializer as a static method that calls the constructor. As a result, the invariant hi >= lo enforced by the constructor is always checked during deserialization.
There is no magic in this approach, and ordinary annotations are used, so any framework can perform serialization, not just the Java platform itself. This also means the encoding format can be anything at all (binary, XML, JSON, YAML, etc.).
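To illustrate how a third-party framework could drive such annotated members, here is a rough sketch in today's Java. Everything in it is an assumption made for the example: the annotations are local stand-ins, the serializer is an ordinary method returning Object[] because deconstruction patterns do not exist yet, and TinyFramework with its comma-separated format is invented and handles only int components.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.StringJoiner;

// Hypothetical stand-ins for the proposed annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Serializer {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Deserializer {}

final class AnnotatedRange {
    final int lo;
    final int hi;

    private AnnotatedRange(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException(String.format("(%d,%d)", lo, hi));
        }
        this.lo = lo;
        this.hi = hi;
    }

    // Stand-in for the deconstruction pattern: exposes the components.
    @Serializer
    public Object[] components() {
        return new Object[] { lo, hi };
    }

    @Deserializer
    public static AnnotatedRange make(int lo, int hi) {
        return new AnnotatedRange(lo, hi);
    }
}

// Invented framework: encodes the components as comma-separated text.
final class TinyFramework {
    private static Method findAnnotated(Class<?> type, Class<? extends Annotation> marker) {
        for (Method method : type.getDeclaredMethods()) {
            if (method.isAnnotationPresent(marker)) {
                return method;
            }
        }
        throw new IllegalArgumentException("No @" + marker.getSimpleName() + " in " + type);
    }

    static String encode(Object value) {
        try {
            Object[] parts = (Object[]) findAnnotated(value.getClass(), Serializer.class).invoke(value);
            StringJoiner joined = new StringJoiner(",");
            for (Object part : parts) {
                joined.add(String.valueOf(part));
            }
            return joined.toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static <T> T decode(Class<T> type, String text) {
        Method factory = findAnnotated(type, Deserializer.class);
        String[] tokens = text.split(",");
        Object[] args = new Object[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            args[i] = Integer.valueOf(tokens[i].trim()); // this sketch supports int components only
        }
        try {
            return type.cast(factory.invoke(null, args));
        } catch (InvocationTargetException e) {
            // Surface the exception thrown by the class's own invariant check.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            throw new IllegalStateException(cause);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because decoding goes through make, which calls the constructor, text such as "5,1" is rejected with the constructor's own IllegalArgumentException instead of producing an invalid object.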
Since serializers and deserializers are ordinary methods, the programmer has a great deal of freedom in implementing them. For example, the serialized representation of an object may differ from the way the object is laid out in memory: a LinkedList can be serialized not as a chain of links but as one contiguous array, which makes the representation simpler, more efficient, and more compact.
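The linked-list case might look roughly like this in plain Java. IntChain is a hypothetical class written for the example (it is not java.util.LinkedList), and toFlatArray and fromFlatArray merely play the roles that a serializer and a deserializer would have.

```java
// Hypothetical singly linked list of ints, used only for this illustration.
final class IntChain {
    private static final class Link {
        final int value;
        Link next;

        Link(int value) {
            this.value = value;
        }
    }

    private Link head;
    private Link tail;
    private int size;

    void add(int value) {
        Link link = new Link(value);
        if (head == null) {
            head = link;
        } else {
            tail.next = link;
        }
        tail = link;
        size++;
    }

    // Serializer role: expose the contents as one flat array, not as links.
    int[] toFlatArray() {
        int[] values = new int[size];
        int index = 0;
        for (Link link = head; link != null; link = link.next) {
            values[index++] = link.value;
        }
        return values;
    }

    // Deserializer role: rebuild the chain through the normal add() path.
    static IntChain fromFlatArray(int[] values) {
        IntChain chain = new IntChain();
        for (int value : values) {
            chain.add(value);
        }
        return chain;
    }
}
```

The in-memory link structure never reaches the encoded form, so it can change later without affecting data that has already been written.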
Versioning in this approach is implemented through a special version element of the @Serializer and @Deserializer annotations:
class C {
    int a;
    int b;
    int c;

    @Deserializer(version = 3)
    public C(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    @Deserializer(version = 2)
    public C(int a, int b) {
        this(a, b, 0);
    }

    @Deserializer(version = 1)
    public C(int a) {
        this(a, 0, 0);
    }

    @Serializer(version = 3)
    public pattern C(int a, int b, int c) {
        a = this.a;
        b = this.b;
        c = this.c;
    }
}
In this example, one of the three deserializers is called, depending on the version of the serialized data.
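How a framework might choose among the versioned deserializers can be sketched with reflection in today's Java. The names are assumptions for the example: FromVersion is a local stand-in for @Deserializer(version = ...), Settings mirrors the class C above, and VersionDispatch is an invented helper.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;

// Hypothetical stand-in for @Deserializer(version = ...).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.CONSTRUCTOR)
@interface FromVersion {
    int value();
}

final class Settings {
    final int a;
    final int b;
    final int c;

    @FromVersion(3)
    public Settings(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    @FromVersion(2)
    public Settings(int a, int b) {
        this(a, b, 0);
    }

    @FromVersion(1)
    public Settings(int a) {
        this(a, 0, 0);
    }
}

final class VersionDispatch {
    // Picks the constructor whose annotation matches the version of the data.
    static Settings decode(int version, int... values) {
        for (Constructor<?> ctor : Settings.class.getDeclaredConstructors()) {
            FromVersion marker = ctor.getAnnotation(FromVersion.class);
            if (marker != null && marker.value() == version) {
                Object[] args = new Object[values.length];
                for (int i = 0; i < values.length; i++) {
                    args[i] = values[i]; // boxed here, unboxed again by newInstance
                }
                try {
                    return (Settings) ctor.newInstance(args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("No deserializer for version " + version);
    }
}
```

Data written by an older version of the class is upgraded by the older constructor, which fills in defaults for the fields that did not exist yet.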
What if we do not want serializers and deserializers to be available for anything other than serialization? We can make them private. In that case, however, a serialization framework will not be able to access them through reflection if the code lives in a module whose package is not open for deep reflective access. For this case, the document proposes yet another new language construct: open class members. For instance:
class Foo {
    private final InternalState is;

    public Foo(ExternalState es) {
        this(new InternalState(es));
    }

    @Deserializer
    private open Foo(InternalState is) {
        this.is = is;
    }

    @Serializer
    private open pattern serialize(InternalState is) {
        is = this.is;
    }
}
Here the serializer and deserializer are marked with the open keyword, which makes them accessible via setAccessible.
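For comparison, this is roughly what a framework has to do today to reach a private deserializer. Token and PrivateAccessDemo are hypothetical names. The sketch works as written because both classes live in the same unnamed module; across named modules, setAccessible would fail unless the package were opened to the framework, and that is the gap the proposed open members aim to close.

```java
import java.lang.reflect.Constructor;

// Hypothetical class whose "deserializer" is a private constructor.
final class Token {
    private final String secret;

    public Token() {
        this("generated");
    }

    private Token(String secret) {
        this.secret = secret;
    }

    String secret() {
        return secret;
    }
}

final class PrivateAccessDemo {
    // Rebuilds a Token through its private constructor, as a framework would.
    static Token rebuild(String secret) {
        try {
            Constructor<Token> ctor = Token.class.getDeclaredConstructor(String.class);
            // Throws InaccessibleObjectException if the package is not open
            // to the caller's module; succeeds here in the unnamed module.
            ctor.setAccessible(true);
            return ctor.newInstance(secret);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```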
Thus, the new approach is fundamentally different from the old one: classes are deliberately designed to be serializable rather than handed to the platform as is. This requires extra effort, but it makes serialization more predictable, safer, and independent of both the encoding format and the serialization framework.