Introduction to Data classes
One of the new features introduced in Python 3.7 is data classes. They are designed to automate the generation of code for classes that are used to store data. Although they work through different mechanisms, they can be thought of as "mutable named tuples with default values".
Introduction
All of the examples below require Python 3.7 or higher.
Most Python developers regularly have to write classes like this:
class RegularBook:
    def __init__(self, title, author):
        self.title = title
        self.author = author
Redundancy is already visible in this example: the title and author identifiers are each used several times. A real class would also contain overridden __eq__ and __repr__ methods.
The dataclasses module contains the @dataclass decorator. With it, the equivalent code looks like this:
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str
It is important to note that type annotations are required. Any attribute without a type annotation is ignored. Of course, if you do not want to commit to a specific type, you can specify Any from the typing module.
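The annotation rule can be checked with dataclasses.fields, which lists the fields the decorator picked up. This is a minimal sketch; the shelf attribute is a hypothetical addition used only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Book:
    title: str
    author: Any        # annotated with Any: still a field
    shelf = "fiction"  # no annotation: a plain class attribute, not a field


field_names = [f.name for f in fields(Book)]
print(field_names)  # ['title', 'author']
```

The unannotated attribute stays available as Book.shelf, but it does not appear in __init__, __repr__, or __eq__.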
What do you get as a result? The class automatically gets implementations of the __init__, __repr__, and __eq__ methods; __str__ is not generated separately and simply falls back to __repr__. In addition, it is a regular class: you can inherit from it or add arbitrary methods.
>>> book = Book(title="Fahrenheit 451", author="Bradbury")
>>> book
Book(title='Fahrenheit 451', author='Bradbury')
>>> book.author
'Bradbury'
>>> other = Book("Fahrenheit 451", "Bradbury")
>>> book == other
True
Alternatives
Tuple or dictionary
Of course, if the structure is fairly simple, you can save the data to a dictionary or a tuple:
book = ("Fahrenheit 451", "Bradbury")
other = {'title': 'Fahrenheit 451', 'author': 'Bradbury'}
However, this approach has disadvantages:
- You have to remember that the variable holds data belonging to this structure.
- With a dictionary, you must keep track of the key names. A dictionary initialized as {'name': 'Fahrenheit 451', 'author': 'Bradbury'} is also formally correct.
- With a tuple, you must keep track of the order of the values, since they have no names.
There is a better option:
Namedtuple
from collections import namedtuple
NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])
A class created this way gives us almost the same thing as a data class.
>>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
>>> book.author
'Bradbury'
>>> book
NamedTupleBook(title='Fahrenheit 451', author='Bradbury')
>>> book == NamedTupleBook("Fahrenheit 451", "Bradbury")
True
But despite the general similarity, named tuples have their limitations. They come from the fact that named tuples are still tuples.
First, you can still compare instances of different classes.
>>> Car = namedtuple("Car", ["model", "owner"])
>>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
>>> book == Car("Fahrenheit 451", "Bradbury")
True
Second, named tuples are immutable. In some situations this is useful, but sometimes you want more flexibility.
Finally, you can operate on a named tuple just like on a normal one: for example, iterate over it.
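The differences listed above can be seen side by side in a short sketch. The DataCar class is a hypothetical data class counterpart of Car, introduced here only for the comparison.

```python
from collections import namedtuple
from dataclasses import dataclass

NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])
Car = namedtuple("Car", ["model", "owner"])


@dataclass
class Book:
    title: str
    author: str


@dataclass
class DataCar:  # hypothetical data class counterpart of Car
    model: str
    owner: str


# Named tuples compare and unpack like plain tuples.
tuples_equal = NamedTupleBook("Fahrenheit 451", "Bradbury") == Car("Fahrenheit 451", "Bradbury")
title, author = NamedTupleBook("Fahrenheit 451", "Bradbury")

# Data classes of different types do not compare equal, and they are not iterable.
classes_equal = Book("Fahrenheit 451", "Bradbury") == DataCar("Fahrenheit 451", "Bradbury")
try:
    iter(Book("Fahrenheit 451", "Bradbury"))
    iterable = True
except TypeError:
    iterable = False

print(tuples_equal, classes_equal, iterable)  # True False False
```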
Other projects
If you are not limited to the standard library, you can find other solutions to this problem, in particular the attrs project. It can do even more than dataclasses and works on older versions of Python such as 2.7 and 3.4. Nevertheless, the fact that it is not part of the standard library may be inconvenient.
Creation
To create a data class, you can use the @dataclass decorator. In this case, all class fields defined with a type annotation will be used in the corresponding methods of the resulting class.
Alternatively, there is a make_dataclass function that works much like creating named tuples.
from dataclasses import make_dataclass
Book = make_dataclass("Book", ["title", "author"])
book = Book("Fahrenheit 451", "Bradbury")
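When only names are passed, make_dataclass annotates the fields with typing.Any. As a sketch of the fuller form, each field can also be given as a (name, type) or (name, type, field(...)) tuple, which allows types and default values.

```python
from dataclasses import field, make_dataclass

# Each entry is a name, a (name, type) pair, or a (name, type, Field) triple.
Book = make_dataclass(
    "Book",
    [
        ("title", str),
        ("author", str, field(default="Unknown author")),
    ],
)

book = Book("Fahrenheit 451")
print(book)  # Book(title='Fahrenheit 451', author='Unknown author')
```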
Default values
One of the useful features is how easy it is to add default values to fields. You still do not need to define the __init__ method; just assign the values directly in the class.
@dataclass
class Book:
    title: str = "Unknown"
    author: str = "Unknown author"
They will be taken into account in the generated __init__ method.
>>> Book()
Book(title='Unknown', author='Unknown author')
>>> Book("Fahrenheit 451")
Book(title='Fahrenheit 451', author='Unknown author')
But as with regular classes and methods, you need to be careful with mutable defaults. If you need, for example, a list as a default value, there is another way to do it, but more on that below.
In addition, it is important to watch the order in which fields are defined, since it corresponds exactly to the order of the parameters of the __init__ method. In particular, a field without a default value cannot follow a field that has one.
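The ordering constraint is the same one that applies to ordinary function parameters. A minimal sketch, using a hypothetical BrokenBook class, shows that the decorator rejects a field without a default placed after a field with one. The exact error text may vary between Python versions, so it is only printed here.

```python
from dataclasses import dataclass

try:
    @dataclass
    class BrokenBook:  # hypothetical class used to trigger the error
        title: str = "Unknown"
        author: str  # no default after a field that has one

    error = None
except TypeError as exc:
    # Raised while the decorator builds __init__, before any instance exists.
    error = str(exc)

print(error)
```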
Immutable data classes
Named tuple instances are immutable. In many situations this is a good idea. You can do the same for data classes: just specify the frozen=True parameter when creating the class, and any attempt to change its fields will raise a FrozenInstanceError exception.
@dataclass(frozen=True)
class Book:
    title: str
    author: str
>>> book = Book("Fahrenheit 451", "Bradbury")
>>> book.title = "1984"
dataclasses.FrozenInstanceError: cannot assign to field 'title'
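Since a frozen instance cannot be changed in place, a modified copy is usually created instead. The sketch below uses dataclasses.replace for that, and also shows that a frozen data class with the default eq=True is hashable and can serve as a dictionary key. The copies dictionary is a hypothetical example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Book:
    title: str
    author: str


book = Book("Fahrenheit 451", "Bradbury")

try:
    book.title = "1984"
    assignment_failed = False
except FrozenInstanceError:
    assignment_failed = True

# replace() builds a new instance, copying the fields that are not overridden.
other = replace(book, title="Dandelion Wine")

# frozen=True together with eq=True generates __hash__.
copies = {book: 3}

print(assignment_failed, other)
```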
Configuring a data class
In addition to the frozen parameter, the @dataclass decorator has other parameters:
- init: if True (the default), an __init__ method is generated. If the class already defines __init__, the parameter is ignored.
- repr: enables (by default) generation of the __repr__ method. The generated string contains the class name and the name and representation of every field defined in the class. Individual fields can be excluded (see below).
- eq: enables (by default) generation of the __eq__ method. Objects are compared as if they were tuples containing the corresponding field values. In addition, the types must match.
- order: enables (off by default) generation of the __lt__, __le__, __gt__ and __ge__ methods. Objects are compared as the corresponding tuples of field values, and the types of the objects are checked as well. If order is set and eq is not, a ValueError is raised. Also, the class must not already define any of the comparison methods.
- unsafe_hash: affects generation of the __hash__ method. The behavior also depends on the values of the eq and frozen parameters.
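A short sketch of how some of these parameters behave in practice: order=True makes instances sortable, the default combination of eq=True and frozen=False leaves the class unhashable, and order=True without eq is rejected. The Broken class is hypothetical and exists only to trigger the error.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Book:
    title: str
    author: str


books = [Book("Fahrenheit 451", "Bradbury"), Book("1984", "Orwell")]
# Compared like the tuples ("Fahrenheit 451", "Bradbury") and ("1984", "Orwell").
ordered = sorted(books)

# With eq=True and frozen=False (the defaults), __hash__ is set to None.
unhashable = Book.__hash__ is None

try:
    @dataclass(eq=False, order=True)
    class Broken:  # hypothetical class: order requires eq
        title: str

    order_error = None
except ValueError as exc:
    order_error = str(exc)

print([b.title for b in ordered], unhashable)
```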
Customization of individual fields
In most standard situations, this is not required, but it is possible to customize the behavior of a data class down to individual fields using the field function.
Mutable defaults
A typical situation described above is the use of lists or other mutable defaults. You may want a bookshelf class containing a list of books. If you run the following code:
@dataclass
class Bookshelf:
    books: List[Book] = []
The interpreter will report an error:
ValueError: mutable default <class 'list'> for field books is not allowed: use default_factory
However, for other mutable values this check may not fire, and using them as defaults can lead to incorrect program behavior.
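As an illustration of the silent case, here is a sketch with a hypothetical Tags class. Instances of an ordinary user-defined class are not caught by the check, so a single Tags object ends up shared by every Book.

```python
from dataclasses import dataclass


class Tags:
    """Hypothetical mutable container; not rejected as a default value."""

    def __init__(self):
        self.items = []


@dataclass
class Book:
    title: str
    tags: Tags = Tags()  # evaluated once: the same object for every Book


first = Book("Fahrenheit 451")
second = Book("1984")
first.tags.items.append("dystopia")

# The change made through the first book is visible through the second one.
print(second.tags.items)  # ['dystopia']
```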
To avoid problems, use the default_factory parameter of the field function. Its value can be any callable that takes no arguments.
The correct version of the class looks like this:
from dataclasses import dataclass, field
from typing import List

@dataclass
class Bookshelf:
    books: List[Book] = field(default_factory=list)
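A small sketch confirms that the factory is called once per instance, so each shelf gets its own list. The labels field and its lambda factory are hypothetical additions that show a factory producing a pre-filled value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    title: str
    author: str


@dataclass
class Bookshelf:
    books: List[Book] = field(default_factory=list)
    # Any zero-argument callable works, including a lambda (hypothetical field).
    labels: List[str] = field(default_factory=lambda: ["unsorted"])


first = Bookshelf()
second = Bookshelf()
first.books.append(Book("Fahrenheit 451", "Bradbury"))

print(len(first.books), len(second.books))  # 1 0
```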
Other options
In addition to default_factory, the field function has the following parameters:
- default: the default value. This parameter is needed because the call to field takes the place of the usual default value assignment.
- init: enables (by default) use of the field in the __init__ method.
- repr: enables (by default) use of the field in the __repr__ method.
- compare: enables (by default) use of the field in the comparison methods (__eq__, __le__ and others).
- hash: may be a boolean value or None. If True, the field is used to calculate the hash. If None (the default), the value of the compare parameter is used. One reason to specify hash=False together with compare=True is that the field's hash may be expensive to compute even though the field is needed for comparison.
- metadata: an arbitrary dictionary or None. The value is wrapped in MappingProxyType to make it read-only. This parameter is not used by data classes themselves and is intended for third-party extensions.
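A sketch of several of these options together. The isbn, notes, and pages fields, along with the metadata key "unit", are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Book:
    title: str
    author: str
    isbn: str = field(default="", repr=False)      # left out of __repr__
    notes: str = field(default="", compare=False)  # ignored by __eq__
    pages: int = field(default=0, metadata={"unit": "pages"})


a = Book("Fahrenheit 451", "Bradbury", isbn="hypothetical-isbn", notes="first read")
b = Book("Fahrenheit 451", "Bradbury", isbn="hypothetical-isbn", notes="second read")

print(a)       # Book(title='Fahrenheit 451', author='Bradbury', notes='first read', pages=0)
print(a == b)  # True

# Metadata is exposed through fields() as a read-only mapping.
pages_meta = {f.name: f.metadata for f in fields(Book)}["pages"]
try:
    pages_meta["unit"] = "sheets"
    metadata_read_only = False
except TypeError:
    metadata_read_only = True
```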
Processing after initialization
The auto-generated __init__ method calls the __post_init__ method if it is defined in the class. As a rule, it is called as self.__post_init__(), but if the class defines fields of type InitVar, they are passed as parameters to the method.
If the __init__ method has not been generated, __post_init__ will not be called.
For example, let's add a generated book description.
@dataclass
class Book:
    title: str
    author: str
    desc: str = None

    def __post_init__(self):
        self.desc = self.desc or "`%s` by %s" % (self.title, self.author)
>>> Book("Fahrenheit 451", "Bradbury")
Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
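Another common use of __post_init__ is validating the arguments and filling in fields that should not be passed to the constructor at all. The sketch below is a variation under stated assumptions: desc is declared with field(init=False), and the empty-title check is a hypothetical rule added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    author: str
    desc: str = field(init=False)  # computed; not a parameter of __init__

    def __post_init__(self):
        # Hypothetical validation rule: reject an empty title.
        if not self.title:
            raise ValueError("title must not be empty")
        self.desc = "`%s` by %s" % (self.title, self.author)


book = Book("Fahrenheit 451", "Bradbury")
print(book.desc)  # `Fahrenheit 451` by Bradbury

try:
    Book("", "Bradbury")
    rejected = False
except ValueError:
    rejected = True
```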
Parameters for initialization only
One of the possibilities associated with the __post_init__ method is parameters used only for initialization. If you declare a field with the type InitVar, its value is passed as a parameter to the __post_init__ method. Such fields are not used in the data class in any other way.
@dataclass
class Book:
    title: str
    author: str
    gen_desc: InitVar[bool] = True
    desc: str = None

    def __post_init__(self, gen_desc: bool):
        if gen_desc and self.desc is None:
            self.desc = "`%s` by %s" % (self.title, self.author)
>>> Book("Fahrenheit 451", "Bradbury")
Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
>>> Book("Fahrenheit 451", "Bradbury", gen_desc=False)
Book(title='Fahrenheit 451', author='Bradbury', desc=None)
Inheritance
When you apply the @dataclass decorator, it goes through all the parent classes, starting with object, and for each data class it finds it stores that class's fields in an ordered mapping; it then adds the fields of the class being processed. All generated methods use the fields from the resulting ordered mapping.
As a result, if the parent class defines fields with default values, the fields you add in the subclass must have default values too.
Since an ordered mapping keeps values in insertion order, for the following classes
@dataclass
class BaseBook:
    title: Any = None
    author: str = None

@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"
an __init__ method with the following signature will be generated:
def __init__(self, title: str = "Unknown", author: str = None, desc: str = None)
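The resulting field order can be verified with dataclasses.fields and inspect.signature. The sketch also shows the consequence for default values, using a hypothetical BrokenBook subclass that adds a field without a default.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class BaseBook:
    title: Any = None
    author: str = None


@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"  # overrides the type and default, keeps its position


field_order = [f.name for f in fields(Book)]
parameters = list(inspect.signature(Book).parameters)
print(field_order)  # ['title', 'author', 'desc']

try:
    @dataclass
    class BrokenBook(BaseBook):  # hypothetical subclass
        isbn: str  # no default after the parent's fields with defaults

    inheritance_error = None
except TypeError as exc:
    inheritance_error = str(exc)
```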