Introduction to Data classes

One of the new features introduced in Python 3.7 is data classes. They are designed to automate writing the boilerplate code of classes whose main job is to store data. Although they work through different mechanisms, they can be thought of as "mutable named tuples with default values".

Introduction

All of the examples below require Python 3.7 or higher.

Most Python developers regularly have to write classes like this:

class RegularBook:
    def __init__(self, title, author):
        self.title = title
        self.author = author

Even this small example shows redundancy: the identifiers title and author each appear three times. A real class would also override the __eq__ and __repr__ methods.

The dataclasses module provides the @dataclass decorator. With it, equivalent code looks like this:

from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str

Note that type annotations are required: class attributes without an annotation are not treated as fields. If you do not want to commit to a specific type, you can use Any from the typing module.
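To see which attributes actually became fields, dataclasses.fields() can be used. The following is a minimal sketch; the shelf attribute is a hypothetical example of an unannotated class attribute that the decorator ignores, while the Any-annotated author is still a field.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Book:
    title: str
    author: Any        # Any only documents "no particular type"; it is still a field
    shelf = "fiction"  # hypothetical attribute with no annotation: not a field


field_names = [f.name for f in fields(Book)]
print(field_names)  # ['title', 'author']

# shelf is not an __init__ parameter; it is an ordinary class attribute.
book = Book("Fahrenheit 451", "Bradbury")
print(book)        # Book(title='Fahrenheit 451', author='Bradbury')
print(book.shelf)  # fiction
```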

What do you get as a result? The class automatically gets __init__, __repr__ and __eq__ methods (no separate __str__ is generated, but str() falls back to __repr__). It is still a regular class: you can inherit from it and add arbitrary methods.

>>> book = Book(title="Fahrenheit 451", author="Bradbury")
>>> book
Book(title='Fahrenheit 451', author='Bradbury')
>>> book.author
'Bradbury'
>>> other = Book("Fahrenheit 451", "Bradbury")
>>> book == other
True
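For contrast, here is a small sketch comparing the hand-written RegularBook (which has no __eq__) with the data class version. Without __eq__, Python falls back to identity, so two RegularBook objects with the same data are not equal.

```python
from dataclasses import dataclass


class RegularBook:
    def __init__(self, title, author):
        self.title = title
        self.author = author


@dataclass
class Book:
    title: str
    author: str


# No __eq__ defined: equality is object identity.
regular_equal = RegularBook("Fahrenheit 451", "Bradbury") == RegularBook("Fahrenheit 451", "Bradbury")
# Generated __eq__: equality compares field values.
data_equal = Book("Fahrenheit 451", "Bradbury") == Book("Fahrenheit 451", "Bradbury")

print(regular_equal)  # False
print(data_equal)     # True

# str() uses the generated __repr__ because no __str__ is defined.
book = Book("Fahrenheit 451", "Bradbury")
print(str(book))      # Book(title='Fahrenheit 451', author='Bradbury')
```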

Alternatives

Tuple or dictionary

Of course, if the structure is fairly simple, you can store the data in a dictionary or a tuple:

book = ("Fahrenheit 451", "Bradbury")
other = {'title': 'Fahrenheit 451', 'author': 'Bradbury'}

However, this approach has disadvantages:

You have to remember which structure a given variable is supposed to hold.
With a dictionary, you must keep track of the key names: {'name': 'Fahrenheit 451', 'author': 'Bradbury'} is an equally valid dictionary, even though the key is now name instead of title.
With a tuple, you must keep track of the order of the values, since they have no names.
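A minimal sketch of how these mistakes stay silent until the data is read; describe is a hypothetical helper that expects a title key.

```python
good = {'title': 'Fahrenheit 451', 'author': 'Bradbury'}
typo = {'name': 'Fahrenheit 451', 'author': 'Bradbury'}  # valid dict, wrong key


def describe(book):
    # Hypothetical consumer: the wrong key only surfaces here.
    return "%s by %s" % (book['title'], book['author'])


print(describe(good))  # Fahrenheit 451 by Bradbury

caught = None
try:
    describe(typo)
except KeyError as exc:
    caught = exc
    print("KeyError:", exc)  # KeyError: 'title'

# With a tuple, swapped values are accepted just as silently.
title, author = ("Bradbury", "Fahrenheit 451")
print(title)  # Bradbury
```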

There is a better option:

Namedtuple

from collections import namedtuple 
NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])

A class created this way behaves much like the data class:

>>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
>>> book.author
'Bradbury'
>>> book
NamedTupleBook(title='Fahrenheit 451', author='Bradbury')
>>> book == NamedTupleBook("Fahrenheit 451", "Bradbury")
True

Despite the similarity, named tuples have limitations, all stemming from the fact that they are still tuples.

First, instances of different named tuple classes can compare equal:

>>> Car = namedtuple("Car", ["model", "owner"])
>>> book = NamedTupleBook("Fahrenheit 451", "Bradbury")
>>> book == Car("Fahrenheit 451", "Bradbury")
True
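A data class does not have this problem. In the sketch below (DataCar is a hypothetical class added for comparison), the generated __eq__ returns NotImplemented when the other object is of a different class, so == falls back to identity and yields False.

```python
from collections import namedtuple
from dataclasses import dataclass

NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])
Car = namedtuple("Car", ["model", "owner"])


@dataclass
class Book:
    title: str
    author: str


@dataclass
class DataCar:
    model: str
    owner: str


# Named tuples compare as plain tuples; the class is ignored.
same_nt = NamedTupleBook("Fahrenheit 451", "Bradbury") == Car("Fahrenheit 451", "Bradbury")
nt_vs_tuple = NamedTupleBook("Fahrenheit 451", "Bradbury") == ("Fahrenheit 451", "Bradbury")
# Data classes only compare equal to instances of exactly the same class.
same_dc = Book("Fahrenheit 451", "Bradbury") == DataCar("Fahrenheit 451", "Bradbury")

print(same_nt, nt_vs_tuple, same_dc)  # True True False
```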

Second, named tuples are immutable. This is sometimes useful, but you may want more flexibility.
Finally, a named tuple can be used like any ordinary tuple: it can be indexed, unpacked and iterated over, which is not always what you want from a record.
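The following sketch puts both behaviors side by side: the named tuple acts as a sequence and refuses assignment (a modified copy comes from _replace), while the data class instance can be modified in place and is not iterable.

```python
from collections import namedtuple
from dataclasses import dataclass

NamedTupleBook = namedtuple("NamedTupleBook", ["title", "author"])


@dataclass
class Book:
    title: str
    author: str


nt = NamedTupleBook("Fahrenheit 451", "Bradbury")
title, author = nt      # unpacking works, like any tuple
print(nt[0], len(nt))   # Fahrenheit 451 2

nt_immutable = False
try:
    nt.title = "1984"   # fields cannot be reassigned
except AttributeError:
    nt_immutable = True
nt2 = nt._replace(title="1984")  # returns a new tuple instead

book = Book("Fahrenheit 451", "Bradbury")
book.title = "1984"     # data class fields are mutable by default
print(book)             # Book(title='1984', author='Bradbury')

book_iterable = True
try:
    iter(book)          # a data class is not a sequence
except TypeError:
    book_iterable = False
```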

Other projects

If you are not limited to the standard library, there are other solutions to this problem, in particular the attrs project. It can do even more than dataclasses and works on older Python versions such as 2.7 and 3.4. However, the fact that it is not part of the standard library may be inconvenient.

Creation

To create a data class, apply the @dataclass decorator. All class attributes defined with a type annotation become fields and are used in the generated methods.

Alternatively, the make_dataclass function creates a data class in much the same way that namedtuple creates a named tuple.

from dataclasses import make_dataclass
Book = make_dataclass("Book", ["title", "author"])
book = Book("Fahrenheit 451", "Bradbury")
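make_dataclass also accepts types, defaults and extra class attributes. A minimal sketch: each field is given as a name, a (name, type) pair, or a (name, type, Field) triple, and namespace adds ordinary attributes; the describe method is a hypothetical example.

```python
from dataclasses import field, fields, make_dataclass

Book = make_dataclass(
    "Book",
    [
        ("title", str),                                    # (name, type)
        ("author", str, field(default="Unknown author")),  # (name, type, Field)
    ],
    # Ordinary class attributes such as methods go into namespace.
    namespace={"describe": lambda self: "%s by %s" % (self.title, self.author)},
)

book = Book("Fahrenheit 451")
print(book)             # Book(title='Fahrenheit 451', author='Unknown author')
print(book.describe())  # Fahrenheit 451 by Unknown author
```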

Default values

One useful feature is how easy it is to give fields default values. You still do not need to write __init__ yourself; just assign the values directly in the class body.

@dataclass
class Book:
    title: str = "Unknown"
    author: str = "Unknown author"

They are taken into account in the generated __init__ method:

>>> Book()
Book(title='Unknown', author='Unknown author')
>>> Book("Fahrenheit 451")
Book(title='Fahrenheit 451', author='Unknown author')

However, as with regular classes and functions, you need to be careful with mutable defaults. If, for example, you need a list as a default value, there is a separate mechanism for that, described below.

Also pay attention to the order of fields with default values: it corresponds exactly to the parameter order in __init__, so a field without a default cannot follow a field that has one (the decorator raises TypeError).
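A minimal sketch of the ordering rule: BadBook is a hypothetical class that the decorator rejects, and reordering the fields fixes it.

```python
from dataclasses import dataclass

bad_order_error = False
try:
    @dataclass
    class BadBook:
        title: str = "Unknown"
        author: str  # no default, yet it follows a field that has one
except TypeError as exc:
    bad_order_error = True
    print("TypeError:", exc)


# Fields without defaults come first, matching the __init__ parameter order.
@dataclass
class Book:
    author: str
    title: str = "Unknown"


book = Book("Bradbury")
print(book)  # Book(author='Bradbury', title='Unknown')
```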

Immutable Data Classes

Named tuple instances are immutable, and in many situations that is a good idea. Data classes can be made immutable too: pass frozen=True to the decorator, and any attempt to assign to a field will raise FrozenInstanceError.

@dataclass(frozen=True)
class Book:
    title: str
    author: str

>>> book = Book("Fahrenheit 451", "Bradbury")
>>> book.title = "1984"
dataclasses.FrozenInstanceError: cannot assign to field 'title'
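Two practical consequences of frozen=True, sketched below: with the default eq=True, a __hash__ is generated from the fields, so instances can go into sets and be used as dict keys; and dataclasses.replace() produces a modified copy instead of mutating the original.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Book:
    title: str
    author: str


book = Book("Fahrenheit 451", "Bradbury")

# Equal frozen instances hash equally, so the set keeps only one.
shelf = {book, Book("Fahrenheit 451", "Bradbury")}
print(len(shelf))  # 1

# replace() builds a new instance with some fields changed.
renamed = replace(book, title="1984")
print(renamed)     # Book(title='1984', author='Bradbury')
print(book)        # Book(title='Fahrenheit 451', author='Bradbury')
```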

Configuring a data class

In addition to frozen, the @dataclass decorator accepts other parameters:

init: if True (the default), an __init__ method is generated. If the class already defines __init__, this parameter is ignored.
repr: if True (the default), a __repr__ method is generated. The generated string contains the class name and the name and repr of every field. Individual fields can be excluded (see below).
eq: if True (the default), an __eq__ method is generated. Objects are compared as if they were tuples of their field values; both objects must also be of exactly the same class.
order: if True (the default is False), the __lt__, __le__, __gt__ and __ge__ methods are generated. Objects are compared as tuples of their field values, and both must be of the same class. If order is true and eq is false, ValueError is raised. The class must also not already define any of these comparison methods.
unsafe_hash: controls generation of __hash__. If False (the default), the behavior depends on the values of eq and frozen; if True, __hash__ is generated from the fields even for a mutable class.
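A short sketch of order=True and of the default hashing behavior: sorting compares (title, author) tuples, comparing with an object of another type raises TypeError, and with eq=True and frozen=False the decorator sets __hash__ to None, so such instances are unhashable.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Book:
    title: str
    author: str


books = [Book("Fahrenheit 451", "Bradbury"), Book("1984", "Orwell")]
# Compared as (title, author) tuples, so "1984" sorts before "Fahrenheit 451".
print(sorted(books)[0])  # Book(title='1984', author='Orwell')

# eq=True (default) and frozen=False: no usable __hash__.
print(Book.__hash__)     # None

mixed_error = False
try:
    Book("1984", "Orwell") < ("1984", "Orwell")  # different types
except TypeError:
    mixed_error = True
```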

Customization of individual fields

In most situations this is not necessary, but you can customize the behavior of a data class down to individual fields using the field function.

Mutable Defaults

A typical case, mentioned above, is a list or another mutable default. Suppose you want a bookshelf class that contains a list of books. If you run the following code:

from typing import List

@dataclass
class Bookshelf:
    books: List[Book] = []

The interpreter will report an error:

ValueError: mutable default <class 'list'> for field books is not allowed: use default_factory

However, this check does not catch every mutable value. A mutable default that slips through is shared by all instances and can silently lead to incorrect program behavior.
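A sketch of such a silent failure, assuming a hypothetical Tags class: its instances are ordinary hashable objects, so the decorator's mutable-default check accepts them, yet the single default object is shared by every Book.

```python
from dataclasses import dataclass


class Tags:
    """Hypothetical mutable container that the default check does not reject."""

    def __init__(self):
        self.items = []


@dataclass
class Book:
    title: str
    tags: Tags = Tags()  # one Tags object, created once when the class is defined


first = Book("Fahrenheit 451")
second = Book("1984")
first.tags.items.append("dystopia")

# The change made through first is visible through second.
print(second.tags.items)          # ['dystopia']
print(first.tags is second.tags)  # True
```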

To avoid this, use the default_factory parameter of the field function. Its value can be any callable that takes no arguments; it is called to create a fresh default value for each new instance.
The correct version of the class looks like this:

from dataclasses import dataclass, field
from typing import List
@dataclass
class Bookshelf:
    books: List[Book] = field(default_factory=list)
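A minimal sketch confirming that each Bookshelf gets its own list. StarterShelf is a hypothetical variant showing that any zero-argument callable, such as a lambda, can serve as the factory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    title: str
    author: str


@dataclass
class Bookshelf:
    # list() is called anew for every instance created without books.
    books: List[Book] = field(default_factory=list)


@dataclass
class StarterShelf:
    # Hypothetical: a lambda factory that pre-fills each new list.
    books: List[Book] = field(default_factory=lambda: [Book("1984", "Orwell")])


home = Bookshelf()
office = Bookshelf()
home.books.append(Book("Fahrenheit 451", "Bradbury"))

print(len(home.books), len(office.books))  # 1 0
print(home.books is office.books)          # False
print(StarterShelf())  # StarterShelf(books=[Book(title='1984', author='Orwell')])
```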

Other parameters

Besides default_factory, the field function accepts the following parameters:

default: the default value of the field. This parameter is needed because a call to field() takes the place of the plain default value in the class body.
init: if True (the default), the field is included as a parameter of __init__.
repr: if True (the default), the field is included in __repr__.
compare: if True (the default), the field is used in the comparison methods (__eq__, __le__ and others).
hash: a bool or None. If True, the field is used to compute the hash. If None (the default), the value of compare is used.
One reason to set hash=False while compare=True is that a field's hash may be expensive to compute even though the field is needed for comparison.
metadata: an arbitrary mapping or None. The value is wrapped in MappingProxyType to make it read-only. Data classes themselves do not use it; it is intended for third-party extensions.
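A sketch combining several of these parameters. The pages and loans fields and the "units" metadata key are hypothetical, chosen only for illustration; metadata is read back through dataclasses.fields().

```python
from dataclasses import dataclass, field, fields


@dataclass
class Book:
    title: str
    author: str
    # Hidden from repr, ignored by __eq__, and tagged with hypothetical metadata.
    pages: int = field(default=0, repr=False, compare=False, metadata={"units": "pages"})
    # Not an __init__ parameter; reads the default 0 until assigned.
    loans: int = field(default=0, init=False)


a = Book("Fahrenheit 451", "Bradbury", pages=100)
b = Book("Fahrenheit 451", "Bradbury", pages=200)

print(a)       # Book(title='Fahrenheit 451', author='Bradbury', loans=0)
print(a == b)  # True, because pages has compare=False

pages_field = next(f for f in fields(Book) if f.name == "pages")
print(pages_field.metadata["units"])        # pages
print(type(pages_field.metadata).__name__)  # mappingproxy
```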

Processing after initialization

The generated __init__ calls __post_init__ if the class defines it. Normally it is called as self.__post_init__(), but if the class declares fields of type InitVar, their values are passed to it as arguments.

If __init__ is not generated, __post_init__ is not called automatically.

For example, let's add an automatically generated book description:

@dataclass
class Book:
    title: str
    author: str
    desc: str = None

    def __post_init__(self):
        self.desc = self.desc or "`%s` by %s" % (self.title, self.author)

>>> Book("Fahrenheit 451", "Bradbury")
Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
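A common variation, sketched here under the assumption that the description should always be computed: declare desc with field(init=False), so it is no longer an __init__ parameter and __post_init__ is its only source.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    author: str
    # init=False: desc is not an __init__ parameter; __post_init__ sets it.
    desc: str = field(init=False)

    def __post_init__(self):
        self.desc = "`%s` by %s" % (self.title, self.author)


book = Book("Fahrenheit 451", "Bradbury")
print(book)  # Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')

extra_arg_error = False
try:
    Book("Fahrenheit 451", "Bradbury", "custom")  # desc cannot be passed in
except TypeError:
    extra_arg_error = True
```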

Parameters for initialization only

A related feature of __post_init__ is init-only parameters. If a field is declared with the type InitVar, its value is passed as an argument to __post_init__. Such pseudo-fields are not used anywhere else in the data class: they are not stored on the instance and do not take part in __repr__ or comparisons.

from dataclasses import InitVar

@dataclass
class Book:
    title: str
    author: str
    gen_desc: InitVar[bool] = True
    desc: str = None

    def __post_init__(self, gen_desc: bool):
        if gen_desc and self.desc is None:
            self.desc = "`%s` by %s" % (self.title, self.author)

>>> Book("Fahrenheit 451", "Bradbury")
Book(title='Fahrenheit 451', author='Bradbury', desc='`Fahrenheit 451` by Bradbury')
>>> Book("Fahrenheit 451", "Bradbury", gen_desc=False)
Book(title='Fahrenheit 451', author='Bradbury', desc=None)
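To check that an InitVar is really init-only, the sketch below inspects dataclasses.fields() and the instance's __dict__: gen_desc shows up in neither.

```python
from dataclasses import InitVar, dataclass, fields


@dataclass
class Book:
    title: str
    author: str
    gen_desc: InitVar[bool] = True
    desc: str = None

    def __post_init__(self, gen_desc: bool):
        if gen_desc and self.desc is None:
            self.desc = "`%s` by %s" % (self.title, self.author)


book = Book("Fahrenheit 451", "Bradbury", gen_desc=False)
print([f.name for f in fields(Book)])  # ['title', 'author', 'desc']
print("gen_desc" in vars(book))        # False
```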

Inheritance

When the @dataclass decorator processes a class, it walks through all of its base classes in reverse MRO order (that is, starting from object), and for each data class it finds, it adds that class's fields to an ordered mapping. It then adds the fields of the class being processed. All generated methods use the fields from this combined ordered mapping.

As a result, if a base class defines fields with default values, every new field added in a subclass must also have a default value.
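A minimal sketch of this rule: the subclass field desc without a default would follow the inherited fields that have defaults, so the decorator raises TypeError; giving desc a default (as in the hypothetical DescribedBook) resolves it.

```python
from dataclasses import dataclass


@dataclass
class BaseBook:
    title: str = None
    author: str = None


subclass_error = False
try:
    @dataclass
    class Book(BaseBook):
        desc: str  # no default, but it comes after inherited defaulted fields
except TypeError as exc:
    subclass_error = True
    print("TypeError:", exc)


@dataclass
class DescribedBook(BaseBook):
    desc: str = None  # a default keeps the field order valid


print(DescribedBook(desc="A novel"))  # DescribedBook(title=None, author=None, desc='A novel')
```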

Because the mapping preserves insertion order, a field redefined in a subclass keeps its original position while taking the subclass's type and default. So for the following classes

from typing import Any
@dataclass
class BaseBook:
    title: Any = None
    author: str = None

@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"

the generated __init__ has the following signature:

def __init__(self, title: str="Unknown", author: str=None, desc: str=None)
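The signature can be verified with inspect.signature(). A minimal sketch: title stays first but takes the subclass's type and default, followed by author and then desc.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class BaseBook:
    title: Any = None
    author: str = None


@dataclass
class Book(BaseBook):
    desc: str = None
    title: str = "Unknown"  # redefined: keeps its position, takes the new default


params = inspect.signature(Book).parameters
print(list(params))  # ['title', 'author', 'desc']
print(Book())        # Book(title='Unknown', author=None, desc=None)
```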
