The book "Pure Python. Subtleties of Programming for Pros"

Hi, Habrozhiteli! Learning all the features of Python is a difficult task, and with this book you can focus on the practical skills that are really important. Dig out the “hidden gold” in the standard Python library and start writing clean code today.

If you have experience with older versions of Python, you can get up to speed with the modern patterns and features introduced in Python 3.

If you have worked with other programming languages and want to switch to Python, you will find the practical tips you need to become an effective Pythonista.
If you want to learn how to write clean code, you will find the most interesting examples and little-known tricks here.

Fragment "The Craziest Dictionary Expression in the West"

Sometimes you come across a tiny code example that has truly unexpected depth: a single line of code that can teach you a lot if you ponder it long enough. Such a piece of code is like a koan in Zen Buddhism: a question or statement used in Zen practice to provoke doubt and test a student's progress.

The tiny snippet of code that we discuss in this section is one such example. At first glance, it may look like a straightforward dictionary expression, but upon closer inspection it sends you on a mind-expanding psychedelic cruise through the Python interpreter.

I get such a kick out of this one-liner that I once had it printed on my Python conference badge as a conversation starter. It led to several rewarding conversations with subscribers to my Python mailing list.
So without further ado, here is the code snippet. Take a moment to reflect on the following dictionary expression and what it should evaluate to:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}

I'll wait here ...

Ok, ready?

This is the result we get when we evaluate the dictionary expression above in a Python interpreter session:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

I admit I was quite taken aback the first time I saw this result. But it all makes sense once you investigate what happens here step by step. Let's think about why we get this, I must say, rather unintuitive result.

When Python processes our dictionary expression, it first builds a new empty dictionary object, and then assigns keys and values to it in the order in which they are passed to the dictionary expression.

Broken into parts, our dictionary expression is equivalent to the following sequence of statements, executed in order:

>>> xs = dict()
>>> xs[True] = 'yes'
>>> xs[1] = 'no'
>>> xs[1.0] = 'maybe'

Oddly enough, Python considers all the keys used in this dictionary example to be equal:

>>> True == 1 == 1.0
True

All right, but wait a minute. I'm sure you can intuitively accept that 1.0 == 1, but why is True also considered equal to 1? The first time I saw this dictionary expression, it really puzzled me.

After a little digging in the Python documentation, I learned that Python treats the bool type as a subclass of type int. This is the case in Python 2 and Python 3:

The Boolean type is a subtype of the integer type, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings 'False' or 'True' are returned, respectively.
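You can confirm this in a few lines. The check below is only an illustrative sketch that is not part of the original excerpt; the variable names are arbitrary and it uses nothing but built-ins:

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
is_subclass = issubclass(bool, int)
total = True + True
hits = sum([True, False, True])   # counting matches by summing booleans

# The exception from the quote: string conversion yields the names.
as_text = (str(True), str(1))

print(is_subclass, total, hits, as_text)
# True 2 2 ('True', '1')
```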

And of course, this means that in Python, boolean values can technically be used as list or tuple indexes:

>>> ['no', 'yes'][True]
'yes'

But for the sake of clarity (and your colleagues' sanity), you probably shouldn't use boolean values like this.

Anyway, back to our dictionary expression.

As far as Python is concerned, all these values (True, 1, and 1.0) represent the same dictionary key. As the interpreter evaluates the dictionary expression, it repeatedly overwrites the value stored under the key True. This explains why the resulting dictionary contains only one key in the end.

Before we go further, let's take another look at the original dictionary expression:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

Why do we still get True as the key here? Shouldn't the key have changed to 1.0 by the end because of the repeated assignments?

After some research into the Python interpreter source code, I found out that Python dictionaries do not update the key object itself when a new value is associated with it:

>>> ys = {1.0: 'no'}
>>> ys[True] = 'yes'
>>> ys
{1.0: 'yes'}

Of course, this makes sense as a performance optimization: if the keys are considered identical, why spend time updating the original?
The same thing happens in our original expression: the initial True object is never replaced as the key. For this reason, the string representation of the dictionary still prints the key as True (instead of 1 or 1.0).
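To see which key object a dictionary actually holds, you can inspect the type of the stored key. The snippet below is an illustrative sketch rather than part of the original excerpt. The delete-then-insert step is one possible way to swap the key object, shown here only to make the contrast visible:

```python
ys = {1.0: 'no'}
ys[True] = 'yes'              # updates the value, keeps the float key
first_key = next(iter(ys))
print(ys, type(first_key).__name__)    # {1.0: 'yes'} float

# To swap in the new key object, remove the old entry first.
del ys[1]                     # 1, 1.0 and True all address the same entry
ys[True] = 'yes'
swapped_key = next(iter(ys))
print(ys, type(swapped_key).__name__)  # {True: 'yes'} bool
```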

With what we know now, it looks as if the values in the resulting dictionary are overwritten only because the keys always compare as equal to each other. However, it turns out that this effect is not caused by the equality check with the __eq__ method alone, either.

Python dictionaries are based on a hash table data structure. When I first saw this surprising dictionary expression, my first thought was that the behavior had something to do with hash collisions.

Internally, a hash table stores the keys it contains in different “buckets” according to the hash value of each key. The hash value is a fixed-length numeric value derived from the key that identifies the key.

This fact allows you to perform quick search operations. It is much faster to find the numeric key hash value in the lookup table than to compare the full key object with all other keys and perform an equivalence check.

However, methods for calculating hash values are usually not perfect. Eventually, two or more keys that are actually different will have the same derived hash value, and they will end up in the same bucket of the lookup table.
When two keys have the same hash value, this is called a hash collision. It is a special case that the hash table's insertion and lookup algorithms have to handle.
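Collisions are easy to find even among built-in values. The sketch below is not part of the original excerpt and relies on a CPython implementation detail, so treat it as an assumption about your interpreter rather than a language guarantee:

```python
# CPython implementation detail (assumed here, other interpreters may differ):
# -1 is reserved as an internal error code, so hash(-1) is reported as -2.
h1, h2 = hash(-1), hash(-2)
same_hash = h1 == h2
same_key = -1 == -2

print(same_hash, same_key)   # True False on CPython
```

Here two keys that are clearly different share one hash value, which is exactly the situation described above.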

Based on this, it is highly likely that hashing has something to do with the unexpected result we got from our dictionary expression. So let's find out whether the hash values of the keys also play a role here.
I'll define the class below as a little detective tool:

class AlwaysEquals:
    def __eq__(self, other):
        return True

    def __hash__(self):
        return id(self)

This class is special in two ways.

First, because its dunder method __eq__ always returns True, all instances of this class pretend to be equal to any object:

>>> AlwaysEquals() == AlwaysEquals()
True
>>> AlwaysEquals() == 42
True
>>> AlwaysEquals() == 'waaat?'
True

And second, each instance of AlwaysEquals also returns a unique hash value generated by the built-in id() function:

>>> objects = [AlwaysEquals(),
...            AlwaysEquals(),
...            AlwaysEquals()]
>>> [hash(obj) for obj in objects]
[4574298968, 4574287912, 4574287072]

In CPython, the id() function returns the address of the object in memory, which is guaranteed to be unique among objects that exist at the same time.

With this class, you can now create objects that pretend to be equal to any other object but have a unique hash value associated with them. This lets us check whether dictionary keys are overwritten based on the result of the equality comparison alone.

And, as you can see, the keys in the following example are not overwritten, even though they always compare as equal to each other:

>>> {AlwaysEquals(): 'yes', AlwaysEquals(): 'no'}
{ <AlwaysEquals object at 0x110a3c588>: 'yes',
  <AlwaysEquals object at 0x110a3cf98>: 'no' }

We can also look at this idea from the other side and check whether returning the same hash value will be a sufficient reason to force the keys to be rewritten:

class SameHash:
    def __hash__(self):
        return 1

Comparing instances of the SameHash class will show them as not equivalent to each other, but they will all have the same hash value of 1:

>>> a = SameHash()
>>> b = SameHash()
>>> a == b
False
>>> hash(a), hash(b)
(1, 1)

Let's see how Python dictionaries react when we try to use instances of the SameHash class as dictionary keys:

>>> {a: 'a', b: 'b'}
{ <SameHash instance at 0x7f7159020cb0>: 'a',
  <SameHash instance at 0x7f7159020cf8>: 'b' }

As this example shows, the “keys are overwritten” effect is not caused by hash collisions alone.
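To complete the experiment, you can combine both properties in one class. SameHashAndEqual below is a hypothetical helper that does not appear in the original excerpt: its instances compare as equal to each other and share the same hash value. This is only a sketch of the remaining case:

```python
class SameHashAndEqual:
    """Hypothetical helper: all instances are equal and hash alike."""

    def __eq__(self, other):
        return isinstance(other, SameHashAndEqual)

    def __hash__(self):
        return 1


a = SameHashAndEqual()
b = SameHashAndEqual()
d = {a: 'a', b: 'b'}

print(len(d))                 # 1
print(d[a])                   # b
print(next(iter(d)) is a)     # True
```

Only now does the second assignment overwrite the first value, and the dictionary still holds the first key object, a.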

Dictionaries check both equality and the hash value to determine whether two keys are the same. Let's try to summarize the findings of our investigation.

The dictionary expression {True: 'yes', 1: 'no', 1.0: 'maybe'} evaluates to {True: 'maybe'} because all the keys in this example (True, 1, and 1.0) compare as equal to each other, and they all have the same hash value:

>>> True == 1 == 1.0
True
>>> (hash(True), hash(1), hash(1.0))
(1, 1, 1)
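The rule summarized above (same hash value and equal keys) can be sketched as a toy hash table. ToyDict, its fixed number of buckets, and its list-based buckets are assumptions made purely for illustration; the real dict in CPython is implemented quite differently, so read this as a model of the key-matching rule and nothing more:

```python
class ToyDict:
    """Simplified stand-in for a hash table, not CPython's actual dict."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def __setitem__(self, key, value):
        h = hash(key)
        bucket = self.buckets[h % len(self.buckets)]
        for entry in bucket:
            stored_hash, stored_key, _ = entry
            # Same key only if the hashes match AND the keys are equal.
            if stored_hash == h and (stored_key is key or stored_key == key):
                entry[2] = value      # overwrite the value, keep the old key
                return
        bucket.append([h, key, value])

    def items(self):
        return [(k, v) for bucket in self.buckets for _, k, v in bucket]


toy = ToyDict()
toy[True] = 'yes'
toy[1] = 'no'
toy[1.0] = 'maybe'
result = toy.items()
print(result)   # [(True, 'maybe')]
```

Because an existing entry only has its value replaced, the first key object that was inserted (True) survives all three assignments.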

Perhaps now it is no longer so surprising that this is the final state of the dictionary:

>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}

We touched on a lot of topics here, and this particular Python trick can be a bit mind-boggling at first, which is why I compared it to a Zen koan at the very beginning of this section.

If you have difficulty understanding what is happening in this section, try experimenting in turn with all the code examples in the Python interpreter session. You will be rewarded by expanding your knowledge of the internal mechanisms of the Python language.

» More information about the book is available on the publisher's website
» Table of contents
» Excerpt

For Habrozhiteli: 20% off with the coupon code Python
