Memoization of default kwarg in Python

Published on October 14, 2018

Memoization of default kwarg in Python

    This is how you can memorize the Python function:

    def memo_square(a, cache={}): 
        if a not in cache: 
            cache[a] = a*a 
        return cache[a]

    The reception is undeservedly little known, so under the cut we will analyze how it works and what it is for.

    First, how and why it works. memo_square(like any other function) is an object of the function class, which, among other attributes, has a tuple to be filled in when the object is created memo_square.__defaults__. First, it contains an empty dictionary, as indicated in the function header:

    >>> memo_square.__defaults__ 
    ({},) 

    __defaults__- a normal tuple and its elements cannot be changed. You can, however, replace the entire set of default values ​​at once, but only on a different tuple:

    >>> def test(a=1, b=2): 
    ...  print(a, b) 
    ... 
    >>> test.__defaults__ 
    (1, 2) 
    >>> test() 
    1 2 
    >>> test.__defaults__ = ('Привет, ', 'Хабр') 
    >>> test() 
    Привет,  Хабр 
    >>> test.__defaults__[1] = 'Пикабу' 
    Traceback (most recent call last): 
      File "<stdin>", line 1, in <module> 
    TypeError: 'tuple' object does not support item assignment 
    >>> test.__defaults__ = {0: 'Привет, ', 1: 'Пикабу'} 
    Traceback (most recent call last): 
      File "<stdin>", line 1, in <module> 
    TypeError: __defaults__ must be set to a tuple object

    Sorian, this article will not fall on Picaba. Well, okay, this is not important. It is important that, with the exception of a very tricky code, it func.__defaults__is created once during the work of the program along with all its elements. The tuple and its elements will not be recreated with each function call, they will be used as long as the function exists. But to change, if the elements themselves are mutable, nobody forbids them. The inability to work with such elements is one of the most common ways to shoot yourself in the foot in python . But in general, storing values ​​between function calls can be quite useful. After several calls it memo_square.__defaults__will look like this:

    >>> memo_square(2)
    4
    >>> memo_square.__defaults__
    ({2: 4},)
    >>> memo_square(5)
    25
    >>> memo_square.__defaults__
    ({2: 4, 5: 25},)
    >>> memo_square(2)
    4
    >>> memo_square.__defaults__
    ({2: 4, 5: 25},)

    If the function has already been called for the same value, the calculation of the value and, accordingly, the cache replenishment does not occur. For a square, the benefit is small (strictly speaking, for a square, the benefit is negative, because searching the dictionary is more expensive than multiplying two numbers), but for real expensive functions, memoization / caching can be useful. Of course, it can be provided in python in more than one way. Here are the alternatives we have:

    • @ functools.lru_cache. The decorator from the functools module, which remembers the last function calls. Reliably and simply, but uses all the parameters of a function as keys, which means it requires their hashing and cannot notice that two formally different values ​​of the parameter are equivalent. With the first requirement, everything is clear, for example, you can forget about functions from sets. Well, or when you call to convert them to frozenset. As for the second - I, for example, have a function that accepts a SQL-database connection and a number, and performs some manipulations with the data associated with this number. The connection can be completely broken and re-established during the program operation, and the lru_cache cache will then fly off. But he knows how to cache only a limited number of calls (avoiding memory leaks) and is well documented.
    • Cache out function:

      def square(a):
          return a**a
      cache = {}
      for x in values:
          if x not in cache:
              cache[x] = x**x
      	    print cache[x]

      The meaning is the same, but much more cumbersome. In addition, the cache variable is visible outside the function, although it is not used for anything other than its memoization. The cache in case of memorization with the default argument is accessible from the outside only through func.__defaults__which it is rather difficult to access by mistake.
    • To cut down a full-fledged object with a cache and make the function its method. It is good in terms of architecture and testability, it allows you to maintain arbitrarily complex caching logic, but even more cumbersome due to the boilerplate in the object code. Moreover, it is not clear what from what to inherit and whether to inherit from anything at all, if the functions to be memorized are more than one.

    The main thing that loses this method of memoization is that it is not very idiomatic. Personally, when I stumbled upon this decision for the first time, I thought for a couple of minutes about what is happening here and why. On the other hand, in these few minutes I began to understand a little better how the Python functions and their arguments are arranged. So even if you do not use default arguments (for memoization or, for example, speeding up the resolution of names ), knowledge of this technique is still useful for any player.