Should strings in Python be iterable?

    And Guido created strings in the image of C, in the image of arrays of characters created them. And Guido saw that it was good. Or not?

    Imagine that you are writing perfectly idiomatic code to traverse some nested data. Beautiful is better than ugly, simple is better than complex, so you settle on the following version of the code:

    from collections.abc import Iterable
    def traverse(list_or_value, callback):
        if isinstance(list_or_value, Iterable):
            for item in list_or_value:
                traverse(item, callback)
        else:
            callback(list_or_value)
    

    You write a unit test, and guess what? It doesn't work. And it doesn't just fail, it blows up:

    >>> traverse({"status": "ok"}, print)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 4, in traverse
      File "<stdin>", line 4, in traverse
      File "<stdin>", line 4, in traverse
      [Previous line repeated 989 more times]
      File "<stdin>", line 2, in traverse
      File "/usr/local/opt/python/libexec/bin/../../Frameworks/Python.framework/Versions/3.7/lib/python3.7/abc.py", line 139, in __instancecheck__
        return _abc_instancecheck(cls, instance)
    RecursionError: maximum recursion depth exceeded in comparison
    

    How? Why? In search of an answer, you will plunge into the wonderful world of collections of infinite depth.

    In fact, a string is the only built-in Iterable that always yields Iterables as its elements! Of course, we could construct another example by appending a list to itself once or twice, but how often do you see that in your code? A string, on the other hand, is an Iterable of infinite depth, sneaking under cover of night right into your production.
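    A quick way to see this infinite depth for yourself: every element of a string is itself a string, and a one-character string yields only itself. The snippet also sketches a common workaround that is not from the original article: treat str (and, as an assumption, bytes) as leaves rather than containers. The name traverse_safe is hypothetical.

```python
from collections.abc import Iterable

# Every element of a string is itself a string...
s = "ok"
print([type(ch).__name__ for ch in s])  # ['str', 'str']

# ...and a one-character string yields only itself, so nesting never bottoms out.
print(list("o") == ["o"])  # True

# A self-referential list behaves the same way, but nobody builds one by accident.
lst = []
lst.append(lst)
print(lst[0] is lst)  # True


def traverse_safe(list_or_value, callback):
    """Hypothetical variant of traverse() that treats str and bytes as leaves."""
    is_atom = isinstance(list_or_value, (str, bytes))
    if isinstance(list_or_value, Iterable) and not is_atom:
        for item in list_or_value:
            traverse_safe(item, callback)
    else:
        callback(list_or_value)


traverse_safe({"status": "ok"}, print)  # prints: status (iterating a dict yields keys)
traverse_safe([1, ["two", [3.0]]], print)  # prints 1, two and 3.0 on separate lines
```

    Note that the workaround is a blocklist: any other string-like type you forget to list will still recurse forever.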

    Another example. Somewhere in your code you need to check repeatedly whether elements are present in containers, so you decide to write a helper that speeds these checks up. You write a general solution that relies only on the __contains__ method (the only abstract method of the Container abstract base class), but then you decide to add a super-optimization for a special case: a Collection. After all, you can simply iterate over it and build a set!

    import functools
    from typing import Collection, Container

    def faster_container(c: Container) -> Container:
        # Fast path: a Collection can be iterated, so copy it into a set.
        if isinstance(c, Collection):
            return set(c)
        # Fallback: memoize the container's own membership test.
        return CachedContainer(c)

    class CachedContainer:
        def __init__(self, c: Container):
            self._contains = functools.lru_cache()(c.__contains__)

        def __contains__(self, stuff):
            return self._contains(stuff)
    

    Aaand... your solution doesn't work! There you go! Again!

    >>> c = faster_container(othello_text)
    >>> "Have you pray'd to-night, Desdemona?" in c
    False
    

    (But at least the wrong answer came back really quickly...)

    Why? Because a string in Python is an amazing collection whose __contains__ semantics are inconsistent with the semantics of __iter__ and __len__.
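    The othello_text variable is not defined in the session above, so here is a self-contained reproduction that assumes a short stand-in string in its place. It uses collections.abc instead of typing for the isinstance check, which behaves the same here.

```python
import functools
from collections.abc import Collection, Container


class CachedContainer:
    def __init__(self, c: Container):
        self._contains = functools.lru_cache()(c.__contains__)

    def __contains__(self, stuff):
        return self._contains(stuff)


def faster_container(c: Container) -> Container:
    if isinstance(c, Collection):
        return set(c)  # for a str, this is a set of single characters
    return CachedContainer(c)


text = "To be, or not to be"  # stand-in for othello_text
c = faster_container(text)
print(type(c).__name__)  # set
print("not" in text)  # True: str.__contains__ looks for substrings
print("not" in c)     # False: the set only holds single characters
print("n" in c)       # True
```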

    In fact, a string is a collection:

    >>> from collections.abc import Collection
    >>> issubclass(str, Collection)
    True
    

    But a collection of... what? __iter__ and __len__ treat it as a collection of characters:

    >>> s = "foo"
    >>> len(s)
    3
    >>> list(s)
    ['f', 'o', 'o']
    

    But __contains__ treats it as a collection of substrings!

    >>> "oo" in s
    True
    >>> "oo" in list(s)
    False
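    Given this inconsistency, a defensive faster_container has to keep strings out of the set fast path. A minimal sketch, not from the original article: the name faster_container_safe is hypothetical, and also excluding bytes and bytearray is an assumption based on their __contains__ likewise accepting subsequences (b"oo" in b"foo" is True).

```python
import functools
from collections.abc import Collection, Container


class CachedContainer:
    """Memoizes the wrapped container's own __contains__."""

    def __init__(self, c: Container):
        self._contains = functools.lru_cache()(c.__contains__)

    def __contains__(self, stuff):
        return self._contains(stuff)


# Types whose __contains__ tests for subsequences, not single elements.
_SUBSEQUENCE_TYPES = (str, bytes, bytearray)


def faster_container_safe(c: Container) -> Container:
    # Only copy into a set when `in` really means "is one of the elements".
    if isinstance(c, Collection) and not isinstance(c, _SUBSEQUENCE_TYPES):
        return set(c)
    # Strings keep their substring semantics via the cached __contains__.
    return CachedContainer(c)


print("not" in faster_container_safe("To be, or not to be"))  # True
print(3 in faster_container_safe([1, 2, 3]))  # True
```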
    

    What can be done?


    Although the behavior of str.__contains__ may seem strange next to the __contains__ implementations of other standard types, it is one of the many small things that make Python so convenient as a scripting language, letting you write code quickly and in a way that reads almost like prose. I would not suggest changing the behavior of this method, especially since we almost never use it to check whether a single character occurs in a string.

    And by the way, do you know why? Because in a scripting language we almost never use a string as a collection of characters! Manipulating individual characters and accessing them by index are mostly the stuff of interview problems. So maybe it's worth removing __iter__ from str and hiding it behind a method like .chars()? That would solve both of these problems.
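    To get a feel for the proposal, here is a sketch of a string type without __iter__. Text and its chars() method are hypothetical names, and wrapping str merely stands in for changing str itself. The wrapper deliberately defines neither __iter__ nor __getitem__, since iter() can fall back to the old __getitem__-based sequence protocol.

```python
from collections.abc import Collection, Container, Iterable, Iterator


class Text:
    """Hypothetical string wrapper: substring `in` and len(), but no __iter__."""

    def __init__(self, value: str):
        self._value = value

    def __contains__(self, sub: str) -> bool:
        # Same substring semantics as str.__contains__.
        return sub in self._value

    def __len__(self) -> int:
        return len(self._value)

    def chars(self) -> Iterator[str]:
        # Character iteration becomes an explicit, opt-in method.
        return iter(self._value)

    def __repr__(self) -> str:
        return f"Text({self._value!r})"


def traverse(list_or_value, callback):
    # The article's original traverse(), unchanged.
    if isinstance(list_or_value, Iterable):
        for item in list_or_value:
            traverse(item, callback)
    else:
        callback(list_or_value)


t = Text("foo")
print(isinstance(t, Iterable))    # False: no __iter__
print(isinstance(t, Collection))  # False
print(isinstance(t, Container))   # True: __contains__ is defined
print("oo" in t)                  # True
print(list(t.chars()))            # ['f', 'o', 'o']

traverse([Text("status"), [Text("ok")]], print)  # prints Text('status'), then Text('ok')
```

    Because a Text is not a Collection, faster_container would likewise skip the set fast path and fall through to CachedContainer, which keeps the substring semantics.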

    Time for Friday discussion in the comments!


    Strings in Python...

    • are fine as they are: 71.8% (74 votes)
    • should be changed by removing __iter__: 18.4% (19 votes)
    • should be changed in some other way: 9.7% (10 votes)
