Should strings in Python be iterable?
And Guido created strings in the image of C; in the image of arrays of characters he created them. And Guido saw that it was good. Or was it?
Imagine that you are writing completely idiomatic code to traverse some nested data. Beautiful is better than ugly, simple is better than complex, so you settle on the following version:
from collections.abc import Iterable


def traverse(list_or_value, callback):
    if isinstance(list_or_value, Iterable):
        for item in list_or_value:
            traverse(item, callback)
    else:
        callback(list_or_value)
You write a unit test, and what do you know? It does not work. And it does not merely fail, it fails like this:
>>> traverse({"status": "ok"}, print)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in traverse
  File "<stdin>", line 4, in traverse
  File "<stdin>", line 4, in traverse
  [Previous line repeated 989 more times]
  File "<stdin>", line 2, in traverse
  File "/usr/local/opt/python/libexec/bin/../../Frameworks/Python.framework/Versions/3.7/lib/python3.7/abc.py", line 139, in __instancecheck__
    return _abc_instancecheck(cls, instance)
RecursionError: maximum recursion depth exceeded in comparison
How? Why? In search of an answer, you will plunge into the wonderful world of collections of infinite depth.
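A few quick checks make that infinite depth visible. This is only an illustrative sketch; the variable names `s`, `first`, and `c` are made up here and do not come from the original code.

```python
from collections.abc import Iterable

s = "ok"
first = next(iter(s))                # iterating over a string yields strings
print(type(first).__name__)          # str
print(isinstance(first, Iterable))   # True: the element is itself iterable

c = "a"
print(next(iter(c)) == c)            # True: a one-character string yields itself
print(c[0][0][0] == c)               # True, no matter how deep we index
```

Because every element is again an `Iterable`, the `isinstance` check in `traverse` never reaches the `else` branch for string data, and the recursion only stops when the interpreter gives up.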
In fact, a string is the only built-in Iterable that always yields an Iterable as its element! We can, of course, construct another example by creating a list and appending it to itself once or twice, but how often do you see that in your code? A string, on the other hand, is an Iterable of infinite depth, sneaking into your production under cover of night.

Another example. Somewhere in your code you need to check repeatedly whether elements are present in containers. You decide to write a helper that speeds this up in a number of ways. You write a universal solution that uses only the __contains__ method (the only method of the abstract base class Container), but then you decide to add a super-optimization for a special case: a Collection. After all, you can simply iterate over it and build a set!

import functools
from typing import Collection, Container


def faster_container(c: Container) -> Container:
    if isinstance(c, Collection):
        return set(c)
    return CachedContainer(c)


class CachedContainer(object):
    def __init__(self, c: Container):
        self._contains = functools.lru_cache()(c.__contains__)

    def __contains__(self, stuff):
        return self._contains(stuff)
Aaand... your solution does not work! There you go! Again!
>>> c = faster_container(othello_text)
>>> "Have you pray'd to-night, Desdemona?" in c
False
(But the wrong answer was returned really quickly...)
Why? Because a string in Python is an amazing collection in which the semantics of the __contains__ method are not consistent with the semantics of __iter__ and __len__. In fact, a string is a collection:
>>> from collections.abc import Collection
>>> issubclass(str, Collection)
True
But a collection of... what? __iter__ and __len__ consider it a collection of characters:
>>> s = "foo"
>>> len(s)
3
>>> list(s)
['f', 'o', 'o']
But __contains__ thinks it is a collection of substrings!
>>> "oo" in s
True
>>> "oo" in list(s)
False
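That mismatch is exactly what trips up the set() shortcut in faster_container. A small reproduction, using a short made-up string as a stand-in for `othello_text` (the names `text` and `as_set` are assumptions for illustration):

```python
text = "to be or not to be"   # hypothetical stand-in for othello_text

as_set = set(text)            # what faster_container() builds for any Collection
print(sorted(as_set))         # [' ', 'b', 'e', 'n', 'o', 'r', 't']
print("not" in text)          # True: str.__contains__ searches for substrings
print("not" in as_set)        # False: only single characters survived
print("o" in as_set)          # True
```

The set is built through `__iter__`, so it only ever holds single characters, while the later lookup asks a substring question that only `str.__contains__` can answer.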
What can be done?
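Within today's Python, a common workaround is to special-case strings by hand. The sketch below is one possible way to patch both helpers from this article; the `ATOMIC` tuple and the decision to treat `bytes` like `str` are assumptions made here, not part of the original code.

```python
import functools
from collections.abc import Collection, Container, Iterable

ATOMIC = (str, bytes)  # assumption: treat these as leaves, never descend into them


def traverse(list_or_value, callback):
    # Same traversal as before, except that strings and bytes count as values.
    if isinstance(list_or_value, Iterable) and not isinstance(list_or_value, ATOMIC):
        for item in list_or_value:
            traverse(item, callback)
    else:
        callback(list_or_value)


class CachedContainer:
    def __init__(self, c: Container):
        self._contains = functools.lru_cache()(c.__contains__)

    def __contains__(self, stuff):
        return self._contains(stuff)


def faster_container(c: Container) -> Container:
    # Take the set() shortcut only when iteration and membership agree.
    if isinstance(c, Collection) and not isinstance(c, ATOMIC):
        return set(c)
    return CachedContainer(c)


traverse({"status": "ok"}, print)     # prints: status
cached = faster_container("Have you pray'd to-night, Desdemona?")
print("pray'd" in cached)             # True
```

This works, but every function that walks arbitrary data has to remember the same exception, which is the real complaint here.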
Although the behavior of str.__contains__ may seem strange next to the __contains__ implementations of other standard types, this behavior is one of the many small things that make Python so convenient as a scripting language, allowing you to write fast and readable code in it. I would not suggest changing the behavior of this method, especially since we almost never use it to check for a single character in a string. And by the way, do you know why? Because in a scripting language we almost never use a string as a collection of characters! Manipulating individual characters of a string and accessing them by index is mostly the stuff of interview problems. So maybe it is worth removing __iter__ from the string and hiding it behind some method like .chars()? That would solve both of these problems.

Time for a Friday discussion in the comments!
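As a starting point for that discussion, here is a rough sketch of what a string without __iter__ might feel like. Since str itself cannot be changed, it uses a hypothetical wrapper class named `Text`; the class and everything about its interface apart from the `.chars()` name are assumptions for illustration only.

```python
from collections.abc import Collection, Container, Iterable


class Text:
    """Hypothetical string wrapper: substring search, but no implicit iteration."""

    def __init__(self, value: str):
        self._value = value

    def __contains__(self, fragment: str) -> bool:
        # Same substring semantics as str.__contains__.
        return fragment in self._value

    def chars(self):
        # Iterating over characters becomes an explicit, opt-in operation.
        return iter(self._value)

    def __str__(self) -> str:
        return self._value


def traverse(list_or_value, callback):
    # The article's original traversal, unchanged.
    if isinstance(list_or_value, Iterable):
        for item in list_or_value:
            traverse(item, callback)
    else:
        callback(list_or_value)


seen = []
traverse([Text("ok"), [Text("fine")]], lambda t: seen.append(str(t)))
print(seen)                                  # ['ok', 'fine']
print(isinstance(Text("ok"), Container))     # True
print(isinstance(Text("ok"), Collection))    # False: no set() shortcut is taken
print("oo" in Text("foo"))                   # True
print(list(Text("foo").chars()))             # ['f', 'o', 'o']
```

With such a type, the unmodified traverse terminates, and a helper like faster_container would fall through to the cached __contains__ path, because the object is a Container but not a Collection.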
Strings in Python...
- are good as they are: 71.8% (74 votes)
- should be changed by removing __iter__: 18.4% (19 votes)
- should be changed in some other way: 9.7% (10 votes)