6 ways to merge a list of lists
Here in our office a conversation entered as the most “beautiful” and to quickly glue the list of lists in Python. Really how?
Even such a seemingly trivial problem can be solved in several ways, significantly differing in speed and expressiveness.
OPTION1
Everyone knows that list items can be looped through and that it is possible to add items to the end. This leads us to the first solution:
Not only that, the function stretched as much as 6 lines, but in addition it is also not effective.
Let's try to improve it in both senses: speed and beauty (“pythonic way”).
OPTION2
Here we recall that in Python there is a "+" operator for strings and we get:
This is the slowest implementation. The puncture is that in this form the “+” operator in each step creates a new list object, which is thrown out at the next step, etc.
OPTION3 It’s
easy to fix , you need to replace “+” with a form that does not create a new list, but adds to the old one. This is the "+ =" operator, but I prefer to write the "extend" method explicitly.
So we got the fastest option, but not the shortest.
I will write all subsequent decisions through lambda expressions, since they consist of one expression. The name of the argument is reduced to ll, since in the code in one line this does not reduce readability.
OPTION4
Using the built-in functions for working with lists, you can rewrite option2 in the style of functional programming:
He is a little bit faster, but still braking for the same reason as his iterative relative. Here “lambda a, b: a + b” is an anonymous function of two arguments that simply returns their sum. Option B is just a shortcut built into Python for the convenience of calculating the sum of elements. This option is the shortest.
Personally, neither the shortest (speed) nor the fastest (beauty) suits me. Let's try to find a compromise.
OPTION5
Using list expressions:
Not much longer than the previous one, but radically faster. The option is undoubtedly beautiful, although nested list expressions are not always clear at first glance.
OPTION6
And what if you try to rewrite the fastest option in a functional style? Easy:
Notice " d.extend (el) or d " we had to add the " or " operator since the extend method returns None . In terms of speed, it is practically not inferior to the fastest method No. 3 (the difference in speed is literally a few percent and in my opinion is not significant).
In my opinion, the " editorial choice " should be awarded to option No. 6 )
For measuring the speed of small pieces of code in Python, there is a timeit library . Here is an example of code testing options 3, 5 and 6 (the fastest and most beautiful).
An example of starting a time test. It can be seen that the difference in speed between the iterative and functional variants is vanishingly small. The option on list expressions is noticeably slower (you can’t write off errors here), but the size of our lists is huge, for applications that are not critical to speed, it also has the right to life. HOME PROPOSAL I propose to solve / discuss a more complex task of deploying nested lists in a linear one. Example:
UPD: Reply habrahabr.ru/blogs/python/63539/#comment_1764419
UPD2:
OPTION 6B (from an anonymous commentator in LJ)
Even such a seemingly trivial problem can be solved in several ways, significantly differing in speed and expressiveness.
OPTION1
Everyone knows that list items can be looped through and that it is possible to add items to the end. This leads us to the first solution:
def listmerge1 (lstlst):
all = []
for lst in lstlst:
for el in lst:
all.append (el)
return all
Not only that, the function stretched as much as 6 lines, but in addition it is also not effective.
Let's try to improve it in both senses: speed and beauty (“pythonic way”).
OPTION2
Here we recall that in Python there is a "+" operator for strings and we get:
def listmerge2 (lstlst):
all = []
for lst in lstlst:
all = all + lst
return all
This is the slowest implementation. The puncture is that in this form the “+” operator in each step creates a new list object, which is thrown out at the next step, etc.
OPTION3 It’s
easy to fix , you need to replace “+” with a form that does not create a new list, but adds to the old one. This is the "+ =" operator, but I prefer to write the "extend" method explicitly.
So we got the fastest option, but not the shortest.
def listmerge3 (lstlst):
all = []
for lst in lstlst:
all.extend (lst)
return all
I will write all subsequent decisions through lambda expressions, since they consist of one expression. The name of the argument is reduced to ll, since in the code in one line this does not reduce readability.
# via the anonymous function
listmerge = lambda ll: simple-statement
# is equivalent to
def listmerge (ll):
return simple-statement
OPTION4
Using the built-in functions for working with lists, you can rewrite option2 in the style of functional programming:
listmerge4a = lambda ll: reduce (lambda a, b: a + b, ll, [])
listmerge4b = lambda ll: sum (ll, [])
He is a little bit faster, but still braking for the same reason as his iterative relative. Here “lambda a, b: a + b” is an anonymous function of two arguments that simply returns their sum. Option B is just a shortcut built into Python for the convenience of calculating the sum of elements. This option is the shortest.
Personally, neither the shortest (speed) nor the fastest (beauty) suits me. Let's try to find a compromise.
OPTION5
Using list expressions:
listmerge5 = lambda ll: [el for lst in ll for el in lst]
Not much longer than the previous one, but radically faster. The option is undoubtedly beautiful, although nested list expressions are not always clear at first glance.
OPTION6
And what if you try to rewrite the fastest option in a functional style? Easy:
listmerge6 = lambda s: reduce (lambda d, el: d.extend (el) or d, s, [])
Notice " d.extend (el) or d " we had to add the " or " operator since the extend method returns None . In terms of speed, it is practically not inferior to the fastest method No. 3 (the difference in speed is literally a few percent and in my opinion is not significant).
In my opinion, the " editorial choice " should be awarded to option No. 6 )
For measuring the speed of small pieces of code in Python, there is a timeit library . Here is an example of code testing options 3, 5 and 6 (the fastest and most beautiful).
import timeit options
= {
'Reduce':
'listmerge = lambda s: reduce (lambda d, el: d.extend (el) or d, s, [])',
'Iterate':
"" "
def listmerge (lstlst) :
all = []
for lst in lstlst:
all.extend (lst)
return all
"" ",
'Comprehension':
'listmerge = lambda ll: [x for lst in ll for x in lst]',
}
initstr = 'lstlst = [range (i) for i in range (1000)] \ ngc.enable () '
def test (variants, initstr, n = 100):
print "Test repeats n =", n, "times \ nINITSTR:", initstr, "\ n \ n"
for k, v in variants.iteritems ():
print k, "-",timeit.Timer ("listmerge (lstlst)", initstr + "\ n" + v) .timeit (n)
test (options, initstr, 100)
An example of starting a time test. It can be seen that the difference in speed between the iterative and functional variants is vanishingly small. The option on list expressions is noticeably slower (you can’t write off errors here), but the size of our lists is huge, for applications that are not critical to speed, it also has the right to life. HOME PROPOSAL I propose to solve / discuss a more complex task of deploying nested lists in a linear one. Example:
Test repeats n = 100 times
INITSTR: lstlst=[range(i) for i in range(1000)]
gc.enable()
Iterate - 1.56133103371
Reduce - 1.57647109032
Comprehension - 7.5749669075
# Original list:
[7, [[[[2]]], [[[]], [4]], [4,5, [6,7]]], 8]
# Result:
[7, 2, 4, 4, 5, 6, 7, 8]
UPD: Reply habrahabr.ru/blogs/python/63539/#comment_1764419
UPD2:
OPTION 6B (from an anonymous commentator in LJ)
import operator
listmerge6b = lambda s: reduce (operator.iadd, s, [])