The following two expressions seem equivalent to me. Which one is preferable?
data = [(\'a\', 1), (\'b\', 1), (\'b\', 2)]
d1 = {}
d2 = {}
for key, val in d
The accepted answer from agf isn't comparing like with like. After:
print timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] contains a list with 10,000 items whereas after:
print timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number = 10000)
d[0] is simply []. i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:
print timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number = 10000)
and in fact is faster than the faulty setdefault example.
The difference here really is because of when you append using concatenation the whole list is copied every time (and once you have 10,000 elements that is beginning to become measurable. Using append the list updates are amortised O(1), i.e. effectively constant time.
Finally, there are two other options not considered in the original question: defaultdict or simply testing the dictionary to see whether it already contains the key.
So, assuming d3, d4 = defaultdict(list), {}
# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
d4[key].append(val)
else:
d4[key] = [val]
variant 1 is by far the slowest because it copies the list every time, variant 2 is the second slowest, variant 3 is the fastest but won't work if you need Python older than 2.5, and variant 4 is just slightly slower than variant 3.
I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.