Performance comparison: insert vs build Python set operations

时光说笑 2021-02-02 17:14

In Python, which is faster: a) building a set from a list of n items, or b) inserting n items into a set one at a time?

I found this page (http://wiki.python.org/moin/TimeComplexity) but it did

4 Answers
  •  忘掉有多难
    2021-02-02 17:50

    In terms of O() complexity, the two are identical: both approaches do exactly the same work, inserting n items into a set.

    The difference comes from the implementation. One clear advantage of initialization from an iterable is that you save a lot of Python-level function calls: the initialization from an iterable is done wholly at the C level (**).

    Indeed, tests on a list of 5,000,000 random floats show that adding them one by one is slower:

    import random

    lst = [random.random() for _ in range(5000000)]

    set1 = set(lst)    # takes 2.4 seconds

    set2 = set()
    for item in lst:   # the loop takes 3.37 seconds
        set2.add(item)


    (**) Looking inside the CPython source for sets (Objects/setobject.c), item insertion eventually boils down to a call to set_add_key. When initializing from an iterable, this function is called in a tight C loop:

    while ((key = PyIter_Next(it)) != NULL) {
      if (set_add_key(so, key) == -1) {
        Py_DECREF(it);
        Py_DECREF(key);
        return -1;
      } 
      Py_DECREF(key);
    }
    

    On the other hand, each call to set.add involves an attribute lookup, which resolves to the C set_add function, which in turn calls set_add_key. Since the item addition itself is relatively quick (Python's hash table implementation is very efficient), the overhead of these extra calls adds up.
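
    A hedged sketch of how one might reproduce the comparison with timeit (the function names and the smaller list size are my own choices, not from the answer above). It also includes a common middle ground: caching the bound method s.add in a local variable, which skips the per-item attribute lookup the answer describes while still paying one Python-level call per item.

    ```python
    import random
    import timeit

    lst = [random.random() for _ in range(100_000)]

    def build_at_once():
        # One C-level loop inside the set constructor.
        return set(lst)

    def add_one_by_one():
        # Attribute lookup + Python-level call on every iteration.
        s = set()
        for item in lst:
            s.add(item)
        return s

    def add_with_bound_method():
        # Caching the bound method avoids the repeated attribute lookup,
        # but each item still costs one Python-level call.
        s = set()
        add = s.add
        for item in lst:
            add(item)
        return s

    # All three produce the same set.
    assert build_at_once() == add_one_by_one() == add_with_bound_method()

    for fn in (build_at_once, add_one_by_one, add_with_bound_method):
        print(f"{fn.__name__}: {timeit.timeit(fn, number=10):.3f}s")
    ```

    On typical CPython builds, build_at_once is fastest, add_with_bound_method is in between, and add_one_by_one is slowest, which matches the explanation above.
    
    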
