Why is collections.Counter much slower than ''.count?

你的背包 2020-12-20 18:44

I have a simple task: to count how many times every letter occurs in a string. I've used a Counter() for it, but on one forum I saw information that using ''.count() for each letter is faster.

2 Answers
  •  眼角桃花
    2020-12-20 19:28

    Counter() lets you count any hashable objects, not just substrings. Both solutions are O(n) in time. Your measurements show that the overhead of iterating over and hashing individual characters in Counter() is greater than the cost of running s.count() 4 times.
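
    For reference, here is a minimal benchmark sketch of the two approaches (not from the original question; the sample string, the 4-letter alphabet, and the function names are assumptions based on the "4 times" above):

        from collections import Counter
        from timeit import timeit

        s = "abcd" * 250_000          # sample string with 4 distinct letters (assumed)
        letters = "abcd"

        def with_counter(s):
            # One Python-level pass: every character is fetched and hashed individually.
            return Counter(s)

        def with_str_count(s):
            # Four C-level scans: each s.count() call is a tight loop over the buffer.
            return {ch: s.count(ch) for ch in letters}

        print("Counter:   ", timeit(lambda: with_counter(s), number=10))
        print("str.count: ", timeit(lambda: with_str_count(s), number=10))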

    Counter() can use a C helper (_count_elements() in the _collections module) to count elements, but it doesn't special-case strings; it uses the general algorithm that works for any iterable. That means processing a single character involves multiple Python C API calls: advance the iterator, get the previous value (a lookup in the hash table), increment the counter, and set the new value (another lookup in the hash table):

        while (1) {
            key = PyIter_Next(it);
            if (key == NULL)
                break;
            oldval = PyObject_GetItem(mapping, key);
            if (oldval == NULL) {
                if (!PyErr_Occurred() || !PyErr_ExceptionMatches(PyExc_KeyError))
                    break;
                PyErr_Clear();
                Py_INCREF(one);
                newval = one;
            } else {
                newval = PyNumber_Add(oldval, one);
                Py_DECREF(oldval);
                if (newval == NULL)
                    break;
            }
            if (PyObject_SetItem(mapping, key, newval) == -1)
                break;
            Py_CLEAR(newval);
            Py_DECREF(key);
        }
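
    In Python terms, that C loop is roughly equivalent to the following sketch (a paraphrase for illustration, not CPython code):

        def count_into(mapping, iterable):
            # Rough Python paraphrase of the C loop above: for every element there is
            # one lookup to read the old count and one store to write the new count.
            for key in iterable:
                try:
                    oldval = mapping[key]
                except KeyError:
                    mapping[key] = 1
                else:
                    mapping[key] = oldval + 1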
    

    Compare it to the per-character overhead of FASTSEARCH() for bytestrings:

        for (i = 0; i < n; i++)
            if (s[i] == p[0]) {
               count++;
               if (count == maxcount)
                  return maxcount;
            }
        return count;
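
    A quick way to see the same contrast without Counter at all (a sketch; the sample string is made up): iterate per character at the Python level versus letting the C loop scan the buffer:

        from timeit import timeit

        s = "abcd" * 250_000   # sample data (assumed)

        # Python-level per-character iteration, comparable to what Counter() pays.
        print(timeit(lambda: sum(c == "a" for c in s), number=10))

        # One tight C loop over the underlying buffer, like FASTSEARCH() above.
        print(timeit(lambda: s.count("a"), number=10))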
    
