I have a simple task: to count how many times every letter occurs in a string. I've used a Counter() for it, but on one forum I saw information that using
Counter() lets you count any hashable objects, not just substrings. Both solutions run in O(n) time. Your measurements show that the overhead of iterating over and hashing individual characters in Counter() is greater than the cost of running s.count() 4 times.
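For anyone who wants to reproduce that kind of measurement, here is a minimal sketch using timeit. The function names, the four-letter alphabet "ACGT", and the input size are assumptions made for illustration; they are not taken from the original measurements, and the absolute timings will vary by machine and Python version.

```python
from collections import Counter
import timeit


def count_with_counter(s, letters):
    """Hash every character once with Counter, then pick the letters of interest."""
    counts = Counter(s)
    return {ch: counts[ch] for ch in letters}


def count_with_str_count(s, letters):
    """Scan the whole string once per letter with str.count()."""
    return {ch: s.count(ch) for ch in letters}


# Hypothetical input: 400,000 characters drawn from a 4-letter alphabet.
s = "ACGT" * 100_000
letters = "ACGT"

# Both approaches must agree before their timings are worth comparing.
assert count_with_counter(s, letters) == count_with_str_count(s, letters)

for fn in (count_with_counter, count_with_str_count):
    seconds = timeit.timeit(lambda: fn(s, letters), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

With only a handful of distinct letters, the repeated s.count() scans would be expected to win; as the number of letters grows, the single pass of Counter() should eventually become the cheaper option.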
Counter() can use a C helper to count elements, but it does not seem to special-case strings; it uses a general algorithm that works for any iterable. Processing a single character therefore involves several Python C API calls: advancing the iterator, getting the previous value (a lookup in the hash table), incrementing the counter, and setting the new value (another lookup in the hash table):
    while (1) {
        key = PyIter_Next(it);
        if (key == NULL)
            break;
        oldval = PyObject_GetItem(mapping, key);
        if (oldval == NULL) {
            if (!PyErr_Occurred() || !PyErr_ExceptionMatches(PyExc_KeyError))
                break;
            PyErr_Clear();
            Py_INCREF(one);
            newval = one;
        } else {
            newval = PyNumber_Add(oldval, one);
            Py_DECREF(oldval);
            if (newval == NULL)
                break;
        }
        if (PyObject_SetItem(mapping, key, newval) == -1)
            break;
        Py_CLEAR(newval);
        Py_DECREF(key);
    }
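To make the per-element work easier to see, here is a rough pure-Python rendering of the C loop above. It is a sketch for illustration only, not the actual CPython implementation; the function name count_elements is an assumption, and the comments map each step back to the C API call it stands in for.

```python
def count_elements(mapping, iterable):
    """Illustrative Python rendering of the C loop: one lookup and one store per element."""
    it = iter(iterable)
    while True:
        try:
            key = next(it)          # PyIter_Next: advance the iterator
        except StopIteration:
            break
        try:
            oldval = mapping[key]   # PyObject_GetItem: first hash-table lookup
        except KeyError:
            newval = 1              # key not seen before: start at one
        else:
            newval = oldval + 1     # PyNumber_Add: increment the counter
        mapping[key] = newval       # PyObject_SetItem: second hash-table lookup


counts = {}
count_elements(counts, "abracadabra")
print(counts)  # {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}
```

Every character pays for an iterator step, a hash, two table lookups, and an integer addition, whichever letters you actually care about.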
Compare that with the per-character overhead of FASTSEARCH() for bytestrings, which is a single comparison and, on a match, an increment:
    for (i = 0; i < n; i++)
        if (s[i] == p[0]) {
            count++;
            if (count == maxcount)
                return maxcount;
        }
    return count;