I\'ve seen people say that set
objects in python have O(1) membership-checking. How are they implemented internally to allow this? What sort of data structure d
We all have easy access to the source, where the comment preceding set_lookkey()
says:
/* set object implementation
Written and maintained by Raymond D. Hettinger
Derived from Lib/sets.py and Objects/dictobject.c.
The basic lookup function used by all operations.
This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4.
The initial probe index is computed as hash mod the table size.
Subsequent probe indices are computed as explained in Objects/dictobject.c.
To improve cache locality, each probe inspects a series of consecutive
nearby entries before moving on to probes elsewhere in memory. This leaves
us with a hybrid of linear probing and open addressing. The linear probing
reduces the cost of hash collisions because consecutive memory accesses
tend to be much cheaper than scattered probes. After LINEAR_PROBES steps,
we then use open addressing with the upper bits from the hash value. This
helps break-up long chains of collisions.
All arithmetic on hash should ignore overflow.
Unlike the dictionary implementation, the lookkey function can return
NULL if the rich comparison returns an error.
*/
...
#ifndef LINEAR_PROBES
#define LINEAR_PROBES 9
#endif
/* This must be >= 1 */
#define PERTURB_SHIFT 5
static setentry *
set_lookkey(PySetObject *so, PyObject *key, Py_hash_t hash)
{
...