I am working on a search program over an inverted index. The index itself is a dictionary whose keys are terms and whose values are themselves dictionaries of short document
def two_keys(term_a, term_b, index):
    # doc_ids that appear under both terms
    doc_ids = set(index[term_a].keys()) & set(index[term_b].keys())
    doc_store = index[term_a]  # index[term_b] would work also
    return {doc_id: doc_store[doc_id] for doc_id in doc_ids}
def n_keys(terms, index):
    # doc_ids that appear under every term in the list
    doc_ids = set.intersection(*[set(index[term].keys()) for term in terms])
    doc_store = index[terms[0]]  # any of the terms would work here
    return {doc_id: doc_store[doc_id] for doc_id in doc_ids}
In [0]: index = {'a': {1: 'a b'},
   ...:          'b': {1: 'a b'}}
In [1]: two_keys('a','b', index)
Out[1]: {1: 'a b'}
In [2]: n_keys(['a','b'], index)
Out[2]: {1: 'a b'}
I would recommend changing your index from
index = {term: {doc_id: doc}}
to two indexes: one that maps each term to the set of doc_ids containing it, and a separate one that holds the documents themselves
term_index = {term: set([doc_id])}
doc_store = {doc_id: doc}
That way you don't store multiple copies of the same document, and the intersection works directly on sets of doc_ids.
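As a rough sketch of how the lookup might look with the split layout: the helper names `build_indexes` and `n_keys_split` below are made up for illustration, and the whitespace tokenization is only an assumption about how your terms are produced.

```python
from collections import defaultdict


def build_indexes(docs):
    """Split {doc_id: text} into a term index and a document store.

    Hypothetical helper; assumes terms are whitespace-separated tokens.
    """
    term_index = defaultdict(set)  # term -> set of doc_ids containing it
    doc_store = {}                 # doc_id -> document text, stored once
    for doc_id, text in docs.items():
        doc_store[doc_id] = text
        for term in text.split():
            term_index[term].add(doc_id)
    return term_index, doc_store


def n_keys_split(terms, term_index, doc_store):
    """Return {doc_id: doc} for documents that contain every term."""
    if not terms:
        return {}
    # .get() avoids a KeyError for unknown terms and does not
    # insert new keys into the defaultdict.
    doc_ids = set.intersection(*(term_index.get(term, set()) for term in terms))
    return {doc_id: doc_store[doc_id] for doc_id in doc_ids}


docs = {1: 'a b', 2: 'a c'}
term_index, doc_store = build_indexes(docs)
print(n_keys_split(['a', 'b'], term_index, doc_store))  # {1: 'a b'}
```

Each document is kept once in doc_store, and the term index only holds doc_ids, so the intersection no longer needs to build a set from each term's keys on every query.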