I have a full inverted index in form of nested python dictionary. Its structure is :
{word : { doc_name : [location_list] } }
For example l
Dependend on how long is 'long' you have to think about the trade-offs you have to make: either have all data ready in memory after (long) startup, or load only partial data (then you need to split up the date in multiple files or use SQLite or something like this). I doubt that loading all data upfront from e.g. sqlite into a dictionary will bring any improvement.