A friend of mine wrote this little program.
The textFile is 1.2 GB in size (7 years' worth of newspapers).
He successfully manages to create the dictionary but he
Do you really need all of the data in memory at once? If you want to keep the dictionary/pickle approach, you could split it in a naive way, for example one file per year or per month.
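Just to illustrate the splitting idea, here is a rough sketch. I don't know the real file format or what the dictionary holds, so I'm assuming that every line starts with an ISO date ("2004-03-17 ..."), that the file is ordered by date, and that the dictionary is a plain word count. The names year_of, build_entry and pickle_by_year are made up for the example.

```python
import os
import pickle
from collections import defaultdict


def year_of(line):
    # Assumption: each line starts with an ISO date such as "2004-03-17".
    return line[:4]


def build_entry(counts, line):
    # Placeholder for whatever the real program stores; here: word counts.
    # The first token is skipped because it is assumed to be the date.
    for word in line.split()[1:]:
        counts[word] += 1


def _dump(counts, out_dir, year):
    # One pickle file per year, e.g. dict_2004.pickle.
    path = os.path.join(out_dir, f"dict_{year}.pickle")
    with open(path, "wb") as fh:
        pickle.dump(dict(counts), fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def pickle_by_year(lines, out_dir):
    """Write one pickle per year, keeping only one year's dict in memory.

    Assumes the input is ordered by date, so a year is complete as soon
    as a line from a different year shows up.
    """
    paths = {}
    current_year, counts = None, defaultdict(int)
    for line in lines:
        year = year_of(line)
        if current_year is not None and year != current_year:
            paths[current_year] = _dump(counts, out_dir, current_year)
            counts = defaultdict(int)
        current_year = year
        build_entry(counts, line)
    if current_year is not None:
        paths[current_year] = _dump(counts, out_dir, current_year)
    return paths
```

Since an open file can be iterated line by line, you could pass the file object straight in (pickle_by_year(open_file, "some_dir")) and the 1.2 GB text never has to sit in memory in one piece.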
Also, remember that dictionaries are not sorted, so you may run into trouble if you ever have to sort that amount of data. That only matters if you want to search or sort the data, of course...
Anyway, I think the database approach mentioned earlier is the most flexible one, especially in the long run...
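For what it's worth, a database version could look roughly like this with the sqlite3 module from the standard library. This is only a sketch: the table layout (word_counts with year, word, n) and the function names are my own invention, and I'm again assuming the data is word counts per year.

```python
import sqlite3


def open_db(path=":memory:"):
    # Pass a file name instead of ":memory:" to keep the data on disk.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts ("
        " year INTEGER NOT NULL,"
        " word TEXT NOT NULL,"
        " n INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (year, word))"
    )
    return conn


def add_words(conn, year, words):
    # "with conn" commits once per batch (or rolls back on an error).
    with conn:
        for word in words:
            # Create the row if it is missing, then bump its counter.
            conn.execute(
                "INSERT OR IGNORE INTO word_counts (year, word) VALUES (?, ?)",
                (year, word),
            )
            conn.execute(
                "UPDATE word_counts SET n = n + 1 WHERE year = ? AND word = ?",
                (year, word),
            )


def top_words(conn, year, limit=10):
    # The database does the sorting, so Python never holds the full data set.
    cur = conn.execute(
        "SELECT word, n FROM word_counts WHERE year = ? "
        "ORDER BY n DESC, word LIMIT ?",
        (year, limit),
    )
    return cur.fetchall()
```

The nice part is that searching and sorting become queries (ORDER BY, WHERE), so the "dictionaries are not sorted" problem goes away without loading everything at once.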