I have around 100 megabytes of text, without any markup, divided to approximately 10,000 entries. I would like to automatically generate a \'tag\' list. The problem is that
I'd start with a wonderful chapter, by Peter Norvig, in the O'Reilly book Beautiful Data. He provides the ngram data you'll need, along with beautiful Python code (which may solve your problems as-is, or with some modification) on his personal web site.