Are you sure that using word_tokenize is the right approach?
This Google Groups page has an example:
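>>> import nltk
>>> nltk.download('brown')  # fetch the Brown corpus if it isn't installed yet
>>> text = nltk.Text(nltk.corpus.brown.words())  # wrap the Brown corpus tokens in a Text object
>>> text.generate()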
But I've never used NLTK, so I can't say whether it works the way you want.
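If you'd rather see the underlying idea without depending on NLTK, here is a minimal sketch of random text generation with a word-level bigram (Markov chain) model, using only the standard library. This is not NLTK's implementation, just the general technique. The function names `build_bigrams` and `generate` and the toy corpus are hypothetical; you could pass in any token list instead, e.g. `list(nltk.corpus.brown.words())`.

```python
import random
from collections import defaultdict


def build_bigrams(tokens):
    """Map each word to the list of words that follow it in the corpus."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate(tokens, length=20, seed=None):
    """Random walk over the bigram table, starting from a random token."""
    rng = random.Random(seed)  # seeded RNG so runs are reproducible
    followers = build_bigrams(tokens)
    word = rng.choice(tokens)
    output = [word]
    for _ in range(length - 1):
        choices = followers.get(word)
        if not choices:  # dead end: the corpus's last word may have no successor
            break
        word = rng.choice(choices)  # frequent successors are picked more often
        output.append(word)
    return output


if __name__ == "__main__":
    # Hypothetical toy corpus, already split into tokens
    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    print(" ".join(generate(corpus, length=10, seed=1)))
```

Because successors are stored with repetition, a word that follows another more often in the corpus is proportionally more likely to be chosen, which is what makes the output look vaguely like the source text.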