Real word count in NLTK
Question

The NLTK book has a couple of examples of word counts, but in reality they are token counts, not word counts. For instance, Chapter 1, "Counting Vocabulary", says that the following gives a word count:

    text = nltk.Text(tokens)
    len(text)

However, it doesn't: it gives a count of words and punctuation. How can you get a real word count (ignoring punctuation)?

Similarly, how can you get the average number of characters in a word? The obvious answer is:

    word_average_length =(len(string_of_text)/len
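To make the difference between a token count and a word count concrete, here is a minimal sketch of one possible approach. It assumes the text has already been split into a list of tokens (for example by `nltk.word_tokenize`), and it treats any token containing at least one letter or digit as a word. The helper names `real_words` and `average_word_length` and the sample token list are hypothetical and are not taken from the NLTK book.

```python
def real_words(tokens):
    """Keep only tokens that contain at least one letter or digit.

    Assumption: pure-punctuation tokens such as "," or "." are not words.
    """
    return [t for t in tokens if any(ch.isalnum() for ch in t)]


def average_word_length(tokens):
    """Mean number of characters per word, ignoring punctuation tokens."""
    words = real_words(tokens)
    if not words:
        return 0.0  # avoid dividing by zero when there are no words
    return sum(len(w) for w in words) / len(words)


# Hypothetical sample; in practice the tokens might come from
# nltk.word_tokenize(raw_text).
tokens = ["The", "cat", "sat", ",", "then", "slept", "."]

token_count = len(tokens)             # counts words and punctuation
word_count = len(real_words(tokens))  # counts words only
avg_length = average_word_length(tokens)
```

With this filter, contractions such as "don't" and numeric tokens such as "42" still count as words. If only purely alphabetic tokens should count, `t.isalpha()` could be used as the condition instead.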