I have written the following code to tokenize the input paragraph that comes from the file samp.txt. Can anybody help me find and print the number of sentences, words
With nltk, you can also use FreqDist (see the O'Reilly book, Ch. 3.1).
And in your case:
    import nltk
    raw = open('samp.txt').read()
    raw = nltk.Text(nltk.word_tokenize(raw.decode('utf-8')))
    fdist = nltk.FreqDist(raw)
    print fdist.N()
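Since the question also asks for the number of sentences, here is a rough Python 3 sketch that counts both. The function name count_sentences_and_words is my own, and I'm assuming the file is UTF-8. nltk.sent_tokenize and nltk.word_tokenize are the usual choices but need the Punkt model downloaded first, so this sketch swaps in an untrained PunktSentenceTokenizer and a RegexpTokenizer instead. Those need no download, but they will be less accurate around abbreviations, and contractions like "don't" get split in two.

```python
import nltk
from nltk.tokenize import PunktSentenceTokenizer, RegexpTokenizer


def count_sentences_and_words(text):
    """Return (sentence count, word count, FreqDist of the words) for text."""
    # Untrained Punkt splitter: no model download, but weaker on abbreviations.
    sentences = PunktSentenceTokenizer().tokenize(text)
    # Keep runs of word characters only, so punctuation is not counted as a word.
    words = RegexpTokenizer(r"\w+").tokenize(text)
    return len(sentences), len(words), nltk.FreqDist(words)


if __name__ == "__main__":
    # 'samp.txt' is the file name from the question; UTF-8 is an assumption.
    with open("samp.txt", encoding="utf-8") as f:
        n_sentences, n_words, fdist = count_sentences_and_words(f.read())
    print("sentences:", n_sentences)
    print("words:", n_words)
    print("distinct words:", fdist.B())
```

Note that fdist.N() in the snippet above counts every token word_tokenize produces, punctuation included, so it will usually come out higher than the word count here.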