How do I count the number of sentences, words, and characters in a file?

前端未结


I have written the following code to tokenize the input paragraph read from the file samp.txt. Can anybody help me find and print the number of sentences, words


7 Answers

抹茶落季

2020-12-10 06:54
With NLTK, you can also use FreqDist (see the O'Reilly book, Ch. 3.1).

And in your case:
```
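import nltk

# Read the file as text (word_tokenize needs the NLTK 'punkt' data installed)
with open('samp.txt', encoding='utf-8') as f:
    raw = f.read()

text = nltk.Text(nltk.word_tokenize(raw))
fdist = nltk.FreqDist(text)
print(fdist.N())  # total number of tokens, punctuation included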
```
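Note that `fdist.N()` only gives the total token count. If you want all three numbers at once, a rough standard-library sketch could look like the one below. It is not NLTK's tokenizer: the regex rules for what counts as a sentence or a word are my own simplifying assumptions, and `count_text` is a hypothetical helper name. With NLTK installed, `nltk.sent_tokenize` and `nltk.word_tokenize` should give more careful splits.

```python
import re

def count_text(text):
    """Return rough (sentences, words, characters) counts for a string."""
    # Assumption: sentences are separated by whitespace that follows '.', '!' or '?'.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    # Assumption: a word is a run of ASCII letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Characters are counted as-is, whitespace and punctuation included.
    return len(sentences), len(words), len(text)

sample = "Hello world. How are you? Fine!"
print(count_text(sample))
```

For the file in the question you would pass in the contents of samp.txt instead of `sample`. Abbreviations such as "Dr." will be miscounted as sentence ends by this simple rule.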