How do I count the number of sentences, words and characters in a file?

清歌不尽 2020-12-10 06:26

I have written the following code to tokenize the input paragraph read from the file samp.txt. Can anybody help me find and print the number of sentences, words, and characters in the file?

7 answers
  • 2020-12-10 06:54

    With NLTK, you can also use FreqDist to count tokens (see the O'Reilly book, Ch. 3.1).

    And in your case:

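    import nltk

    # Read the file as text; Python 3 decodes it on open, so no .decode() is needed
    with open('samp.txt', encoding='utf-8') as f:
        raw = f.read()
    # Tokenize into words and punctuation (requires NLTK's tokenizer data to be installed)
    tokens = nltk.word_tokenize(raw)
    fdist = nltk.FreqDist(tokens)
    # N() is the total number of tokens counted, punctuation included
    print(fdist.N())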
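
    The snippet above reports only a total token count. For a rough count of sentences, words and characters together, here is a minimal sketch using only the standard library. The function name count_text is hypothetical, and the rules it applies are assumptions rather than anything from NLTK: a sentence ends at ., ! or ? followed by whitespace, and a word is a run of letters, digits or apostrophes. NLTK's sent_tokenize and word_tokenize should handle abbreviations and other edge cases better than these patterns do.

```python
import re


def count_text(text):
    """Return (sentences, words, characters) for a string, using rough heuristics."""
    # Assumption: a sentence boundary is whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of ASCII letters, digits, or apostrophes
    words = re.findall(r"[A-Za-z0-9']+", text)
    # The character count includes spaces and punctuation
    return len(sentences), len(words), len(text)


sample = "Hello world. How are you? Fine!"
print(count_text(sample))
```

    To apply it to the file from the question, read samp.txt into a string first (for example with open('samp.txt', encoding='utf-8')) and pass that string to count_text.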
    