How can I split a text into sentences?

傲寒 2020-11-22 06:33

I have a text file. I need to get a list of sentences.

How can this be implemented? There are a lot of subtleties, such as a dot being used in abbreviations.

13 answers
  •  梦如初夏
    2020-11-22 07:00

    You can also use the sentence tokenization function in NLTK:

    import nltk
    from nltk.tokenize import sent_tokenize

    nltk.download("punkt")  # one-time model download; newer NLTK releases may ask for "punkt_tab" instead
    text = "As the most quoted English writer Shakespeare has more than his share of famous quotes. Some Shakespeare famous quotes are known for their beauty, some for their everyday truths and some for their wisdom. We often talk about Shakespeare’s quotes as things the wise Bard is saying to us but, we should remember that some of his wisest words are spoken by his biggest fools. For example, both ‘neither a borrower nor a lender be,’ and ‘to thine own self be true’ are from the foolish, garrulous and quite disreputable Polonius in Hamlet."

    # sent_tokenize returns a list of sentence strings
    print(sent_tokenize(text))
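
If installing NLTK is not an option, the abbreviation subtlety mentioned in the question can be illustrated with a small standard-library sketch. This is not how NLTK's tokenizer works. It is a naive regex splitter, and both the `split_sentences` helper and the `ABBREVIATIONS` set are hypothetical names made up for this example, with an abbreviation list you would need to extend for real text.

```python
import re

# Hypothetical abbreviation list for illustration only; extend it for your own text.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "e.g", "i.e", "etc", "vs"}


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    # A candidate boundary is terminal punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.end()].strip()
        words = candidate.split()
        # Last word before the boundary, lowercased, without trailing punctuation.
        last_word = words[-1].rstrip(".!?").lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue  # The dot belongs to an abbreviation; keep scanning.
        sentences.append(candidate)
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith arrived late. He apologized, e.g. by bringing coffee! Was it enough?"
    for s in split_sentences(sample):
        print(s)
```

A known weakness of this approach: a sentence that really does end in a listed abbreviation (for example "etc.") gets merged with the next one, which is one reason a trained tokenizer such as NLTK's is usually the better choice.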
    
