How can I split a text into sentences?

傲寒 2020-11-22 06:33

I have a text file. I need to get a list of sentences.

How can this be implemented? There are a lot of subtleties, such as a dot being used in abbreviations.

13 Answers
  •  梦如初夏
    2020-11-22 06:39

    The Natural Language Toolkit (nltk.org) has what you need. According to a group posting, its Punkt sentence tokenizer does this:

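    import nltk.data
    
    # Load the pre-trained Punkt sentence tokenizer for English
    # (the Punkt model must already be downloaded, e.g. via nltk.download('punkt')).
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    with open("test.txt") as fp:
        data = fp.read()
    # tokenize() returns a list of sentences; print them separated by a divider line.
    print('\n-----\n'.join(tokenizer.tokenize(data)))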
    

    (I haven't tried it!)
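
To illustrate the abbreviation subtlety mentioned in the question, here is a minimal standard-library sketch. It is not what NLTK does internally; it is a naive rule that splits after `.`, `!` or `?` followed by whitespace, unless the word before a period is in a small abbreviation list. The names `ABBREVIATIONS` and `split_sentences` are hypothetical, chosen only for this example, and the abbreviation list is an assumption you would need to extend for real text.

```python
import re

# Hypothetical, deliberately short abbreviation list (an assumption for
# this sketch); entries are lowercase with the trailing period removed.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "e.g", "i.e", "etc", "vs"}

# A candidate boundary is '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"[.!?]\s+")


def split_sentences(text):
    """Naively split text into sentences, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        end = match.start() + 1  # keep the punctuation mark itself
        candidate = text[start:end]
        words = candidate.split()
        last_word = words[-1].rstrip(".").lower() if words else ""
        # A period right after a known abbreviation is not a boundary.
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue
        sentences.append(candidate.strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith met Mrs. Jones. They talked, e.g. about tea! Was it good? Yes."
    print("\n-----\n".join(split_sentences(sample)))
```

A rule like this fails on cases the list does not cover, and it will not split a sentence that genuinely ends in an abbreviation such as "etc.", which is why a trained tokenizer like the one above is usually the better choice.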
