How can I split a text into sentences?

傲寒 2020-11-22 06:33

I have a text file. I need to get a list of sentences.

How can this be implemented? There are a lot of subtleties, such as a dot being used in abbreviations.

13 Answers
  •  野趣味 (original poster)
    2020-11-22 06:51

    NLTK is arguably the most suitable tool for this. Getting started with NLTK can be painful, but once it is installed you just reap the rewards.

    In the meantime, here is some simple code based on the re module, available at http://pythonicprose.blogspot.com/2009/09/python-split-paragraph-into-sentences.html

    # Split a paragraph into sentences using regular expressions.
    import re


    def splitParagraphIntoSentences(paragraph):
        ''' Break a paragraph into sentences
            and return a list. '''
        # To split on multiple characters, a regular expression
        # is the easiest option. Note that this also splits on
        # the dots inside abbreviations.
        sentenceEnders = re.compile('[.!?]')
        sentenceList = sentenceEnders.split(paragraph)
        return sentenceList


    if __name__ == '__main__':
        p = """This is a sentence.  This is an excited sentence! And do you think this is a question?"""

        sentences = splitParagraphIntoSentences(p)
        for s in sentences:
            # split() leaves an empty string after the final '?'; skip it
            if s.strip():
                print(s.strip())

    # Output:
    #   This is a sentence
    #   This is an excited sentence
    #   And do you think this is a question
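
The question mentions dots in abbreviations, which the simple split above does not handle (it also drops the punctuation). As a rough sketch only, and not part of the linked code, the variant below keeps the punctuation and skips a boundary when the word before the dot is in a small abbreviation list. The names ABBREVIATIONS, BOUNDARY and split_sentences, and the contents of the list, are made up for illustration; a real text would need a much longer list or a proper tokenizer such as NLTK.

```python
import re

# Hypothetical, deliberately short abbreviation list (an assumption for
# this sketch); extend it for the kind of text you actually process.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "inc", "e.g", "i.e", "etc", "vs"}

# A candidate boundary: sentence-ending punctuation followed by whitespace.
BOUNDARY = re.compile(r'([.!?]+)\s+')


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Text from the end of the previous sentence up to this punctuation.
        candidate = text[start:match.end(1)]
        words = candidate.split()
        last_word = words[-1].rstrip('.!?').lower() if words else ''
        # A single dot after a known abbreviation is not a sentence end.
        if match.group(1) == '.' and last_word in ABBREVIATIONS:
            continue
        sentences.append(candidate.strip())
        start = match.end()
    # Whatever follows the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == '__main__':
    for s in split_sentences("Dr. Smith met Mr. Jones. They talked, e.g. about work! Was it useful?"):
        print(s)
```

One known weakness of this sketch: a sentence that really does end with a listed abbreviation (for example "etc.") gets merged with the sentence that follows it.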
    
