How can I split a text into sentences using the Stanford parser?

终归单人心 2020-11-27 14:52

How can I split a text or paragraph into sentences using the Stanford parser?

Is there any method that can extract sentences, such as getSentencesFromString()?

12 Answers
  •  暖寄归人
    2020-11-27 15:44

    Another element, addressed only in a few downvoted answers, is how to set the sentence delimiters. The most common approach, and the default, is to rely on the usual punctuation marks that signal the end of a sentence. However, corpora gathered from other sources may come in different document formats, one of which puts each sentence on its own line.
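    To make the two conventions concrete without depending on CoreNLP, here is a minimal stdlib-only Java sketch. It uses the JDK's java.text.BreakIterator as a stand-in for punctuation-based splitting (this is the JDK's splitter, not Stanford's) and a plain newline split for the one-sentence-per-line format. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper contrasting two sentence-delimiter conventions (not part of CoreNLP). */
public class SentenceDelimiters {

    /** Punctuation-based splitting via the JDK's BreakIterator, used here only as a stand-in. */
    public static List<String> splitOnPunctuation(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    /** One sentence per line: every non-blank line is its own sentence, punctuation ignored. */
    public static List<String> splitOnNewlines(String text) {
        List<String> sentences = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String s = line.trim();
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?\nFine";
        System.out.println(splitOnPunctuation(text));
        System.out.println(splitOnNewlines(text));
    }
}
```

    For this input, the punctuation-based splitter returns three sentences ("Hello world.", "How are you?", "Fine"), while the line-based splitter returns two ("Hello world. How are you?" and "Fine"). That difference is exactly what changing the delimiter setting controls.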

    To set the delimiters for the DocumentPreprocessor, as in the accepted answer, you would use setSentenceDelimiter(String). With the pipeline approach suggested in the answer by @Kevin, you would instead work with the ssplit properties. For example, to use the end-of-line scheme described in the previous paragraph, set the property ssplit.eolonly to true.
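    For the pipeline route, the setting can also live in a properties file instead of code. Here is a minimal sketch, assuming a file named eol.properties (the filename is hypothetical). The CoreNLP documentation suggests pairing ssplit.eolonly with tokenize.whitespace, so that each line is one sentence and tokens are separated only on whitespace. Drop that line if you still want the regular tokenizer. (On the DocumentPreprocessor side, the rough equivalent would be calling setSentenceDelimiter("\n").)

```properties
# Only tokenize and split sentences; no parsing.
annotators = tokenize, ssplit
# Treat every line of the input as exactly one sentence.
ssplit.eolonly = true
# Optional: split tokens on whitespace only.
tokenize.whitespace = true
```

    From the CoreNLP distribution directory, such a file could be passed with something like: java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -props eol.properties -file input.txt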
