How to Split a Paragraph into Sentences

Submitted by 独自空忆成欢 on 2019-11-27 08:16:20

Question


I've been trying to use:

$string="The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
preg_match_all('~.*?[?.!]~s',$string,$sentences);
print_r($sentences);

But it doesn't work on Dr., U.S.A., etc.

Does anyone have any better suggestions?


Answer 1:


There is no simple solution for this. You need to do some natural language processing (NLP) in your application to recognize each sentence. There is OpenNLP, a Java-based NLP toolkit, and the Stanford NLP parser, which can be used from Ruby. You can probably find something similar for PHP.

I also found a set of classes for natural language processing in PHP.




Answer 2:


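Hmm, maybe try something like $sentences = preg_split('/(?<=[?.!])\s+/', $string); The lookbehind splits on the whitespace that follows sentence punctuation, so the punctuation stays attached to each sentence. It still splits after abbreviations such as Dr., though.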




Answer 3:


This is almost impossible to do reliably. As your example shows, the punctuation characters that end a sentence also appear in abbreviations such as Dr. and U.S.A., so the punctuation alone cannot tell you where a sentence starts or ends.

You have to examine the characters that follow the punctuation marks you mention to decide whether a new sentence starts there.
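The idea of checking what follows the punctuation could be sketched roughly as below. This is only a heuristic under stated assumptions, not a real NLP solution: the function name splitSentences and the abbreviation list are hypothetical, a boundary is assumed to be whitespace that follows ., ! or ? and precedes an uppercase ASCII letter, and a chunk ending in a listed abbreviation is assumed to continue into the next chunk.

```php
<?php
// Hypothetical helper: a heuristic sentence splitter, not a full NLP solution.
// The default abbreviation list is an illustrative assumption; extend it as needed.
function splitSentences(string $text, array $abbreviations = ['Dr.', 'Mr.', 'Mrs.', 'U.S.A.']): array
{
    // Candidate boundaries: whitespace preceded by . ! or ? and followed by
    // an uppercase ASCII letter (assumed to start a new sentence).
    $parts = preg_split('/(?<=[.!?])\s+(?=[A-Z])/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $sentences = [];
    $buffer = '';
    foreach ($parts as $part) {
        // Rejoin with a single space (the original whitespace is not preserved).
        $buffer = ($buffer === '') ? $part : $buffer . ' ' . $part;

        // If the text so far ends with a known abbreviation, assume the
        // sentence continues into the next chunk (e.g. "Dr. Smith").
        $endsWithAbbreviation = false;
        foreach ($abbreviations as $abbr) {
            if (substr($buffer, -strlen($abbr)) === $abbr) {
                $endsWithAbbreviation = true;
                break;
            }
        }

        if (!$endsWithAbbreviation) {
            $sentences[] = $buffer;
            $buffer = '';
        }
    }

    // Flush whatever is left (for example, text that ends with an abbreviation).
    if ($buffer !== '') {
        $sentences[] = $buffer;
    }

    return $sentences;
}

$string = "The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
print_r(splitSentences($string)); // two sentences for this input
```

With the string from the question, the lowercase word after each "Dr." and after "U.S.A." prevents a split there, so only the boundary after "here!!!" is used. The heuristic has obvious limits: a sentence that genuinely ends with a listed abbreviation gets merged with the next one, and sentences that start with a lowercase letter, a digit, or a quote are not detected.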



Source: https://stackoverflow.com/questions/2158296/how-to-split-a-paragraph-into-sentences
