Question
I've been trying to use:
$string="The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
preg_match_all('~.*?[?.!]~s',$string,$sentences);
print_r($sentences);
But it breaks on abbreviations such as Dr. and U.S.A., splitting in the middle of a sentence.
Does anyone have any better suggestions?
Answer 1:
There isn't a simple solution for this. You need to do some natural language processing (NLP) in your application to recognize each sentence. There is a Java-based NLP toolkit called OpenNLP, and there is the Stanford NLP parser, which can be used from Ruby. You can probably find something similar for PHP.
Here is a set of classes I found for natural language processing in PHP.
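Answer 2:
Hmm, maybe try something like $sentences = preg_split('/(?<=[?.!])\s+/', $string); The pattern should match only the whitespace that follows the punctuation. If it also matches the sentence text, preg_split() discards that text as part of the delimiter. This still splits after abbreviations such as Dr. when a space follows them.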
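To make the behaviour of a whitespace-only split concrete, here is a minimal sketch that runs it on the string from the question. The lookbehind pattern is one possible choice, not necessarily what the answer's author had in mind, and the variable names are illustrative.

```php
<?php
$string = "The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";

// Split on runs of whitespace that directly follow ., ! or ?.
// The lookbehind (?<=...) checks the punctuation without consuming it,
// so the punctuation stays attached to the preceding piece.
$sentences = preg_split('/(?<=[?.!])\s+/', $string);

print_r($sentences);
```

The result contains five pieces, because the split also happens after each "Dr." and after "U.S.A.". The dots inside "U.S.A." are left alone since no whitespace follows them.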
Answer 3:
This is almost impossible: as your example shows, the same punctuation characters that end a sentence also appear in abbreviations such as Dr. and U.S.A., so punctuation alone cannot tell you where a sentence starts or ends.
You have to inspect the characters that follow the punctuation to decide whether a new sentence starts there.
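One way to act on that idea is sketched below. It is a rough heuristic, not a complete solution: it treats whitespace as a sentence boundary only when end punctuation comes before it and an uppercase ASCII letter comes after it, and it then re-joins pieces that end in a known abbreviation. The function name split_sentences and the abbreviation list are hypothetical and would need extending for real text.

```php
<?php
// Hypothetical helper: split text into sentences by looking at what
// follows the punctuation. $abbreviations is an illustrative, incomplete list.
function split_sentences(string $text, array $abbreviations = ['Dr.', 'Mr.', 'Mrs.', 'U.S.A.']): array
{
    // Candidate boundaries: whitespace preceded by . ! or ?
    // and followed by an uppercase ASCII letter.
    $parts = preg_split('/(?<=[.!?])\s+(?=[A-Z])/', trim($text));

    $sentences = [];
    $buffer = '';
    foreach ($parts as $part) {
        // Pieces are re-joined with a single space, so the original
        // whitespace between them is not preserved.
        $buffer = ($buffer === '') ? $part : $buffer . ' ' . $part;

        // Look at the last whitespace-delimited token collected so far.
        $tokens = preg_split('/\s+/', $buffer);
        $last = end($tokens);

        if (in_array($last, $abbreviations, true)) {
            continue; // ends in an abbreviation: keep accumulating
        }
        $sentences[] = $buffer;
        $buffer = '';
    }
    if ($buffer !== '') {
        $sentences[] = $buffer; // trailing text that ended in an abbreviation
    }
    return $sentences;
}

$string = "The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!";
print_r(split_sentences($string));
```

On the string from the question this yields two sentences. The ambiguity described above remains: a sentence that genuinely ends in a listed abbreviation, as in "I live in the U.S.A. He does too.", is merged with the sentence after it, which is why a proper NLP library does better than any fixed rule.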
Source: https://stackoverflow.com/questions/2158296/how-to-split-a-paragraph-into-sentences