I have random text stored in $sentences. Using regex, I want to split the text into sentences, see:
function splitSentences($text) {
$re = \
If spaces are unreliable, then you could match on a . followed by any number of spaces, followed by a capital letter.
You can match any uppercase Unicode letter using the Unicode character property \p{Lu} (together with the u modifier, so that the pattern and subject are treated as UTF-8).
You only need to exclude abbreviations that tend to be followed by proper names (person names, company names, etc.), since those names start with a capital letter.
function splitSentences($text) {
    $re = '/          # Split sentences ending with a dot
        .+?           # Match everything before, until we find
        (
            $ |       # the end of the string, or
            \.        # a dot
            (?
Note: This answer might not be accurate enough for your situation. I'm unable to judge that. It does address the problem as described above and is easily understandable.
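For illustration, here is a minimal sketch of the idea described above: split after a dot that is followed by any number of spaces and an uppercase letter, unless the dot belongs to a known abbreviation. It is not the original function. The name splitSentencesSketch and the abbreviation list (Dr., Mr., Mrs.) are assumptions made for this example, and it uses preg_split with lookarounds rather than a matching pattern.

```php
<?php
// Hypothetical helper: the name and the abbreviation list are illustrative
// assumptions, not part of the original answer.
function splitSentencesSketch(string $text): array
{
    $re = '~
        (?<!\bDr\.)      # the dot must not belong to "Dr."
        (?<!\bMr\.)      # ... or to "Mr."
        (?<!\bMrs\.)     # ... or to "Mrs."
        (?<=\.)          # split point sits directly after a dot
        \s*              # any number of spaces, possibly none
        (?=\p{Lu})       # the next character is an uppercase letter
    ~xu';

    // preg_split returns false on error (for example, invalid UTF-8 input);
    // in that case fall back to returning the text unsplit.
    $parts = preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);

    return $parts === false ? [$text] : $parts;
}
```

Calling splitSentencesSketch('Dr. Smith arrived.He met Mr. Jones. Then they left.') should keep "Dr. Smith" and "Mr. Jones" together while still splitting at "arrived.He", where the space is missing. Each abbreviation gets its own lookbehind because PCRE lookbehinds must have a fixed length per alternative; a longer list would need one entry per abbreviation.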