Split string into sentences using regex

挽巷 2020-11-28 10:54

I have random text stored in $sentences. Using regex, I want to split the text into sentences, see:

function splitSentences($text) {
    $re = \         


        
6 Answers
  •  离开以前
    2020-11-28 11:37

    If spaces are unreliable, then you could match on a dot (.) followed by any number of spaces, followed by a capital letter.

    You can match any uppercase Unicode letter using the Unicode character property \p{Lu}, together with the u modifier so that the pattern and subject are treated as UTF-8.

    You only need to exclude abbreviations that are typically followed by a proper name (a person's name, a company name, etc.), since such names start with a capital letter.

    function splitSentences($text) {
        $re = '/                # Split sentences ending with a dot
            .+?                 # Match everything before, until we find
            (
              $ |               # the end of the string, or
              \.                # a dot
              (?

    Note: This answer might not be accurate enough for your situation. I'm unable to judge that. It does address the problem as described above and is easily understandable.
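
The idea described in this answer (a dot, any amount of whitespace, then an uppercase letter, with a few abbreviations excluded) can be illustrated with a minimal sketch. It uses preg_split with lookarounds rather than a match-based pattern, and it is not a reconstruction of the function above. The name splitSentencesSketch and the default abbreviation list are hypothetical and only serve as an example.

```php
<?php
// Hypothetical helper; the name and the default abbreviation list are assumptions.
function splitSentencesSketch(string $text, array $abbreviations = ['Mr', 'Mrs', 'Ms', 'Dr', 'Prof']): array
{
    // One fixed-length negative lookbehind per abbreviation, e.g. (?<!\bMr\.)
    $exclusions = '';
    foreach ($abbreviations as $abbr) {
        $exclusions .= '(?<!\b' . preg_quote($abbr, '/') . '\.)';
    }

    // Split right after a dot (unless it ends a listed abbreviation),
    // consuming any whitespace, when an uppercase letter comes next.
    $re = '/(?<=\.)' . $exclusions . '\s*(?=\p{Lu})/u';

    $parts = preg_split($re, $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // Regex error: fall back to returning the input unsplit.
        return [$text];
    }
    return array_map('trim', $parts);
}

print_r(splitSentencesSketch('Hello world.This is Mr. Smith.  He is here.'));
// Yields three sentences: "Hello world.", "This is Mr. Smith.", "He is here."
```

Because zero spaces are allowed between the dot and the capital letter, initialisms such as "U.S.A." would also be split by this sketch; any such cases would have to be added as further exclusions.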
