Split a string by whitespace, keeping quoted segments, allowing escaped quotes

南方客 2020-11-28 09:27

I currently have this regular expression to split strings on whitespace, unless the whitespace is inside a quoted segment:

keywords = 'pop rock "hard rock"';
keyword         


        
4 answers
  •  爱一瞬间的悲伤
    2020-11-28 09:50

    Kobi's answer works well for the example string, but it fails when there is more than one successive escape character (backslash) between quotes, as Tim Pietzcker pointed out in the comments. To handle these cases, the pattern can be written like this (for use with the match method):

    (?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*
    


    Here, (?=\S) ensures there is at least one non-whitespace character at the current position. This is needed because the rest of the pattern, which describes all allowed substrings (including whitespace between quotes), is entirely optional.
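
As a minimal sketch of how the pattern might be used with the match method, assuming the global flag is added so that every token is returned (splitKeywords is a hypothetical helper name, not something from the original post):

```javascript
// Pattern from the answer, with the global flag so match() returns every token.
const TOKEN = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;

// splitKeywords is a hypothetical name chosen for this illustration.
function splitKeywords(input) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return input.match(TOKEN) || [];
}

console.log(splitKeywords('pop rock "hard rock"'));
// → [ 'pop', 'rock', '"hard rock"' ]
```

Note that the surrounding quotes stay in the returned tokens; stripping them would be a separate step.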

    Details:

    (?=\S)   # followed by a non-whitespace
    [^"\s]*  #"# zero or more characters that aren't a quote or a whitespace
    (?: # when a quoted substring occurs:
        "       #"# opening quote
        [^\\"]* #"# zero or more characters that aren't a quote or a backslash
        (?: # when a backslash is encountered:
            \\ [\s\S] # an escaped character (including a quote or a backslash)
            [^\\"]* #"# zero or more characters that aren't a quote or a backslash
        )*
        "         #"# closing quote
        [^"\s]*   #"# zero or more characters that aren't a quote or a whitespace
    )*
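
JavaScript regular expressions have no free-spacing mode, so the commented layout above cannot be pasted in directly. One possible way to keep the breakdown readable is to assemble the pattern from named pieces. This is only a sketch; all variable names below are hypothetical, and the String.raw inputs are made-up test strings:

```javascript
// Each piece mirrors one line of the commented breakdown (names are illustrative).
const lookahead = /(?=\S)/;    // a non-whitespace character must follow
const bare      = /[^"\s]*/;   // characters that are neither a quote nor whitespace
const plain     = /[^\\"]*/;   // inside quotes: neither a quote nor a backslash
const escaped   = /\\[\s\S]/;  // a backslash plus whatever character it escapes

// A quoted substring: opening quote, plain runs and escapes, closing quote.
const quoted = `"${plain.source}(?:${escaped.source}${plain.source})*"`;

// Full token pattern, global so that match() collects every token.
const token = new RegExp(
  `${lookahead.source}${bare.source}(?:${quoted}${bare.source})*`,
  'g'
);

// Two successive backslashes before the closing quote: the pair is consumed
// as one escaped backslash, so the quote that follows still closes the segment.
const doubled = String.raw`"C:\\" next`.match(token);
console.log(doubled.length); // → 2

// Escaped quotes inside the segment do not end it.
const nested = String.raw`say "a \"b\" c" end`.match(token);
console.log(nested.length); // → 3
```

The assembled token.source is the same string as the one-line pattern given earlier, so both forms should behave identically.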
    
