How do I split a string by commas except inside parentheses, using a regular expression?

天命终不由人 2020-12-17 02:29

I want to split a string by comma:

"a,s".split ','  # => ['a', 's']

I don't want to split a sub-string if it is wrapped by parentheses.

2 answers
  • 2020-12-17 02:41

    Assuming that parentheses are not nested:

    "a,s(d,f),g,h"
    .scan(/(?:\([^()]*\)|[^,])+/)
    # => ["a", "s(d,f)", "g", "h"]
    
  • 2020-12-17 03:04

    To deal with nested parentheses, you can use:

    txt = "a,s(d,f(4,5)),g,h"
    pattern = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)')
    puts txt.scan(pattern).map &:first
    

    pattern details:

    (                        # first capturing group
        (?:                  # open a non capturing group
            [^,(]+           # all characters except , and (
          |                  # or
            (                # open the second capturing group
               \(            # (
                (?>          # open an atomic group
                    [^()]+   # all characters except parentheses
                  |          # OR
                    \g<-1>   # the last capturing group (you can also write \g<2>)
                )*           # close the atomic group
                \)           # )
            )                # close the second capturing group
        )+                   # close the non-capturing group and repeat it
    )                        # close the first capturing group
    

    The second capturing group describes a parenthesized block, which can contain characters that are not parentheses or the capturing group itself. That self-reference makes it a recursive pattern.

    Inside the pattern, you can refer to a capture group by its number (\g<2> for the second capturing group), by its relative position (\g<-1> is the first group to the left of the current position in the pattern), or by its name if you use named capturing groups.
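
As a sketch of the named-group variant, the same pattern can be written in extended mode (`/x`) with a named call `\g<paren>` in place of `\g<-1>`. The names `SPLIT_ITEM`, `item`, `paren` and `split_top_level` are chosen here for illustration and do not come from the answer. Note that once a Ruby regex contains named groups, plain parentheses stop capturing, so the whole item is given a name as well.

```ruby
# Hypothetical constant and group names; the structure mirrors the
# annotated breakdown, with the numbered call replaced by a named call.
SPLIT_ITEM = /
  (?<item>               # the whole item between top-level commas
    (?:
      [^,(]+             # a run of characters other than comma and open paren
    |
      (?<paren>          # one balanced parenthesised block
        \(
        (?>
          [^()]+         # characters other than parentheses
        |
          \g<paren>      # recurse into a nested block
        )*
        \)
      )
    )+
  )
/x

# Hypothetical helper: scan returns [item, paren] pairs, so keep the first.
def split_top_level(text)
  text.scan(SPLIT_ITEM).map(&:first)
end

p split_top_level("a,s(d,f(4,5)),g,h")
# => ["a", "s(d,f(4,5))", "g", "h"]
```

Named calls tend to survive later edits to the pattern better than numbered or relative ones, since adding a group does not shift them.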

    Note: You can allow unbalanced parentheses by adding |[()] just before the end of the non-capturing group. Then a,b(,c gives ['a', 'b(', 'c'].
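
A minimal sketch of that variant, assuming the extra alternative is appended as the last branch of the non-capturing group. The names `LENIENT` and `split_lenient` are hypothetical.

```ruby
# Same pattern as in the answer, with |[()] added as a last alternative so
# that a stray parenthesis is accepted as an ordinary character.
LENIENT = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\))|[()])+)')

# Hypothetical helper: keep only the first capturing group of each match.
def split_lenient(text)
  text.scan(LENIENT).map(&:first)
end

p split_lenient("a,b(,c")
# => ["a", "b(", "c"]

# Balanced input is still kept together.
p split_lenient("a,s(d,f(4,5)),g,h")
# => ["a", "s(d,f(4,5))", "g", "h"]
```

Because the balanced-block alternative is tried before `[()]`, well-formed input behaves as before and only an unmatched parenthesis falls through to the new branch.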
