How to match something with regex that is not between two special characters?

梦谈多话 2020-12-03 11:14

I have a string like this:

a b c a b " a b " b a " a "

How do I match every a that is not part of a string deli

3 Answers
  • 2020-12-03 12:04

    A full-blown regex solution for regex lovers, with no regard for performance or code readability.

    This solution assumes that there is no escaping syntax (with escaping syntax, the a in "sbd\"a" is counted as inside the string).

    Pseudocode:

    processedString =
        inputString.replaceAll("\".*?\"", "") // Remove all complete quoted strings
                   .replaceFirst("\".*", ""); // Treat text after a lone quote as inside a quote
    

    You can then match the text you want in processedString. Drop the second replace if text after a lone quote should count as outside the quotes.

    EDIT

    In Ruby, the regex in the code above would be

    /\".*?\"/
    

    used with gsub

    and

    /\".*/
    

    used with sub
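
    Put together, the two Ruby calls above might look like the following minimal sketch. The helper name strip_quoted is made up for illustration, and the sketch assumes single-line input with no escaped quotes (. does not match a newline by default).

```ruby
# Hypothetical helper (not from the original answer): returns only the
# text that lies outside double-quoted strings.
def strip_quoted(input)
  without_pairs = input.gsub(/".*?"/, '') # remove every complete quoted string
  without_pairs.sub(/".*/, '')            # treat text after a lone quote as quoted
end

processed = strip_quoted('a b c a b " a b " b a " a "')
puts processed.scan(/a/).length # counts the a's left outside quotes
```

    Note that this only tells you which a's are outside quotes; their positions in the original string are lost, which is why it does not help with in-place replacement.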


    To address the replacement problem, I'm not sure whether this is possible, but it is worth trying:

    • Declare a counter.
    • Use the regex /(\"|a)/ with gsub, and supply a block.
    • In the block, if the match is ", increment the counter and return " as the replacement (that is, no change). If the match is a, check whether the counter is even: if it is, return your replacement string; otherwise, return the match unchanged.
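
    The steps above can be sketched in Ruby roughly as follows. The method name replace_outside_quotes and the replacement 'X' are illustrative, and the sketch again assumes there are no escaped quotes.

```ruby
# Hypothetical helper: replaces each a that has an even number of quotes
# before it, i.e. each a outside a quoted string.
def replace_outside_quotes(subject, replacement)
  quote_count = 0
  subject.gsub(/"|a/) do |match|
    if match == '"'
      quote_count += 1 # every quote toggles inside/outside
      match            # put the quote back unchanged
    elsif quote_count.even?
      replacement      # even count so far: this a is outside quotes
    else
      match            # odd count: inside quotes, leave it alone
    end
  end
end

puts replace_outside_quotes('a b c a b " a b " b a " a "', 'X')
# prints: X b c X b " a b " b X " a "
```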
  • 2020-12-03 12:08

    js-coder, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

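    As you can see, the regex is tiny compared with the one in the accepted answer: ("[^"]*")|a. The left alternative matches and captures whole quoted strings, so the bare a on the right can only match outside quotes. The replacement block puts the captured strings back ($1) and drops everything else.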

    subject = 'a b c a b " a b " b a " a "'
    regex = /("[^"]*")|a/
    replaced = subject.gsub(regex) {|m|$1}
    puts replaced
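
    To substitute something for the unquoted a's rather than delete them, the same regex can be used with a fallback in the block. This is a small sketch; the method name replace_unquoted_a and the replacement 'X' are assumptions for illustration.

```ruby
# Hypothetical helper: keeps quoted strings as they are (group 1) and
# replaces every a matched outside them.
def replace_unquoted_a(subject, replacement)
  # $1 is nil when the bare-a alternative matched, so || supplies the replacement.
  subject.gsub(/("[^"]*")|a/) { $1 || replacement }
end

puts replace_unquoted_a('a b c a b " a b " b a " a "', 'X')
# prints: X b c X b " a b " b X " a "
```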
    

    See this live demo

    Reference

    How to match pattern except in situations s1, s2, s3

    How to match a pattern unless...

  • 2020-12-03 12:14

    Assuming the quotes are correctly balanced and there are no escaped quotes, then it's easy:

    result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
    

    This replaces an a with the empty string if and only if an even number of quotes follows it, which, given balanced quotes, means the a sits outside any quoted string.

    Explanation:

    a        # Match a
    (?=      # only if it's followed by...
     (?:     # ...the following:
      [^"]*" #  any number of non-quotes, followed by one quote
      [^"]*" #  the same again, ensuring an even number
     )*      # any number of times (0, 2, 4 etc. quotes)
     [^"]*   # followed by only non-quotes until
     \Z      # the end of the string.
    )        # End of lookahead assertion
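
    As a quick check, the lookahead regex can be run against the sample string from the question. In this sketch the constant and method names are illustrative, and balanced, unescaped quotes are assumed, as stated above.

```ruby
# The lookahead regex from this answer, stored in a constant for reuse.
UNQUOTED_A = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/

# Hypothetical helper: replaces every a that is followed by an even
# number of quotes (removes it by default, as in the answer).
def remove_unquoted_a(subject, replacement = '')
  subject.gsub(UNQUOTED_A, replacement)
end

puts remove_unquoted_a('a b c a b " a b " b a " a "', 'X')
# prints: X b c X b " a b " b X " a "
```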
    

    If you can have escaped quotes within quotes (a "length: 2\""), it's still possible but will be more complicated:

    result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')
    

    This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:

    (?:     # Match either...
     \\.    # an escaped character
    |       # or
     [^"\\] # any character except backslash or quote
    )       # End of alternation
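
    A small sketch of the escape-aware version in use. The sample string and the names below are made up for illustration; the single-quoted Ruby literal keeps \" as a backslash followed by a quote.

```ruby
# The escape-aware regex from this answer: \\. consumes a backslash plus
# the character it escapes, so an escaped quote never counts as a delimiter.
ESCAPE_AWARE_A = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

# Hypothetical helper: replaces each a outside quoted strings.
def replace_unquoted_a_escaped(subject, replacement)
  subject.gsub(ESCAPE_AWARE_A, replacement)
end

# The quoted part contains an escaped quote: "a length: 2\" a"
puts replace_unquoted_a_escaped('a "a length: 2\" a" a', 'X')
# prints: X "a length: 2\" a" X
```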
    