I have a string like this:
a b c a b " a b " b a " a "
How do I match every a that is not part of a string deli
A full-blown regex solution for regex lovers, without regard for performance or code readability.
This solution assumes that there is no escaping syntax (with escaping syntax, the a in "sbd\"a" is counted as inside the string).
Pseudocode:
processedString =
    inputString.replaceAll("\".*?\"", "")  // Remove all quoted strings
               .replaceFirst("\".*", "")   // Treat text after a lone quote as inside quotes
Then you can match the text you want in processedString. You can remove the second replace if you consider text after a lone quote to be outside quotes.
EDIT
In Ruby, the regex in the code above would be
/\".*?\"/
used with gsub
and
/\".*/
used with sub
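Putting the two Ruby regexes together, a minimal sketch of the strip-then-match approach might look like this. The helper name count_outside_quotes is made up for illustration, and the sketch assumes, as above, that there is no escaping syntax and that quoted strings do not span lines:

```ruby
# Hypothetical helper: strip quoted text first, then match in what is left.
def count_outside_quotes(input, pattern)
  # Remove every complete quoted string (lazy match up to the next quote).
  without_quoted = input.gsub(/".*?"/, '')
  # Treat everything after a remaining lone quote as quoted and drop it too.
  processed = without_quoted.sub(/".*/, '')
  processed.scan(pattern).size
end

balanced_count = count_outside_quotes('a b c a b " a b " b a " a "', /a/)
lone_quote_count = count_outside_quotes('a "a" a "a', /a/)
puts balanced_count    # 3
puts lone_quote_count  # 2
```

This only tells you what is outside the quotes; because the quoted parts are thrown away, it cannot be used directly to replace text in the original string.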
To address the replacement problem, I'm not sure whether this is possible, but it is worth trying:
/(\"|a)/ with gsub, and supply a function. If the match is ", increment a counter and return " as the replacement (basically, no change). If the match is a, check whether the counter is even: if it is even, supply your replacement string; otherwise, just return whatever was matched.
js-coder, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
As you can see the regex is really tiny compared with the regex in the accepted answer: ("[^"]*")|a
subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
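# Quoted strings land in group 1 and are put back unchanged; a bare a leaves $1 nil, so it is replaced with nothing.
replaced = subject.gsub(regex) { |m| $1 }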
puts replaced
See this live demo
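If you want to replace the a with something rather than delete it, the same trick should work with a fallback in the block. This is only a sketch: the replacement 'X' and the variable names are placeholders chosen for illustration.

```ruby
subject_for_replace = 'a b c a b " a b " b a " a "'
quoted_or_a = /("[^"]*")|a/

# Group 1 is set only when a whole quoted string matched; keep it as is.
# Otherwise the match was a bare a outside quotes, so substitute 'X'.
replaced_with_x = subject_for_replace.gsub(quoted_or_a) do
  Regexp.last_match(1) || 'X'
end

puts replaced_with_x  # X b c X b " a b " b X " a "
```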
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
Assuming the quotes are correctly balanced and there are no escaped quotes, then it's easy:
result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
This replaces an a with the empty string if and only if there is an even number of quotes ahead of it, which is the case exactly when the a is outside a quoted string.
Explanation:
a # Match a
(?= # only if it's followed by...
(?: # ...the following:
[^"]*" # any number of non-quotes, followed by one quote
[^"]*" # the same again, ensuring an even number
)* # any number of times (0, 2, 4 etc. quotes)
[^"]* # followed by only non-quotes until
\Z # the end of the string.
) # End of lookahead assertion
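As a quick check of the lookahead on the string from the question, here is a small sketch. It marks each match with 'X' instead of deleting it so the result is easier to read; the variable names are illustrative only.

```ruby
lookahead_subject = 'a b c a b " a b " b a " a "'
# a, only if an even number of quotes follows up to the end of the string.
outside_a = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/

outside_count = lookahead_subject.scan(outside_a).size
marked = lookahead_subject.gsub(outside_a, 'X')

puts outside_count  # 3
puts marked         # X b c X b " a b " b X " a "
```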
If you can have escaped quotes within quotes (a "length: 2\""), it's still possible but will be more complicated:
result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')
This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:
(?: # Match either...
\\. # an escaped character
| # or
[^"\\] # any character except backslash or quote
) # End of alternation
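A small sketch of the escape-aware version on a made-up sample that contains an escaped quote inside a quoted string. The sample string and names are assumptions for illustration; %q{} is used so the backslash stays literal.

```ruby
# Same lookahead as above, with (?:\\.|[^"\\]) in place of [^"].
escape_aware_a = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

# Literal text: a "x \" a" a   (the middle a sits inside the quoted string)
escaped_sample = %q{a "x \" a" a}
marked_escaped = escaped_sample.gsub(escape_aware_a, 'X')

puts marked_escaped  # X "x \" a" X
```

The escaped quote is consumed as a pair by \\. and so is not counted, which leaves the a inside the string untouched.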