Question
I would like to patch some text data extracted from web pages. Sample:
t="First sentence. Second sentence.Third sentence."
There is no space after the period at the end of the second sentence. This tells me that the third sentence was on a separate line (after a br tag) in the original document.
I want to use a regexp to insert a "\n" character in the proper places and patch my text. My regex:
t2=t.gsub(/([.\!?])([A-Z1-9])/,$1+"\n"+$2)
Unfortunately, it doesn't work: "NoMethodError: undefined method `+' for nil:NilClass". How can I properly backreference the matched groups? It was so easy in Microsoft Word, where I just had to use the \1 and \2 symbols.
Answer 1:
You can backreference in the substitution string with \1 (to match capture group 1).
t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2") # => "First sentence. Second sentence.\nThird sentence!\nFourth sentence?\nFifth sentence."
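For context on the error in the question, here is a small sketch of what seems to be happening: Ruby evaluates the arguments before gsub runs, so $1 and $2 still hold whatever the previous match left behind (nil if there was none). The "reset" line and the variable names error_message and fixed are only illustrative; the failed match is there to make $1 nil the way it would be in a fresh session.

```ruby
t = "First sentence. Second sentence.Third sentence."

# A failed match leaves $1 and $2 as nil, like a fresh irb session.
"reset" =~ /\d/

# The second argument is built BEFORE gsub is called, so this is nil + "\n".
error_message =
  begin
    t.gsub(/([.!?])([A-Z1-9])/, $1 + "\n" + $2)
  rescue NoMethodError => e
    e.message
  end

# With \\1 and \\2, gsub itself substitutes the groups for each match.
fixed = t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2")

puts error_message
puts fixed
```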
Answer 2:
- If you are using gsub(regex, replacement), then use '\1', '\2', ... to refer to the match. Make sure not to put double quotes around the replacement, or else escape the backslash as in Joshua's answer. The conversion from '\1' to the match will be done within gsub, not by literal interpretation.
- If you are using gsub(regex) { replacement }, then use $1, $2, ...
But for your case, it is easier not to use matches:
t2 = t.gsub(/(?<=[.\!?])(?=[A-Z1-9])/, "\n")
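To compare the options side by side, here is a minimal sketch that applies the string form, the block form, and the lookaround form to the sample text from the question. The variable names (pattern, with_string, with_block, with_lookaround) are just for illustration.

```ruby
t = "First sentence. Second sentence.Third sentence."
pattern = /([.!?])([A-Z1-9])/

# String replacement: single quotes keep the backslashes, and gsub expands \1 and \2.
with_string = t.gsub(pattern, '\1' + "\n" + '\2')

# Block replacement: the block runs once per match, after $1 and $2 have been set.
with_block = t.gsub(pattern) { "#{$1}\n#{$2}" }

# Lookarounds: match the empty position between the two characters, so no groups are needed.
with_lookaround = t.gsub(/(?<=[.!?])(?=[A-Z1-9])/, "\n")

puts with_string == with_block && with_block == with_lookaround
```

All three should give the same text here; the lookaround version just avoids having to put the captured characters back.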
Answer 3:
If you got here because of Rubocop complaining "Avoid the use of Perl-style backrefs." about $1, $2, etc., you can do this instead:
some_id = $1
# or
some_id = Regexp.last_match[1] if Regexp.last_match
some_id = $5
# or
some_id = Regexp.last_match[5] if Regexp.last_match
It'll also want you to do
%r{//}.match(some_string)
instead of
some_string[//]
Lame (Rubocop)
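As a rough sketch of the alternatives to $1, the snippet below reads the same capture three ways. The input string and the variable names are made up for illustration. Regexp.last_match also accepts the group index as an argument, and as far as I know it returns nil rather than raising when there was no match, which would make the trailing "if Regexp.last_match" guard unnecessary in that form.

```ruby
line = "order id=12345 status=shipped" # hypothetical input

if line =~ /id=(\d+) status=(\w+)/
  id_from_global = $1                        # Perl-style backref
  id_from_last_match = Regexp.last_match(1)  # same value, without the global
end

# match returns a MatchData object (or nil), so no global state is needed at all.
m = /id=(?<id>\d+)/.match(line)
id_from_match_data = m[:id] if m

puts id_from_global, id_from_last_match, id_from_match_data
```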
Source: https://stackoverflow.com/questions/12065707/how-to-backreference-in-ruby-regular-expression-regex-with-gsub-when-i-use-gro