How to use ruby gsub Regexp with many matches?

Submitted by 和自甴很熟 on 2019-12-05 09:53:36

Question


I have CSV file contents with unescaped double quotes inside quoted fields:

test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good

I need to replace every double quote that is not preceded or followed by a comma with "":

test,first,line,"you are a ""kind"" man",thanks
again,second,li,"my ""boss"" is you",good

so " is replaced by ""

I tried

x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}")

but it didn't work.


Answer 1:


Your regex needs to be a little more robust, to handle quotes that occur at the start of the first value or at the end of the last value:

csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV

puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"

The regex above uses negative lookbehind and negative lookahead assertions (anchors), available in Ruby 1.9 and later.

  • (?<!^|,) — immediately preceding this spot there must not be either a start of line (^) or a comma
  • " — find a double quote
  • (?!,|$) — immediately following this spot there must not be either a comma or end of line ($)
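
The three conditions above can be checked one case at a time. This is only a minimal sketch: the constant and method names are hypothetical, and the sample lines are made up for illustration.

```ruby
# Hypothetical names; the pattern itself is the one from this answer.
INNER_QUOTE = /(?<!^|,)"(?!,|$)/

def escape_inner_quotes(text)
  # Doubles every quote that is not next to a comma or a line boundary,
  # i.e. every quote that is not acting as a field delimiter.
  text.gsub(INNER_QUOTE, '""')
end

puts escape_inner_quotes('a,"say "hi" now",b')  # inner quotes are doubled
puts escape_inner_quotes('"leading field",x')   # unchanged: quotes at line start and before a comma
puts escape_inner_quotes('x,"trailing field"')  # unchanged: quotes after a comma and at line end
```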

As a bonus, since you didn't actually capture the characters on either side, you don't need to worry about using \1 correctly in your replacement string.

For more information, see the section "Anchors" in the official Ruby regex documentation.


However, when you do need to reference captured groups in the replacement, you can use any of the following:

"hello".gsub /([aeiou])/, '<\1>'            #=> "h<e>ll<o>"
"hello".gsub /([aeiou])/, "<\\1>"           #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/){ |m| "<#{$1}>" }  #=> "h<e>ll<o>"

You can't use String interpolation in the replacement string, as you did:

"hello".gsub /([aeiou])/, "<#{$1}>"
 #=> "h<previousmatch>ll<previousmatch>"

…because that string interpolation happens once, before the gsub has been run. Using the block form of gsub re-invokes the block for each match, at which point the global $1 has been appropriately populated and is available for use.
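
The difference can be reproduced in a small script. This is a sketch with hypothetical method names; the "xyz" match is contrived and only stands in for whatever match happened to run earlier.

```ruby
# Hypothetical helper names, for illustration only.
def wrap_vowels_interpolated(text)
  "xyz" =~ /(y)/                        # an earlier match leaves $1 == "y"
  # The replacement string is built once, before gsub runs,
  # so it contains the stale "y" for every match.
  text.gsub(/([aeiou])/, "<#{$1}>")
end

def wrap_vowels_block(text)
  # The block is called once per match, after $1 has been updated.
  text.gsub(/([aeiou])/) { "<#{$1}>" }
end

puts wrap_vowels_interpolated("hello")  #=> h<y>ll<y>
puts wrap_vowels_block("hello")         #=> h<e>ll<o>
```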


Edit: For Ruby 1.8 (why on earth are you using that?) you can use:

puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2')
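
One caveat with this capture-based pattern: each match consumes the characters on both sides of the quote, so two inner quotes separated by a single character cannot both match. The sketch below compares it with the lookaround version on a made-up input line; the constant and method names are hypothetical.

```ruby
# Hypothetical names; both patterns are taken from this answer.
CONSUMING  = /([^,\n\r])"([^,\n\r])/   # Ruby 1.8-compatible pattern
LOOKAROUND = /(?<!^|,)"(?!,|$)/        # lookaround pattern (Ruby 1.9+)

def escape_consuming(text)
  text.gsub(CONSUMING, '\1""\2')
end

def escape_lookaround(text)
  text.gsub(LOOKAROUND, '""')
end

line = 'a,"x"y"z",b'  # made-up input: two inner quotes one character apart

puts escape_consuming(line)   #=> a,"x""y"z",b   (the quote after y is missed)
puts escape_lookaround(line)  #=> a,"x""y""z",b
```

For the sample data in the question the two approaches give the same result, since the inner quotes there are always several characters apart.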



Answer 2:


Assuming s is a string, this will work:

puts s.gsub(/([^,])"([^,])/, "\\1\"\"\\2")


Source: https://stackoverflow.com/questions/9098759/how-to-use-ruby-gsub-regexp-with-many-matches
