gsub | 易学教程

How do I limit the number of replacements when using gsub?

How do you limit the number of replacements made by String#gsub in Ruby? In PHP this is easily done with preg_replace, which takes a parameter for limiting replacements, but I can't figure out how to do this in Ruby. gsub replaces all occurrences.

You can try String#sub: http://ruby-doc.org/core/classes/String.html#M001185

You can create a counter and decrement it within a gsub block:

    str = 'aaaaaaaaaa'
    count = 5
    p str.gsub(/a/){if count.zero? then $& else count -= 1; 'x' end} # => "xxxxxaaaaa"

    str = 'aaaaaaaaaa'
    # The following is so that the variable new_string exists in this scope, # not
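The counter idea can be wrapped in a small method so the remaining count does not leak into the caller's scope. This is only a sketch: `gsub_limit` is a hypothetical helper name, not part of Ruby's String API, and it assumes a plain string replacement rather than a block or hash.

```ruby
# Hypothetical helper (not a built-in): replace at most `limit` matches of
# `pattern` in `str` with `replacement`, leaving any later matches untouched.
def gsub_limit(str, pattern, replacement, limit)
  remaining = limit
  str.gsub(pattern) do |match|
    if remaining > 0
      remaining -= 1
      replacement   # still within the limit: substitute
    else
      match         # limit reached: put the original text back
    end
  end
end

puts gsub_limit('aaaaaaaaaa', /a/, 'x', 5)   # prints "xxxxxaaaaa"
```

gsub still scans the whole string here; the block simply returns the matched text unchanged once the limit is used up.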

Escaping strings for gsub

I read a file:

    local logfile = io.open("log.txt", "r")
    data = logfile:read("*a")
    print(data)

Output:

    ...
    "(\.)\n(\w)", r"\1 \2"
    "\n[^\t]", "", x, re.S
    ...

Yes, the log file looks awful, as it's full of various commands. How can I call gsub to remove, for example, the "(\.)\n(\w)", r"\1 \2" line from the data variable? The snippet below does not work:

    s='"(\.)\n(\w)", r"\1 \2"'
    data=data:gsub(s, '')

I guess some escaping needs to be done. Is there an easy solution?

Update:

    local data = [["(\.)\n(\w)", r"\1 \2" "\n[^\t]", "", x, re.S]]
    local s = [["(\.)\n(\w)", r"\1 \2"]]
    local function esc(x) return (x:gsub('%%', '%%%%') :gsub

Regular expression in R to remove the part of a string after the last space

Question: I would like a gsub expression in R that removes everything in a string after the last space. For example, string="Da Silva UF" should return "Da Silva". Any thoughts?

Answer 1: You can use the following.

    string <- 'Da Silva UF'
    gsub(' \\S*$', '', string)
    [1] "Da Silva"

Explanation:

    ' '    a literal space
    \S*    non-whitespace (all but \n, \r, \t, \f, and " ") (0 or more times)
    $      before an optional \n, and the end of the string

Answer 2: Using the $ anchor:

    > string = "Da Silva UF"
    > gsub(" [^ ]*$", "", string)
    [1]

Removing parenthesis in R

Question: I am trying to remove the parentheses from a string value, in this case this one: (40.703707008, -73.943257966). I can't seem to find a post with code that works. I know this is a very simple task, and I've seen the following links, but the answers either kill all my punctuation or don't seem to work. Below is the code I've tried. Appreciate the help.

Links consulted: "remove parenthesis from string" and "Remove parentheses and text within from strings in R".

    x = ("(40.703707008, -73.943257966)")
    gsub("\\s*\\([^\\)]+\\)","",x

get filename from url path in R

Question: I would like to extract the filename from a URL in R. For now I do it as follows, but maybe it can be done more concisely, as in Python. Assume path is just a string.

    path="http://www.exanple.com/foo/bar/fooXbar.xls"

In R:

    tail(strsplit(path,"[/]")[[1]],1)

In Python:

    path.split("/")[-1:]

Maybe there is some sub or gsub solution?

Answer 1: There's a function for that...

    basename(path)
    [1] "fooXbar.xls"

Answer 2: @SimonO101 has the most robust answer IMO, but some other options: Since regular expressions are greedy, you can

Duplicating observations of a dataframe, but also replacing specific variable values in R

Question: I am looking for some advice on some data restructuring. I am collecting some data using Google Forms, which I download as a csv file that looks something like the following:

    #        alpha           beta  option
    #            6   8, 9, 10, 11   apple
    #            9              6    pear
    #            1              6   apple
    #            3           8, 9    pear
    #            3           6, 8    lime
    #            3              1   apple
    #  2, 4, 7, 11              9    lime

The data has two variables (alpha and beta) that each list numbers. For the majority of my data there is only one number in each variable. However, for some observations there can be two, three or

More than 9 backreferences in gsub()

How can gsub be used with more than 9 backreferences? I would expect the output in the example below to be "e, g, i, j, o".

    > test <- "abcdefghijklmnop"
    > gsub("(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)", "\\5, \\7, \\9, \\10, \\15", test, perl = TRUE)
    [1] "e, g, i, a0, a5"

See Regular Expressions with The R Language: You can use the backreferences \1 through \9 in the replacement text to reinsert text matched by a capturing group. There is no replacement text token for the overall match. Place the entire regex in a capturing group and then use \1. But with

How to use ruby gsub Regexp with many matches?

I have CSV file contents with double quotes inside quoted text:

    test,first,line,"you are a "kind" man",thanks
    again,second,li,"my "boss" is you",good

I need to replace every double quote that is not preceded or followed by a comma with "":

    test,first,line,"you are a ""kind"" man",thanks
    again,second,li,"my ""boss"" is you",good

so " is replaced by "". I tried

    x.gsub(/([^,])"([^,])/, "#{$1}\"\"#{$2}")

but it didn't work.

Answer (Phrogz): Your regex needs to be a little more bold, in case the quotes occur at the start of the first value, or at the end of the last value:

    csv = <<ENDCSV
    test,first,line,"you are a "kind"
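One reason the attempt in the question fails is that a double-quoted replacement such as "#{$1}\"\"#{$2}" is interpolated before gsub runs, so $1 and $2 do not refer to the current match. The sketch below avoids backreferences altogether by inspecting the characters around each quote inside a block. It is not the truncated answer's code: `escape_inner_quotes` is a hypothetical helper, and it assumes each line is passed in without its trailing newline.

```ruby
# Hypothetical helper: double every quote that is not a field delimiter.
# A quote is treated as a delimiter when it touches a comma or sits at the
# very start or end of the (already chomped) line.
def escape_inner_quotes(line)
  line.gsub(/"/) do
    before = $~.pre_match[-1]   # nil when the quote starts the line
    after  = $~.post_match[0]   # nil when the quote ends the line
    delimiter = [nil, ','].include?(before) || [nil, ','].include?(after)
    delimiter ? '"' : '""'
  end
end

csv = <<~ENDCSV
  test,first,line,"you are a "kind" man",thanks
  again,second,li,"my "boss" is you",good
ENDCSV

csv.lines.map(&:chomp).each do |line|
  puts escape_inner_quotes(line)
end
# prints:
# test,first,line,"you are a ""kind"" man",thanks
# again,second,li,"my ""boss"" is you",good
```

Because the block looks at neighbouring characters without consuming them, two quotes separated by a single character are both handled, which a pattern like /([^,])"([^,])/ would miss.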

replace words in R data.frames (Text Mining)

I'm working on a text mining solution with SQL and R. First I import data into R from my SQL selection, and then I do data mining on it. Here is what I've got:

    rawData = sqlQuery(dwhConnect,sqlString)
    a = data.frame(rawData$ENNOTE_NEU)

If I run a[[1]][1:3], you can see the structure:

    [1] lorem ipsum li ld ee wö wo di dd
    [2] la kdin di da dogs chicken
    [3] kd good i need some help

Now I want to do some data cleaning with my own dictionary. An example would be to replace li with lorem ipsum, and kd as well as kdin with kunde. My problem is how to do that for the whole data frame.

    for(i in 1:(nrow(a

Regex issue in gsub

Question: I have defined vec <- "5f 110y, Fast" and gsub("[\\s0-9a-z]+,", "", vec) gives " 5f Fast ". I would have expected it to give " Fast ", since everything before the comma should be matched by the regex. Can anyone explain why this is not the case?

Answer 1: Keep in mind that, in TRE regex patterns, you cannot use regex escapes like \s, \d, \w inside bracket expressions. So the regex in your case, "[\\s0-9a-z]+,", matches 1 or more \, s, digits and lowercase ASCII letters, and then a single comma. You may