gsub | 易学教程

r - remove last but one character from every field


Question: The substr() function in R can isolate any character by position, e.g. substr(df$10,2,3), and with nchar() it is possible to work backwards from the end of the field to isolate a character in a position such as last but one, using: substr(df$10,nchar(df$10)-2,nchar(df$10)-1). However, I would like to know how to simply remove the last-but-one character of every field in a column of a data frame. I am having difficulty doing this. Any help would be great!

Answer 1: You can use a regular expression
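One regex-based approach can be sketched as follows. This is only an illustration, not necessarily what the truncated answer goes on to propose: the data frame `df` and its column `x` are hypothetical stand-ins for the asker's data.

```r
# Hypothetical data frame; the column name `x` is an assumption for illustration.
df <- data.frame(x = c("abcde", "12345", "xy"), stringsAsFactors = FALSE)

# Match the second-to-last character followed by the (captured) last character,
# anchored at the end of the string, and keep only the captured last character.
df$x <- sub(".(.)$", "\\1", df$x)

print(df$x)
```

Strings shorter than two characters do not match the pattern and are left unchanged.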


Regex gsub R differentiate between ellipsis and periods


Question: text="stack overflow... is a popular website." I want to separate punctuation marks from words. The output should be: "stack overflow ... is a popular website . " Of course, the command gsub("\\.", " \\. ", text, fixed = FALSE) returns: "stack overflow . . . is a popular website . " because it does not differentiate between periods and ellipses (suspension points). In short, when three periods are found together in the text, R should treat them as a single punctuation mark.

Answer 1: I think a
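A possible way to treat three periods as one token is to list the ellipsis first in an alternation, so that it is matched as a unit before a single period is tried. The sketch below is an assumption about one workable pattern, not the truncated answer's actual suggestion; the second gsub() call only collapses the doubled spaces that the padding introduces.

```r
text <- "stack overflow... is a popular website."

# Pad either a run of exactly three periods or a single period with spaces.
padded <- gsub("(\\.{3}|\\.)", " \\1 ", text)

# Collapse any run of spaces created by the padding into a single space.
result <- gsub(" +", " ", padded)

print(result)
```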

Ruby regex for stripping BBCode


Question: I'm trying to remove BBCode from a given string (just using gsub with some regex). Here's an example string: The [b]quick[/b] brown [url=http://example.com]fox[/url] jumps over the lazy dog [img=http://example.com/lazy_dog.png] And what I need that to output is: The quick brown fox jumps over the lazy dog So what's a way to do that? I've found various examples of doing this, but none have worked for my use case. One that I've tried: /\[(\w+)[^w]*?](.*?)\[\/\1]/ But that wouldn't catch the

R: gsub/replace only those occurrences following a keyword occurrence


Question: I only want to replace string occurrences that follow a particular keyword/pattern, not those before it. In other words, do nothing until the first occurrence of the keyword pattern, and then start to gsub to the right of that keyword pattern. See below: gsub("\\[|\\]", "", "ab[ cd] ef keyword [ gh ]keyword ij ") Actual results: "ab cd ef keyword gh keyword ij " Desired results: "ab[ cd] [][asfg] ]] ef keyword gh keyword ij " [Edited to fix the results. I don't want to remove 'keyword'] [Edited to
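One way to get this behaviour is to split the string at the first occurrence of the keyword and run gsub() only on the right-hand part. This is a minimal sketch under that assumption; the variable names (`head_part`, `tail_part`, `result`) are illustrative, and the input is the string from the gsub() call in the question.

```r
x <- "ab[ cd] ef keyword [ gh ]keyword ij "

# Start position of the first literal occurrence of the keyword (-1 if absent).
pos <- as.integer(regexpr("keyword", x, fixed = TRUE))

if (pos > 0) {
  head_part <- substr(x, 1, pos - 1)    # left of the keyword: left untouched
  tail_part <- substr(x, pos, nchar(x)) # keyword and everything after it
  result <- paste0(head_part, gsub("\\[|\\]", "", tail_part))
} else {
  result <- x                           # no keyword, so nothing is replaced
}

print(result)
```

The brackets before the first "keyword" survive, while those after it are removed; "keyword" itself is kept.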

Changing row names in a data_frame from letters to numbers in R


Question: I have a group of datasets, from a survey applied to many different countries, which I want to combine to create a single merged data.frame. Unfortunately, in one of them the variable names are different from the others, although they follow a pattern: whereas in the others the variables are named "VAR1", "VAR2", etc., in this one they are named "VAR_a", "VAR_b", etc. The code I've used so far to solve this problem is something like: names (df) <- gsub("_a", "01", names(df)) names (df) <-
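Rather than one gsub() call per letter, the letter suffix can be looked up in R's built-in `letters` vector. The sketch below assumes single lowercase-letter suffixes and the zero-padded form suggested by the "01" replacement in the question; the data frame and its `id` column are hypothetical.

```r
# Hypothetical data frame with one unrelated column and lettered variables.
df <- data.frame(id = 1, VAR_a = 10, VAR_b = 20, VAR_c = 30)

# Only touch names of the form VAR_<single lowercase letter>.
is_lettered <- grepl("^VAR_[a-z]$", names(df))

# Position of each suffix in the alphabet: a -> 1, b -> 2, ...
idx <- match(sub("^VAR_", "", names(df)[is_lettered]), letters)

# Zero-padded to two digits, following the "01" used in the question.
names(df)[is_lettered] <- sprintf("VAR%02d", idx)

print(names(df))
```

If the other datasets use unpadded names such as "VAR1", the format string "VAR%d" would be the one to use instead.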

R: gsub of exact full string with fixed = T


Question: I am trying to gsub an exact FULL string. I know I need to use ^ and $ . The problem is that I have special characters in the strings (could be [ , or . ) so I need to use fixed=T . This overrides the ^ and $ . Any solution is appreciated. I need to replace the 1st and 2nd elements in exact_orig with the 1st and 2nd elements from exact_change, but only if the full string is matched from beginning to end. exact_orig = c("oz","32 oz") exact_change = c("20 oz","32 ct") gsub_FixedTrue <- function(i) { for(k in seq_along
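When the whole string has to match literally, a pattern may not be needed at all: exact equality via match() treats [ and . as ordinary characters. The sketch below swaps in that technique in place of gsub(); the input vector `x` is a hypothetical example.

```r
exact_orig   <- c("oz", "32 oz")
exact_change <- c("20 oz", "32 ct")

# Hypothetical input: two exact matches and two near-misses.
x <- c("oz", "32 oz", "5 oz", "oz.")

# Index of each element of x in exact_orig, or NA when there is no exact match.
idx <- match(x, exact_orig)

# Replace only the elements that matched a full string.
hit <- !is.na(idx)
x[hit] <- exact_change[idx[hit]]

print(x)
```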

can't remove blank lines in txt file with R


Question: I am doing a text analysis with R and needed to convert the first letters of the sentences to lowercase while keeping the other capitalized words as they are. So I used the command x <- gsub("(\\..*?[A-Z])", '\\L\\1', x, perl=TRUE) which worked, but only partially. The problem is that for the text analysis I had to convert the PDF files to txt format, and now the txt files contain a lot of empty lines (page breaks, possibly returns), and therefore the command I used does not convert the
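If the text is held as a character vector with one element per line (as readLines() would return), the empty and whitespace-only lines can be dropped before any further processing. This is a minimal sketch under that assumption; the sample vector is made up.

```r
# Hypothetical lines, as if read from a converted txt file with readLines().
x <- c("First sentence.", "", "   ", "Second sentence.", "\t")

# Keep only lines that contain something other than whitespace.
x <- x[!grepl("^[[:space:]]*$", x)]

print(x)
```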

Removing a pattern With gsub in r


Question: I have a string Project Change Request (PCR) - HONDA DIGITAL PLATEFORM saved in supp_matches , and supp_matches1 contains the string Project Change Request (PCR) - . supp_matches2 <- gsub("^.*[supp_matches1]","",supp_matches) supp_matches2 # [1] " (PCR) - HONDA DIGITAL PLATEFORM" This is not correct; the result should be supp_matches2 # [1] "HONDA DIGITAL PLATEFORM" Why is it not coming out the way it should?

Answer 1: As I say in my comment, in your expression gsub("^.*[supp_matches1]"
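Inside the quoted pattern, [supp_matches1] is a character class made of the letters of the variable's name, not the variable's contents, so the greedy ^.* runs up to the last such character in the string (the final "t" of "Request"). A minimal sketch of passing the variable itself as a literal pattern follows. It assumes supp_matches1 is defined without trailing whitespace, and trimws() removes the space left behind.

```r
supp_matches  <- "Project Change Request (PCR) - HONDA DIGITAL PLATEFORM"
supp_matches1 <- "Project Change Request (PCR) -"

# fixed = TRUE treats the parentheses in supp_matches1 as literal characters.
supp_matches2 <- trimws(sub(supp_matches1, "", supp_matches, fixed = TRUE))

print(supp_matches2)
```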

Regular expression for the “opposite” result


Question: Take the following character vector x: x <- c("1 Date in the form", "2 Number of game", "3 Day of week", "4-5 Visiting team and league") My desired result is the following vector, with the first capitalized word from each string and, if the string contains a - , also the last word. [1] "Date" "Number" "Day" "Visiting" "league" So instead of doing unlist(sapply(strsplit(x, "[[:blank:]]+|, "), function(y){ if(grepl("[-]", y[1])) c(y[2], tail(y,1)) else y[2] })) to get the result, I figured I