strsplit | 易学教程

Create new column with dplyr mutate and substring of existing column

Question: I have a data frame with a column of strings and want to extract substrings of those into a new column. Here is some sample code and data showing that I want to take the string after the final underscore character in the id column in order to create a new_id column. Each id entry always has 2 underscore characters, and it's always the final substring I want. df = data.frame( id = I(c("abcd_123_ABC","abc_5234_NHYK")), x = c(1.0,2.0) ) require(dplyr) df = df %>% dplyr::mutate(new_id = ...
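One possible way to get the part after the final underscore is a greedy regular expression. The sketch below is not taken from the thread; it uses base R's sub() on the question's sample data, and the as.character() call is only there to drop the AsIs class that I() adds.

```r
# Sample data from the question.
df <- data.frame(
  id = I(c("abcd_123_ABC", "abc_5234_NHYK")),
  x  = c(1.0, 2.0)
)

# ".*_" is greedy, so it consumes everything up to and including the
# last underscore; what remains is the final substring.
df$new_id <- as.character(sub(".*_", "", df$id))

print(df$new_id)
```

With dplyr loaded, the same expression should also work inside the question's pipeline, for example df %>% dplyr::mutate(new_id = sub(".*_", "", id)).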

removing particular character in a column in r

Question: I have a table called LOAN containing a column named RATE in which the observations are given as percentages, for example 14.49%. How can I format the table so that the % is removed from every value in RATE, so that I can use the plot function on it? I tried using strsplit: strsplit(LOAN$RATE,"%") but got the error "non character argument". Answer 1: Items that appear to be character when printed, but which R treats otherwise, are generally objects of class factor. I'm also guessing that you ...
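If RATE is indeed a factor, as the answer suspects, a minimal sketch of the cleanup might look like the following. The LOAN data frame here is a hypothetical stand-in with made-up values, not the asker's table.

```r
# Hypothetical stand-in for the LOAN table; RATE is stored as a factor,
# which is why strsplit() complains about a non-character argument.
LOAN <- data.frame(RATE = factor(c("14.49%", "11.5%", "7%")))

# Convert factor -> character, drop the literal "%", then parse as numeric.
LOAN$RATE <- as.numeric(sub("%", "", as.character(LOAN$RATE), fixed = TRUE))

print(LOAN$RATE)
```

Once the column is numeric it can be passed to plot() directly.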

Manipulate char vectors inside a data.table object in R

Question: I'm still a bit new to using data.table and understanding all its subtleties. I've looked in the docs and at other examples on SO but couldn't find what I want, so please help! I have a data.table which is basically a char vector (each entry being a sentence): DT=c("I love you","she loves me") DT=as.data.table(DT) colnames(DT) <- "text" setkey(DT,text) # > DT # text # 1: I love you # 2: she loves me What I'd like to do is to be able to perform some basic string operations inside the DT object ...

split string without loss of characters

Question: I wish to split strings at a certain character while retaining that character in the second resulting string. I can achieve almost all of the desired operation, except that I lose the character I specify in strsplit, which I guess is called the delimiter. Is there a way to request that strsplit retain the delimiter? Or must I use a regular expression of some kind? Thank you for any advice. This seems like a very basic question. Sorry if it is a duplicate. I prefer to use base R. Here is an ...
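One base R approach that keeps the delimiter is to insert a marker in front of it and split on the marker instead. The sketch below is an illustration only: the input strings and the "_" delimiter are hypothetical (the question's own example is cut off here), and it assumes the newline marker never occurs in the data.

```r
# Hypothetical input; the question's own example is not shown in this excerpt.
x <- c("abc_def", "gh_ijk")

# Put a marker (a newline, assumed absent from the data) in front of the
# first delimiter, then split on the marker so the delimiter survives.
marked <- sub("_", "\n_", x, fixed = TRUE)
parts  <- strsplit(marked, "\n", fixed = TRUE)

print(parts)
```

Using gsub() instead of sub() would split at every delimiter rather than only the first.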

split string with regex

Question: I'm looking to split a string of a generic form, where the square brackets denote the "sections" of the string. Ex: x <- "[a] + [bc] + 1" And return a character vector that looks like: "[a]" " + " "[bc]" " + 1" EDIT: Ended up using this: x <- "[a] + [bc] + 1" x <- gsub("\\[",",[",x) x <- gsub("\\]","],",x) strsplit(x,",") Answer 1: I've seen TylerRinker's code and suspect it may be clearer than this, but this may serve as a way to learn a different set of functions. (I liked his better before I ...
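The approach from the asker's EDIT can be run as follows. Because the string starts with "[", the inserted comma produces a leading empty string, so this sketch adds one extra step (not in the original) to drop empty pieces. It assumes the input contains no commas of its own.

```r
x <- "[a] + [bc] + 1"

# Surround each bracketed section with commas, as in the asker's EDIT.
marked <- gsub("\\[", ",[", x)
marked <- gsub("\\]", "],", marked)

# Split on the commas, then drop the empty string left by the leading comma.
parts <- strsplit(marked, ",")[[1]]
parts <- parts[nzchar(parts)]

print(parts)
```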

Why does strsplit use positive lookahead and lookbehind assertion matches differently?

Question: Common sense and a sanity check using gregexpr() indicate that the look-behind and look-ahead assertions below should each match at exactly one location in testString: testString <- "text XX text" BB <- "(?<= XX )" FF <- "(?= XX )" as.vector(gregexpr(BB, testString, perl=TRUE)[[1]]) # [1] 9 as.vector(gregexpr(FF, testString, perl=TRUE)[[1]][1]) # [1] 5 strsplit(), however, uses those match locations differently, splitting testString at one location when using the lookbehind assertion, but ...
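The difference can be reproduced with the question's own definitions. The sketch below only prints the results. The explanation in the comments is an assumption about how strsplit() scans: it searches the remaining text repeatedly and, when a zero-length match sits at the very start of that remainder, emits a single character as its own piece. The third pattern, which combines both assertions, is a workaround added here for illustration and is not part of the excerpt.

```r
testString <- "text XX text"
BB <- "(?<= XX )"   # lookbehind, from the question
FF <- "(?= XX )"    # lookahead, from the question

# Lookbehind: the text before the match becomes one piece, and the
# remainder no longer contains a match.
behind <- strsplit(testString, BB, perl = TRUE)[[1]]

# Lookahead: after the first piece is removed, the same zero-length match
# is found again at the start of the remainder, which costs one character.
ahead <- strsplit(testString, FF, perl = TRUE)[[1]]

# Illustrative workaround: requiring a preceding character prevents the
# match at the start of the remainder.
both <- strsplit(testString, "(?<=.)(?= XX )", perl = TRUE)[[1]]

print(behind)
print(ahead)
print(both)
```

In each case, pasting the pieces back together gives the original string, since zero-length matches consume no characters.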

Unlist multiple values in dataframe column but keep track of the row number

Question: I have a data frame that contains a column with multiple values consisting of gene name synonyms separated by semicolons: score <- c("32.01","19.5","18.0") symbol <- c("30 kDa adipocyte complemen related protein","AAT1","Cachectin") synonym <- c("30 kDa adipocyte complemen related protein; 30 kDa adipocyte complement-related protein; ACDC; ACRP30; ADIPOQ; APM-1; APM1; Adipocyte C1Q and collagen domain containing","AAT1; AAT1; ALT-1; ALT1; Alanine aminotransferase; Alanine aminotransferase 1; ...
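A minimal base R sketch of one way to unlist such a column while remembering the source row is given below. The data frame is a hypothetical, shortened stand-in built from a few of the values above (the question's full data is cut off), and the column names row and synonym in the result are assumptions.

```r
# Hypothetical, shortened stand-in for the question's data.
df <- data.frame(
  symbol  = c("30 kDa adipocyte complemen related protein", "AAT1"),
  synonym = c("ACDC; ACRP30; ADIPOQ", "AAT1; ALT-1; ALT1"),
  stringsAsFactors = FALSE
)

# Split each entry on the literal "; " separator; the result is a list
# with one character vector per row.
parts <- strsplit(df$synonym, "; ", fixed = TRUE)

# Repeat each row number once per synonym so every value keeps its origin.
long <- data.frame(
  row     = rep(seq_along(parts), lengths(parts)),
  synonym = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

The row column can then be used to look up symbol or score in the original data frame, for example df$symbol[long$row].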

Do a string split for more than one row in MATLAB

Question: I have written a for loop to split 5000 rows along each of the columns that they are in. Example of the cell array that contains those rows: From that picture, I would like to split each row along its respective columns, starting from the first column to the end. This is the code that I have written: for i = pdbindex(:,1) clean_pdb = regexprep(pdbindex, ':', ' '); % removes the colon (:) from the array and replaces it with a whitespace pdb2char = ...