strsplit

Create new column with dplyr mutate and substring of existing column

旧街凉风 submitted on 2020-01-12 06:45:10
Question: I have a data frame with a column of strings and want to extract substrings of those into a new column. Here is some sample code and data showing that I want to take the string after the final underscore character in the id column in order to create a new_id column. The id column entry always has 2 underscore characters, and it's always the final substring I would like.

    df = data.frame(
      id = I(c("abcd_123_ABC","abc_5234_NHYK")),
      x = c(1.0,2.0)
    )
    require(dplyr)
    df = df %>% dplyr::mutate(new_id =
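
A minimal base R sketch of one way to take the piece after the last underscore, using the sample data from the question. The greedy pattern `.*_` and the `strsplit` variant are illustrative choices rather than the accepted answer, and the helper name `new_id_split` is hypothetical. The same `sub()` call could presumably be used inside `dplyr::mutate()`.

```r
# Sample data from the question.
df <- data.frame(
  id = I(c("abcd_123_ABC", "abc_5234_NHYK")),
  x = c(1.0, 2.0)
)

# Drop the AsIs class so we work with a plain character vector.
ids <- as.character(df$id)

# Option 1: the greedy ".*_" consumes everything up to and
# including the LAST underscore, leaving only the final piece.
df$new_id <- sub(".*_", "", ids)

# Option 2: split on "_" and keep the final piece of each element.
pieces <- strsplit(ids, "_", fixed = TRUE)
new_id_split <- vapply(pieces, function(p) p[length(p)], character(1))

print(df$new_id)
```

Both options rely on the question's statement that the final substring is always the one wanted. Option 2 does not depend on the number of underscores either.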

removing particular character in a column in r

怎甘沉沦 submitted on 2020-01-09 10:51:33
Question: I have a table called LOAN containing a column named RATE in which the observations are given as percentages, for example 14.49%. How can I format the table so that the % is removed from every value in RATE and I can use the plot function on it? I tried using strsplit, strsplit(LOAN$RATE,"%"), but got the error "non-character argument".

Answer 1: Items that appear to be character when printed but for which R thinks otherwise are generally factor-class objects. I'm also guessing that you
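
A short sketch of the situation the answer describes, under the assumption that RATE is stored as a factor. The sample values and the new column name `RATE_NUM` are hypothetical. Converting to character first avoids the "non-character argument" error, and `sub()` with `fixed = TRUE` removes the literal percent sign.

```r
# Hypothetical data: RATE stored as a factor, as the answer suspects.
LOAN <- data.frame(RATE = factor(c("14.49%", "7.9%")))

# strsplit() needs a character vector, so convert the factor first.
rate_chr <- as.character(LOAN$RATE)

# Remove the literal "%" and convert to numeric for plotting.
LOAN$RATE_NUM <- as.numeric(sub("%", "", rate_chr, fixed = TRUE))

print(LOAN$RATE_NUM)
```

`strsplit(rate_chr, "%")` would also work once the input is character, but it returns a list, so `sub()` is the more direct route to a numeric column.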

Manipulate char vectors inside a data.table object in R

空扰寡人 submitted on 2020-01-06 19:34:11
Question: I'm still a bit new to using data.table and understanding all its subtleties. I've looked in the docs and at other examples on SO but couldn't find what I want, so please help! I have a data.table which is basically a char vector (each entry being a sentence):

    DT=c("I love you","she loves me")
    DT=as.data.table(DT)
    colnames(DT) <- "text"
    setkey(DT,text)
    # > DT
    #            text
    # 1:   I love you
    # 2: she loves me

What I'd like to do is to be able to perform some basic string operations inside the DT object

split string without loss of characters

岁酱吖の submitted on 2020-01-03 17:49:14
Question: I wish to split strings at a certain character while retaining that character in the second resulting string. I can achieve almost all of the desired operation, except that I lose the characters I specify in strsplit, which I guess are called the delimiter. Is there a way to request that strsplit retain the delimiter? Or must I use a regular expression of some kind? Thank you for any advice. This seems like a very basic question. Sorry if it is a duplicate. I prefer to use base R. Here is an
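
One base R possibility, sketched on hypothetical input because the question's own example is cut off in this excerpt: split at a zero-width position using Perl-style lookarounds, so that no characters are consumed. The underscore delimiter is an assumption chosen for illustration.

```r
# Hypothetical input; the delimiter "_" is assumed for illustration.
x <- c("abc_def", "a_b_c")

# Zero-width split point: preceded by any character, followed by "_".
# Nothing is consumed, so each "_" stays at the start of the next piece.
parts <- strsplit(x, "(?<=.)(?=_)", perl = TRUE)

print(parts)
```

The leading `(?<=.)` matters: with the lookahead alone, `strsplit()` can emit an extra single-character piece, because a zero-length match at the start of the remaining string is handled specially.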

split string without loss of characters

假装没事ソ submitted on 2020-01-03 17:46:50
Question: I wish to split strings at a certain character while retaining that character in the second resulting string. I can achieve almost all of the desired operation, except that I lose the characters I specify in strsplit, which I guess are called the delimiter. Is there a way to request that strsplit retain the delimiter? Or must I use a regular expression of some kind? Thank you for any advice. This seems like a very basic question. Sorry if it is a duplicate. I prefer to use base R. Here is an

split string with regex

谁都会走 submitted on 2019-12-29 05:25:11
Question: I'm looking to split a string of a generic form, where the square brackets denote the "sections" of the string. Ex:

    x <- "[a] + [bc] + 1"

And return a character vector that looks like:

    "[a]" " + " "[bc]" " + 1"

EDIT: Ended up using this:

    x <- "[a] + [bc] + 1"
    x <- gsub("\\[",",[",x)
    x <- gsub("\\]","],",x)
    strsplit(x,",")

Answer 1: I've seen TylerRinker's code and suspect it may be more clear than this, but this may serve as a way to learn a different set of functions. (I liked his better before I
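
A sketch of the approach from the question's EDIT, with one extra step. Inserting a comma before the first `[` leaves an empty leading element after the split, so it is dropped with `nzchar()`. This assumes the input itself contains no commas. The `regmatches()` variant is an alternative that needs no marker character. Neither is taken from the truncated answer, and the names `marked`, `parts` and `sections` are hypothetical.

```r
x <- "[a] + [bc] + 1"

# Approach from the question: mark section boundaries with commas, then split.
marked <- gsub("\\[", ",[", x)
marked <- gsub("\\]", "],", marked)
parts <- strsplit(marked, ",", fixed = TRUE)[[1]]

# The comma inserted before the first "[" yields a leading "", so drop empties.
parts <- parts[nzchar(parts)]

# Alternative: extract the pieces directly instead of splitting.
# Either a bracketed section, or a run of characters containing no "[".
sections <- regmatches(x, gregexpr("\\[[^]]*\\]|[^[]+", x))[[1]]

print(parts)
print(sections)
```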

Why does strsplit use positive lookahead and lookbehind assertion matches differently?

眉间皱痕 submitted on 2019-12-27 11:07:10
Question: Common sense and a sanity check using gregexpr() indicate that the look-behind and look-ahead assertions below should each match at exactly one location in testString:

    testString <- "text XX text"
    BB <- "(?<= XX )"
    FF <- "(?= XX )"
    as.vector(gregexpr(BB, testString, perl=TRUE)[[1]])
    # [1] 9
    as.vector(gregexpr(FF, testString, perl=TRUE)[[1]][1])
    # [1] 5

strsplit(), however, uses those match locations differently, splitting testString at one location when using the lookbehind assertion, but
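
The difference can be reproduced with the question's own definitions. The explanation in the comments is my reading of the algorithm described in `?strsplit` (the string is searched repeatedly, each time starting from what is left after the previous match), not something stated in this truncated excerpt. The names `by_lookbehind` and `by_lookahead` are hypothetical.

```r
testString <- "text XX text"
BB <- "(?<= XX )"  # lookbehind: position just after " XX "
FF <- "(?= XX )"   # lookahead: position just before " XX "

# Lookbehind: one zero-width match after " XX ", so one split.
by_lookbehind <- strsplit(testString, BB, perl = TRUE)[[1]]

# Lookahead: after the first split the remainder is " XX text",
# which matches again at its very start. A zero-length match at the
# start of the remainder makes strsplit() emit a single character.
by_lookahead <- strsplit(testString, FF, perl = TRUE)[[1]]

print(by_lookbehind)
print(by_lookahead)
```

The lookbehind does not match again because the search restarts on the remainder, where the preceding " XX " is no longer visible.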

Why does strsplit use positive lookahead and lookbehind assertion matches differently?

限于喜欢 submitted on 2019-12-27 11:06:35
Question: Common sense and a sanity check using gregexpr() indicate that the look-behind and look-ahead assertions below should each match at exactly one location in testString:

    testString <- "text XX text"
    BB <- "(?<= XX )"
    FF <- "(?= XX )"
    as.vector(gregexpr(BB, testString, perl=TRUE)[[1]])
    # [1] 9
    as.vector(gregexpr(FF, testString, perl=TRUE)[[1]][1])
    # [1] 5

strsplit(), however, uses those match locations differently, splitting testString at one location when using the lookbehind assertion, but

Unlist multiple values in dataframe column but keep track of the row number

旧巷老猫 submitted on 2019-12-25 07:46:58
Question: I have a data frame that contains a column with multiple values consisting of gene name synonyms separated by semicolons:

    score <- c("32.01","19.5","18.0")
    symbol <- c("30 kDa adipocyte complemen related protein","AAT1","Cachectin")
    synonym <- c("30 kDa adipocyte complemen related protein; 30 kDa adipocyte complement-related protein; ACDC; ACRP30; ADIPOQ; APM-1; APM1; Adipocyte C1Q and collagen domain containing","AAT1; AAT1; ALT-1; ALT1; Alanine aminotransferase; Alanine aminotransferase 1;
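
A minimal base R sketch of one way to unlist such a column while keeping the originating row number. The data here is a shortened, hypothetical stand-in for the question's truncated vectors, and the output column name `row` is an assumption.

```r
# Shortened, hypothetical stand-in for the question's data.
df <- data.frame(
  score = c("32.01", "19.5"),
  synonym = c("ACDC; ACRP30; ADIPOQ", "AAT1; ALT1"),
  stringsAsFactors = FALSE
)

# Split each cell on a semicolon followed by optional spaces.
pieces <- strsplit(df$synonym, "; *")

# Number of synonyms found in each original row.
n <- lengths(pieces)

# Repeat the row index (and any other columns) once per synonym.
long <- data.frame(
  row = rep(seq_along(pieces), times = n),
  score = rep(df$score, times = n),
  synonym = unlist(pieces),
  stringsAsFactors = FALSE
)

print(long)
```

Repeating each row index by the number of pieces it produced is what keeps every synonym tied to its source row after `unlist()` flattens the list.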

Do a string split for more than one row in MATLAB

北城余情 submitted on 2019-12-24 19:18:57
Question: I have written a for loop to split 5000 rows along each of the columns that they are in. Example of the cell array that contains those rows: From that picture, I would like to split each row along its respective columns, starting from the first column to the end. This is the code that I have written:

    for i = pdbindex(:,1)
    clean_pdb = regexprep(pdbindex, ':', ' '); % removes the colon (:) from the array and replaces it with a whitespace
    pdb2char =