strsplit | 易学教程

Splitting text column into ragged multiple new columns in a data table in R

Question: I have a data table containing 20,000+ rows and one column. The string in each row has a different number of words. I want to split the words and put each of them in a new column. I know how to do it word by word:

    Data[, Word1 := as.character(lapply(strsplit(as.character(Data$complaint), split = " "), "[", 1))]

(Data is my data table and complaint is the name of the column.) Obviously, this is not efficient because each cell in each row has a different number of words. Could you please tell
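One possible approach to the ragged split described above, sketched in base R: split every row once, then pad each word vector with NA up to the longest row. The sample rows and the `Word1`, `Word2`, ... column names are assumptions for illustration. The data.table package also offers `tstrsplit()`, which may suit the asker's table more directly.

```r
# Hypothetical sample standing in for the asker's table (column name from the question)
Data <- data.frame(
  complaint = c("late delivery", "item arrived broken today", "refund"),
  stringsAsFactors = FALSE
)

# Split every row once instead of once per word position
words <- strsplit(Data$complaint, " ", fixed = TRUE)
n_max <- max(lengths(words))

# Indexing past the end of a vector yields NA, which pads the short rows
padded <- t(vapply(words, function(w) w[seq_len(n_max)], character(n_max)))
colnames(padded) <- paste0("Word", seq_len(n_max))

Data <- cbind(Data, padded, stringsAsFactors = FALSE)
```

This sketch assumes at least one row contains two or more words, so that `vapply()` returns a matrix.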

Split data.frame into groups by column name

Question: I'm new to R. I have a data frame with column names of this type:

    file_001 file_002 block_001 block_002 red_001 red_002 ... etc.
    0.05     0.2      0.4       0.006     0.05    0.3
    0.01     0.87     0.56      0.4       0.12    0.06

I want to split them into groups by column name, to get a result like this:

    group_file
    file_001 file_002
    0.05     0.2
    0.01     0.87

    group_block
    block_001 block_002
    0.4       0.006
    0.56      0.4

    group_red
    red_001 red_002
    0.05    0.3
    0.12    0.06

    ... etc.

My file is huge. I don't have a fixed number of groups. It needs to be just
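A minimal sketch of one way to group columns by name prefix without knowing the number of groups in advance. It assumes every column name has the form `prefix_digits`, as in the example; the `group_` naming follows the desired output shown in the question.

```r
# Sample data copied from the question
df <- data.frame(
  file_001 = c(0.05, 0.01), file_002 = c(0.2, 0.87),
  block_001 = c(0.4, 0.56), block_002 = c(0.006, 0.4),
  red_001 = c(0.05, 0.12), red_002 = c(0.3, 0.06)
)

# Drop the trailing "_<digits>" to get each column's group prefix
prefix <- sub("_[0-9]+$", "", names(df))

# split() groups the column names by prefix; each group is then pulled out of df
groups <- lapply(split(names(df), prefix), function(cols) df[, cols, drop = FALSE])
names(groups) <- paste0("group_", names(groups))
```

Each element of `groups` is a data frame holding only the columns that share a prefix, for example `groups$group_file`.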

Regex; eliminate all punctuation except

Question: I have the following regex that splits on any space or punctuation. How can I exclude one or more punctuation characters from [:punct:]? Let's say I'd like to exclude apostrophes and commas. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping for an exclusion method.

    X <- "I'm not that good at regex yet, but am getting better!"
    strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)
    [1] "I" "'" "m" "not" "that" "good" "at" "regex" "yet"
    [10] "," ""
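One exclusion technique that may fit here is a negative lookahead placed in front of the character class: `(?![',])[[:punct:]]` matches any punctuation character except an apostrophe or a comma. The sketch below is an illustration under that assumption, not necessarily the answer the original thread settled on; it requires `perl=TRUE`.

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Punctuation minus the excluded characters (apostrophe and comma)
punct_except <- "(?![',])[[:punct:]]"

# Check which punctuation characters the pattern still matches
kept <- regmatches(X, gregexpr(punct_except, X, perl = TRUE))[[1]]

# Same split as in the question, with the narrowed punctuation class
tokens <- strsplit(X, paste0("[[:space:]]|(?=", punct_except, ")"), perl = TRUE)[[1]]
```

With this pattern, "I'm" and "yet," stay intact because the apostrophe and comma no longer trigger a split.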

Split different lengths values and bind to columns

Question: I've got a rather large (around 100k observations) data set, similar to this:

    data <- data.frame(
      ID = seq(1, 5, 1),
      Values = c("1,2,3", "4", " ", "4,1,6,5,1,1,6", "0,0"),
      stringsAsFactors=F)
    data
      ID        Values
    1  1         1,2,3
    2  2             4
    3  3
    4  4 4,1,6,5,1,1,6
    5  5           0,0

I want to split the Values column by "," with NA for missing cells:

    ID v1 v2 v3 v4 v5 v6 v7
     1  1  2  3 NA NA NA NA
     2  4 NA NA NA NA NA NA
     3 NA NA NA NA NA NA NA
     4  4  1  6  5  1  1  6
     5  0  0 NA NA NA NA NA
    ...

Best attempt was strsplit + rbind: df <- data
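A base R sketch of one way to produce the NA-padded layout requested above. The `v1`, `v2`, ... column names follow the desired output in the question; converting the pieces to integers is an assumption based on the sample values.

```r
# Sample data from the question
data <- data.frame(
  ID = seq(1, 5, 1),
  Values = c("1,2,3", "4", " ", "4,1,6,5,1,1,6", "0,0"),
  stringsAsFactors = FALSE
)

parts <- strsplit(data$Values, ",", fixed = TRUE)
n_max <- max(lengths(parts))

# Pad each row to n_max entries; blank strings and padding both become NA
mat <- t(vapply(parts, function(p) as.integer(p[seq_len(n_max)]), integer(n_max)))
colnames(mat) <- paste0("v", seq_len(n_max))

result <- cbind(data["ID"], mat)
```

Building one matrix with `vapply()` avoids the repeated `rbind()` calls that tend to be slow on 100k rows.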

error in strsplit when trying to separate by a comma

Question: I have the vector length

    # [1] 15,34, 12,24, 225,
    # Levels: 12,24, 15,34, 225,

and I want to separate the values by the comma to eventually make a list of them. I tried strsplit(length, ",") but keep getting the error message:

    Error in strsplit(length, ",") : non-character argument

Answer 1: Your "length" object is a factor. As the error message indicates, strsplit expects a character vector as its input. Try:

    strsplit(as.character(length), ",")

Demo:

    x <- factor(c("1,2", "3,4", "5,6"))
    strsplit(x,
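The fix in the answer can be sketched end to end as follows. The factor `x` reuses the demo values from the answer; the `tryCatch()` wrapper and the final numeric conversion are additions for illustration.

```r
x <- factor(c("1,2", "3,4", "5,6"))

# strsplit() rejects a factor; capture the error message instead of stopping
err <- tryCatch(strsplit(x, ","), error = function(e) conditionMessage(e))

# Converting to character first makes the split work
parts <- strsplit(as.character(x), ",", fixed = TRUE)

# Flatten into a single numeric vector if a plain list of values is the goal
values <- as.numeric(unlist(parts))
```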

How to avoid a loop in R: selecting items from a list

Question: I could solve this using loops, but I am trying to think in vectors so my code will be more R-esque. I have a list of names in the format firstname_lastname. I want to get from this list a separate list with only the first names, and I can't get my mind around how to do it. Here's some example data:

    t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
    tsplit <- strsplit(t,"_")

which looks like this:

    > tsplit
    [[1]]
    [1] "bob" "smith"

    [[2]]
    [1] "mary" "jane"

    [[3]]
    [1]
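Two loop-free ways to pull out the first names, sketched with the example data from the question. The variable name `full_names` is a stand-in for the asker's `t`, chosen here to avoid masking the base function `t()`.

```r
full_names <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

# Option 1: split, then apply the extraction function `[` to take element 1 of each piece
first_names <- vapply(strsplit(full_names, "_", fixed = TRUE), `[`, character(1), 1)

# Option 2: skip the split and delete everything from the first underscore onward
first_names_alt <- sub("_.*$", "", full_names)
```

Both options return a character vector of the same length as the input.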

Why does strsplit use positive lookahead and lookbehind assertion matches differently?

Common sense and a sanity-check using gregexpr() indicate that the look-behind and look-ahead assertions below should each match at exactly one location in testString:

    testString <- "text XX text"
    BB <- "(?<= XX )"
    FF <- "(?= XX )"
    as.vector(gregexpr(BB, testString, perl=TRUE)[[1]])
    # [1] 9
    as.vector(gregexpr(FF, testString, perl=TRUE)[[1]][1])
    # [1] 5

strsplit(), however, uses those match locations differently, splitting testString at one location when using the lookbehind assertion, but at two locations -- the second of which seems incorrect -- when using the lookahead assertion. strsplit
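If the goal is to cut the string exactly at the positions gregexpr() reports, one workaround is to do the cutting with substring() rather than strsplit(). This is a sketch of that idea under the assumption that the pattern matches at least once and not at position 1; it does not explain strsplit()'s behavior.

```r
testString <- "text XX text"
FF <- "(?= XX )"

# 1-based positions of the zero-length lookahead matches
pos <- as.vector(gregexpr(FF, testString, perl = TRUE)[[1]])

# Cut immediately before each match position
pieces <- substring(testString, c(1, pos), c(pos - 1, nchar(testString)))
```

Here `pieces` holds the text before the match and the text from the match onward, with no extra split.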

How to use the strsplit function with a period

Question: I would like to split the following string by its periods. I tried strsplit() with "." in the split argument, but did not get the result I want.

    s <- "I.want.to.split"
    strsplit(s, ".")
    [[1]]
    [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

The output I want is s split into 4 elements in a list, as follows.

    [[1]]
    [1] "I" "want" "to" "split"

What should I do?

Answer 1: When using a regular expression in the split argument of strsplit(), you've got to
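In a regular expression an unescaped "." matches any character, which is why every character of `s` was treated as a delimiter. Three common ways to split on a literal period are sketched below; which one the truncated answer goes on to recommend is not visible here.

```r
s <- "I.want.to.split"

# Treat the split argument as a literal string, not a regex
by_fixed <- strsplit(s, ".", fixed = TRUE)[[1]]

# Escape the period so the regex engine reads it literally
by_escape <- strsplit(s, "\\.")[[1]]

# Put the period in a character class, where it has no special meaning
by_class <- strsplit(s, "[.]")[[1]]
```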

Splitting a string into new rows in R [duplicate]

Question: This question already has answers here: Split comma-separated strings in a column into separate rows (5 answers). Closed 2 years ago.

I have a data set like the one below:

    Country Region Molecule Item Code
    IND     NA     PB102    FR206985511
    THAI    AP     PB103    BA-107603 / F000113361 / 107603
    LUXE    NA     PB105    1012701 / SGP-1012701 / F041701000
    IND     AP     PB106    AU206985211 / CA-F206985211
    THAI    HP     PB107    F034702000 / 1010701 / SGP-1010701
    BANG    NA     PB108    F000007970/25781/20009021

I want to split based on the string values in
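A base R sketch of splitting a slash-delimited column into rows. The question is cut off before it names the column to split, so treating the item-code column as the target is an assumption, as are the `Item_Code` column name and the three-row subset used here. The delimiter pattern tolerates optional spaces around each slash, since the sample mixes " / " and "/".

```r
# Hypothetical subset of the asker's data (Region column omitted for brevity)
df <- data.frame(
  Country = c("IND", "THAI", "BANG"),
  Molecule = c("PB102", "PB103", "PB108"),
  Item_Code = c("FR206985511",
                "BA-107603 / F000113361 / 107603",
                "F000007970/25781/20009021"),
  stringsAsFactors = FALSE
)

# Split on "/" with any surrounding whitespace
codes <- strsplit(df$Item_Code, "[[:space:]]*/[[:space:]]*")

# Repeat each source row once per code it contains
idx <- rep(seq_len(nrow(df)), lengths(codes))
long <- df[idx, c("Country", "Molecule")]
long$Item_Code <- unlist(codes)
rownames(long) <- NULL
```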

Split delimited strings in a column and insert as new rows [duplicate]

Question: This question already has answers here: Split comma-separated strings in a column into separate rows (5 answers). Closed 3 years ago.

I have a data frame as follows:

    +-----+-------+
    | V1  | V2    |
    +-----+-------+
    | 1   | a,b,c |
    | 2   | a,c   |
    | 3   | b,d   |
    | 4   | e,f   |
    | .   | .     |
    +-----+-------+

Each letter is a character, separated by commas. I would like to split V2 on each comma and insert the split strings as new rows. For instance, the desired output will be:

    +----+----+
    | V1 | V2 |
    +----+--
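For the simple two-column case above, a compact base R sketch is to split `V2` once and repeat each `V1` value by the number of pieces it produced. The four sample rows come from the question; the result name `long` is hypothetical.

```r
df <- data.frame(
  V1 = 1:4,
  V2 = c("a,b,c", "a,c", "b,d", "e,f"),
  stringsAsFactors = FALSE
)

parts <- strsplit(df$V2, ",", fixed = TRUE)

# One output row per split piece, with V1 repeated to match
long <- data.frame(
  V1 = rep(df$V1, lengths(parts)),
  V2 = unlist(parts),
  stringsAsFactors = FALSE
)
```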