strsplit

How should I split and retain elements using strsplit?

风流意气都作罢 submitted on 2019-11-29 00:59:18
Question: R's strsplit function matches a given regular expression and deletes the matches, splitting the rest of the string into a vector:

    > strsplit("abc123def", "[0-9]+")
    [[1]]
    [1] "abc" "def"

But how should I split the string the same way using a regular expression, while also retaining the matches? I need something like the following:

    > FUNCTION("abc123def", "[0-9]+")
    [[1]]
    [1] "abc" "123" "def"

Using strapply("abc123def", "[0-9]+|[a-z]+") works here, but what if the rest of the string other than the
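
One way to keep the matches, sketched in base R under the assumption that the pieces should come back in their original order: collect the matches and the gaps between them with regmatches, then interleave the two. The helper name split_keep is hypothetical and not taken from the question.

```r
# Hypothetical helper: split x on pattern, but keep the matched text too.
split_keep <- function(x, pattern) {
  m    <- gregexpr(pattern, x)
  hits <- regmatches(x, m)                 # the matched pieces
  gaps <- regmatches(x, m, invert = TRUE)  # the text between the matches
  Map(function(g, h) {
    out <- character(length(g) + length(h))
    out[seq(1, by = 2, length.out = length(g))] <- g  # gaps at odd positions
    out[seq(2, by = 2, length.out = length(h))] <- h  # matches at even positions
    out[nzchar(out)]                                  # drop empty strings
  }, gaps, hits)
}

split_keep("abc123def", "[0-9]+")[[1]]
# [1] "abc" "123" "def"
```

Because the gaps come from the same match data, the text around the matches does not need a pattern of its own.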

removing particular character in a column in r

我是研究僧i submitted on 2019-11-28 12:49:09
I have a table called LOAN containing a column named RATE in which the observations are given as percentages, for example 14.49%. How can I format the table so that the % is removed from every value in RATE, so that I can use the plot function on it? I tried using strsplit:

    strsplit(LOAN$RATE, "%")

but got the error "non character argument".

Items that appear to be character when printed, but which R treats otherwise, are generally objects of class factor. I'm also guessing that you are not going to be happy with the list output that strsplit will return. Try: gsub( "%", "", as.character
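
A minimal sketch of that idea, assuming RATE is stored as a factor; the sample values below are made up for illustration. The factor is converted to character, the % is stripped, and the result is converted to numeric so that plot can use it.

```r
# Made-up sample data; in the question LOAN already exists.
LOAN <- data.frame(RATE = factor(c("14.49%", "7.90%", "10.00%")))

# Factor -> character -> strip the literal "%" -> numeric
LOAN$RATE <- as.numeric(sub("%", "", as.character(LOAN$RATE), fixed = TRUE))

LOAN$RATE
# [1] 14.49  7.90 10.00
```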

Remove everything after a string in a data frame column with missing values

走远了吗. submitted on 2019-11-28 05:55:45
Question: I have a data frame resembling the extract below:

    Observation Identifier Value
    Obs001 ABC_2001 54
    Obs002 ABC_2002 -2
    Obs003 1
    Obs004 1
    Obs005 Def_2001/05

I would like to transform this data frame into one where the portion of each string after the "_" sign is removed, as illustrated below:

    Observation Identifier_NoTime Value
    Obs001 ABC 54
    Obs002 ABC -2
    Obs003 1
    Obs004 1
    Obs005 Def

I tried experimenting with strsplit, gsub and sub as discussed here but cannot force those commands
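
A possible approach with sub, sketched under the assumption that the blank cells in the extract are NA; the data frame below is rebuilt by hand from that reading. sub leaves NA entries as NA, so the missing values need no special handling.

```r
# Assumed reconstruction of the extract; blank cells are taken to be NA.
df <- data.frame(
  Observation = c("Obs001", "Obs002", "Obs003", "Obs004", "Obs005"),
  Identifier  = c("ABC_2001", "ABC_2002", NA, NA, "Def_2001/05"),
  Value       = c(54, -2, 1, 1, NA),
  stringsAsFactors = FALSE
)

# Remove the first "_" and everything after it; NA stays NA.
df$Identifier_NoTime <- sub("_.*$", "", df$Identifier)

df$Identifier_NoTime
# [1] "ABC" "ABC" NA    NA    "Def"
```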

Regex; eliminate all punctuation except

我的梦境 submitted on 2019-11-28 01:09:34
I have the following regex that splits on any space or punctuation. How can I exclude one or more punctuation characters from [[:punct:]]? Let's say I'd like to exclude apostrophes and commas. I know I could explicitly use [all punctuation marks in here] instead of [[:punct:]], but I'm hoping for an exclusion method.

    X <- "I'm not that good at regex yet, but am getting better!"
    strsplit(X, "[[:space:]]|(?=[[:punct:]])", perl=TRUE)
    [1] "I" "'" "m" "not" "that" "good" "at" "regex" "yet"
    [10] "," "" "but" "am" "getting" "better" "!"

Joshua Ulrich: It's not clear to me what you want the result to be, but
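
One exclusion method, sketched here as an assumption about the desired result rather than as the answer's own code: with perl = TRUE, a negative lookahead can veto specific characters before [[:punct:]] is tried. The example removes all punctuation except apostrophes and commas, then splits on whitespace.

```r
X <- "I'm not that good at regex yet, but am getting better!"

# Match a punctuation character unless it is an apostrophe or a comma.
punct_except <- "(?![',])[[:punct:]]"

cleaned <- gsub(punct_except, "", X, perl = TRUE)
cleaned
# [1] "I'm not that good at regex yet, but am getting better"

words <- strsplit(cleaned, "[[:space:]]+")[[1]]
```

The same lookahead can be placed inside a split pattern, but zero-length matches make strsplit behave in surprising ways, so removing first and splitting second is the more predictable order.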

Splitting text column into ragged multiple new columns in a data table in R

末鹿安然 submitted on 2019-11-27 23:15:19
I have a data table containing 20000+ rows and one column. The string in each row has a different number of words. I want to split the words and put each of them in a new column. I know how I can do it word by word:

    Data[, Word1 := as.character(lapply(strsplit(as.character(Data$complaint), split=" "), "[", 1))]

(Data is my data table and complaint is the name of the column.) Obviously, this is not efficient because each row has a different number of words. Could you please tell me about a more efficient way to do this?

Check out cSplit from my "splitstackshape" package. It works
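
For comparison, a base R sketch of the same ragged split that does not use cSplit: split every row once, pad each result with NA to the longest length, and bind the rows into a matrix. The sample complaints and the Word1, Word2, ... column names are assumptions for illustration.

```r
# Made-up sample rows standing in for Data$complaint.
complaint <- c("late delivery", "item arrived broken", "refund")

words <- strsplit(complaint, " ", fixed = TRUE)
n     <- max(lengths(words))

# `length<-` pads each shorter vector with NA up to n elements.
wide <- do.call(rbind, lapply(words, `length<-`, n))
colnames(wide) <- paste0("Word", seq_len(n))

wide
```

The resulting character matrix can be attached to the original table with cbind.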

R: I have to do Softmatch in String

泪湿孤枕 submitted on 2019-11-27 19:01:55
Question: I have to do a soft match between one column of a data frame and a given input string, like:

    col <- c("John Collingson","J Collingson","Dummy Name1","Dummy Name2")
    inputText <- "J Collingson"
    # Vice versa
    inputText <- "John Collingson"

I want to retrieve both "John Collingson" and "J Collingson" from the provided column "col". Kindly help.

Answer 1: agrep is definitely a quick and easy base R solution if you have just a bit of data. If this is just a toy example of a larger data frame, you may be interested
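
As a rough illustration of soft matching in base R, the sketch below uses adist (edit distance) in place of agrep, because its result is easy to reason about: "J Collingson" and "John Collingson" differ by three deleted characters. The threshold of 3 and the helper name soft_match are assumptions chosen for this toy data, not values from the answer.

```r
col <- c("John Collingson", "J Collingson", "Dummy Name1", "Dummy Name2")

# Hypothetical helper: keep candidates within max_edits edits of the input.
soft_match <- function(input, candidates, max_edits = 3) {
  d <- as.vector(adist(input, candidates))  # Levenshtein distances
  candidates[d <= max_edits]
}

soft_match("J Collingson", col)
# [1] "John Collingson" "J Collingson"
soft_match("John Collingson", col)
# [1] "John Collingson" "J Collingson"
```

A fixed threshold is crude; on real data it would likely need to scale with the string length.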

error in strsplit when trying to separate by a comma

生来就可爱ヽ(ⅴ<●) submitted on 2019-11-27 15:44:40
I have the vector length:

    # [1] 15,34, 12,24, 225,
    # Levels: 12,24, 15,34, 225,

and I want to separate the values by the comma to eventually make a list of them. I tried:

    strsplit(length, ",")

but keep getting the error message:

    Error in strsplit(length, ",") : non-character argument

Your "length" object is a factor. As the error message indicates, strsplit expects a character vector as the input. Try:

    strsplit(as.character(length), ",")

Demo:

    x <- factor(c("1,2", "3,4", "5,6"))
    strsplit(x, ",")
    # Error in strsplit(x, ",") : non-character argument
    strsplit(as.character(x), ",")
    # [[1]]
    # [1] "1"
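
If the end goal is a single vector of numbers, the split result can be flattened afterwards. This is a sketch under the assumption that the factor levels look like those printed in the question; the variable name len_f is hypothetical and avoids masking base R's length function.

```r
# Assumed reconstruction of the factor from the question.
len_f <- factor(c("15,34,", "12,24,", "225,"))

pieces <- strsplit(as.character(len_f), ",", fixed = TRUE)

# A trailing comma does not produce an empty final element.
values <- as.numeric(unlist(pieces))
values
# [1]  15  34  12  24 225
```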

Split a string by any number of spaces

不打扰是莪最后的温柔 submitted on 2019-11-27 13:32:04
I have the following string:

    [1] "10012 ---- ---- ---- ---- CAB UNCH CAB"

I want to split this string by the gaps, but the gaps have a variable number of spaces. Is there a way to use the strsplit() function to split this string and return a vector of 8 elements with all of the gaps removed? One line of code is preferred.

Just use strsplit with \\s+ as the split pattern:

    x <- "10012 ---- ---- ---- ---- CAB UNCH CAB"
    x
    # [1] "10012 ---- ---- ---- ---- CAB UNCH CAB"
    strsplit(x, "\\s+")[[1]]
    # [1] "10012" "----" "----" "----" "----" "CAB" "UNCH" "CAB"
    length(.Last.value)
    # [1] 8

Or, in this case, scan
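
One caveat worth sketching: if the string starts with whitespace, strsplit returns an empty first element, so trimming first keeps the count at 8. The input below is a made-up variant of the string with uneven gaps and leading spaces.

```r
# Made-up variant with leading spaces and gaps of different widths.
x <- "  10012   ----  ----    ---- ----  CAB UNCH   CAB"

# Without trimming, the leading whitespace yields an empty first element.
raw_parts <- strsplit(x, "\\s+")[[1]]

# trimws removes leading and trailing whitespace before splitting.
parts <- strsplit(trimws(x), "\\s+")[[1]]
length(parts)
# [1] 8
```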

How to avoid a loop in R: selecting items from a list

99封情书 submitted on 2019-11-27 10:34:27
I could solve this using loops, but I am trying to think in vectors so my code will be more R-esque. I have a list of names in the format firstname_lastname. I want to get from this list a separate list with only the first names. I can't seem to get my mind around how to do this. Here's some example data:

    t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
    tsplit <- strsplit(t,"_")

which looks like this:

    > tsplit
    [[1]]
    [1] "bob" "smith"
    [[2]]
    [1] "mary" "jane"
    [[3]]
    [1] "jose" "chung"
    [[4]]
    [1] "michael" "marx"
    [[5]]
    [1] "charlie" "ivan"

I could get out what I want using
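
A loop-free sketch of the extraction: apply the `[` function to each element of the split list and ask for position 1. The variable name full_names is an assumption, used instead of t so that base R's t() is not masked.

```r
full_names <- c("bob_smith", "mary_jane", "jose_chung", "michael_marx", "charlie_ivan")

name_parts <- strsplit(full_names, "_", fixed = TRUE)

# Take element 1 of every list entry; vapply checks that each result is one string.
first_names <- vapply(name_parts, `[`, character(1), 1)
first_names
# [1] "bob"     "mary"    "jose"    "michael" "charlie"
```

When only the first piece is needed, sub("_.*", "", full_names) gives the same result without building the list.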

Extract a string between patterns/delimiters in R

无人久伴 submitted on 2019-11-27 08:25:13
Question: I have variable names in the form PP_Sample_12.GT or PP_Sample-17.GT. I'm trying to use string splitting to extract the middle section, i.e. Sample_12 or Sample-17. However, when I do:

    IDtmp <- sapply(strsplit(names(df[c(1:13)]),'_'),function(x) x[2])
    IDs <- data.frame(sapply(strsplit(IDtmp,'.GT',fixed=T),function(x) x[1]))

I end up with Sample for PP_Sample_12.GT. Is there another way to do this? Maybe using a pattern/replace kind of function? Though, not sure if this exists in R (but I think
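
A pattern/replace function does exist in base R: sub with a capture group. The sketch below assumes every name starts with "PP_" and ends with ".GT", and keeps whatever lies between, so an underscore inside the middle section is no longer a problem.

```r
vars <- c("PP_Sample_12.GT", "PP_Sample-17.GT")

# Capture everything between the leading "PP_" and the trailing ".GT".
ids <- sub("^PP_(.*)\\.GT$", "\\1", vars)
ids
# [1] "Sample_12" "Sample-17"
```

Names that do not fit the assumed prefix and suffix are returned unchanged, which makes mismatches easy to spot.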