strsplit

in R: find all unique values in column separated by comma

北城以北 submitted on 2021-02-13 17:27:04
Question: I have multiple observations of one species with different observers / groups of observers and want to create a list of all unique observers. My data look like this:
    data <- read.table(text="species observer
    1 A,B
    1 A,B
    1 B,E
    1 B,E
    1 D,E,A,C,C
    1 F", header = TRUE, stringsAsFactors = FALSE)
My output should return a list of all unique observers, so: A,B,C,D,E,F. I tried to substring the data in column C using the following command, but that only returns the unique combinations of observers. all
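
One possible approach to the question above, offered as a sketch rather than as an answer from the original thread: split every cell of the observer column on commas with strsplit, flatten the resulting list, and deduplicate. The variable name `observers` is introduced here for illustration only.

```r
# Sample data from the question, with the line breaks restored
data <- read.table(text = "species observer
1 A,B
1 A,B
1 B,E
1 B,E
1 D,E,A,C,C
1 F", header = TRUE, stringsAsFactors = FALSE)

# strsplit returns one character vector per row; unlist flattens them,
# unique drops repeats, and sort gives a stable order
observers <- sort(unique(unlist(strsplit(data$observer, ",", fixed = TRUE))))
print(observers)
```

Using `fixed = TRUE` treats the comma as a literal separator rather than a regular expression, which is sufficient here. If a single comma-separated string is wanted, `paste(observers, collapse = ",")` should produce it.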

How to split strings into new rows while maintaining other columns in R [duplicate]

强颜欢笑 submitted on 2020-02-04 02:13:12
Question: This question already has answers here: Split delimited strings in a column and insert as new rows [duplicate] (6 answers). Closed 2 months ago. I want to split a character vector column into multiple rows (of the same data frame) while maintaining the other columns (keep) in this reproducible example:
    dat <- structure(list(ID = c("E87", "E42", "E39", "E16,E17,E18", "E760,E761,E762"), keep = 1:5), row.names = c(NA, 5L), class = "data.frame")
    > dat
      ID keep
    1 E87 1
    2 E42 2
    3 E39 3
    4 E16,E17
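
A minimal base R sketch of one way to do this, assuming the `dat` frame from the question (it is not the accepted answer from the linked thread): split `ID` on commas, then repeat each `keep` value once per resulting piece. The names `parts` and `long` are illustrative.

```r
dat <- data.frame(
  ID = c("E87", "E42", "E39", "E16,E17,E18", "E760,E761,E762"),
  keep = 1:5,
  stringsAsFactors = FALSE
)

# One character vector of IDs per original row
parts <- strsplit(dat$ID, ",", fixed = TRUE)

# Flatten the IDs and repeat each 'keep' value to match its number of pieces
long <- data.frame(
  ID = unlist(parts),
  keep = rep(dat$keep, times = lengths(parts)),
  stringsAsFactors = FALSE
)
print(long)
```

With more than one column to carry along, indexing the original frame by `rep(seq_len(nrow(dat)), lengths(parts))` and then overwriting `ID` may be more convenient than repeating each column by hand.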

R: split string into numeric and return the mean as a new column in a data frame

烂漫一生 submitted on 2020-01-29 18:57:56
Question: I have a large data frame with columns that are character strings of numbers, such as "1, 2, 3, 4". I wish to add a new column that is the average of these numbers. I have set up the following example:
    set.seed(2015)
    library(dplyr)
    a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")
    df <- data.frame(a)
    df$a <- as.character(df$a)
Now I can use strsplit to split the string and return the mean for a given row, where the [[1]] selects the first row:
    mean(as.numeric(strsplit((df$a), split=", ")[[1]]))
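
The single-row expression in the question can be extended to every row by applying the same conversion to each element of the list that strsplit returns. The sketch below is one way to do that in base R; the column name `mean_a` is an assumption made for illustration.

```r
a <- c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")
df <- data.frame(a, stringsAsFactors = FALSE)

# strsplit yields one character vector per row; vapply converts each to
# numeric, takes its mean, and guarantees a numeric vector of length nrow(df)
df$mean_a <- vapply(
  strsplit(df$a, ", ", fixed = TRUE),
  function(x) mean(as.numeric(x)),
  numeric(1)
)
print(df)
```

`vapply` is used instead of `sapply` so that the result type is fixed in advance, which tends to fail loudly rather than silently if a row cannot be reduced to a single number.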

best way to manipulate strings in big data.table

痞子三分冷 submitted on 2020-01-23 09:19:25
Question: I have a 67MM-row data.table with people's names and surnames separated by spaces. I just need to create a new column for each word. Here is a small subset of the data:
    n <- structure(list(Subscription_Id = c("13.855.231.846.091.000", "11.156.048.529.090.800", "24.940.584.090.830", "242.753.039.111.124", "27.843.782.090.830", "13.773.513.145.090.800", "25.691.374.090.830", "12.236.174.155.090.900", "252.027.904.121.210", "11.136.991.054.110.100" ), Account_Desc = c("AGUAYO CARLA", "LEIVA
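
As a rough base R illustration of the splitting step (not a benchmarked solution for 67 million rows): split on spaces, pad shorter rows with NA, and bind the pieces into columns. The sample vector is hypothetical except for "AGUAYO CARLA", which is the only complete name in the excerpt, and the `word_` column names are made up for this sketch.

```r
# Hypothetical sample; only "AGUAYO CARLA" comes from the original excerpt
account_desc <- c("AGUAYO CARLA", "PEREZ SOTO MARIA", "LOPEZ")

words <- strsplit(account_desc, " ", fixed = TRUE)
n_cols <- max(lengths(words))

# Pad every row with NA so that all rows have the same number of words
padded <- lapply(words, function(w) c(w, rep(NA_character_, n_cols - length(w))))

# Bind the padded rows into a character matrix, then into a data frame
word_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(word_cols) <- paste0("word_", seq_len(n_cols))
print(word_cols)
```

For a table of the size described in the question, the data.table package's `tstrsplit()` is likely a better fit, since it is designed to split a column into several new columns by reference; whether it is fast enough for this data is not something the excerpt settles.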

R: how to avoid strsplit hiccuping on empty vectors when splitting text

妖精的绣舞 submitted on 2020-01-16 12:00:19
Question: I have a list of text sections that need to be split into sentences by:
    > textList <- list(sections=sections[(length(sections)-2):length(sections)])
    > textList$sentences <- sapply(textList$sections, function(x) strsplit(as.character(x), "(?<=und/KON)\\s(?!\\S+/V)|(?<=oder/KON)\\s|(?<=/\\$[[:punct:]])\\s(?!dass/KOUS)(?!dann/ADV)(?!weil/KOUS)", perl=TRUE))
    > sent <- textList$sentences
The final goal is to add IDs to all sentences and arrange them together into a list of dataframes -
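
The excerpt is cut off before it shows the actual failure, so the following is only a guess at the kind of problem involved: strsplit returns a zero-length vector for an empty string, and building a data frame from that can misbehave. The sketch uses hypothetical tagged sections and a simplified version of the question's pattern, replaces `sapply` with `lapply` so the result is always a list, and drops empty results before building the data frames.

```r
# Hypothetical tagged sections; the second one is empty
sections <- c("eins/CARD und/KON zwei/CARD", "", "drei/CARD oder/KON vier/CARD")

# Simplified pattern (an assumption): split on whitespace that follows a
# conjunction tag. lapply always returns a list, whatever the input looks like.
sentences <- lapply(sections, function(x) {
  strsplit(as.character(x), "(?<=und/KON)\\s|(?<=oder/KON)\\s", perl = TRUE)[[1]]
})

# Keep only sections that actually produced sentences
keep <- lengths(sentences) > 0

# One data frame per remaining section, with a running sentence ID
sentence_frames <- lapply(sentences[keep], function(s) {
  data.frame(id = seq_along(s), sentence = s, stringsAsFactors = FALSE)
})
print(sentence_frames)
```

If the positions of the empty sections matter, an alternative would be to keep them and emit a zero-row data frame instead of filtering them out.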

Create new column with dplyr mutate and substring of existing column

大憨熊 submitted on 2020-01-12 06:45:15
Question: I have a dataframe with a column of strings and want to extract substrings of those into a new column. Here is some sample code and data showing that I want to take the string after the final underscore character in the id column in order to create a new_id column. The id column entry always has 2 underscore characters, and it is always the final substring I want.
    df = data.frame( id = I(c("abcd_123_ABC","abc_5234_NHYK")), x = c(1.0,2.0) )
    require(dplyr)
    df = df %>% dplyr::mutate(new_id =
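
Two base R ways to take the text after the final underscore are sketched below, assuming the sample data from the question (without the `I()` wrapper, for simplicity). The column `new_id_split` is a hypothetical name added only to compare the two variants.

```r
df <- data.frame(
  id = c("abcd_123_ABC", "abc_5234_NHYK"),
  x = c(1.0, 2.0),
  stringsAsFactors = FALSE
)

# Greedy ".*_" consumes everything up to and including the last underscore,
# so only the final piece remains
df$new_id <- sub(".*_", "", df$id)

# Same result with strsplit: split on "_" and take the last piece of each row
df$new_id_split <- vapply(
  strsplit(df$id, "_", fixed = TRUE),
  function(p) p[length(p)],
  character(1)
)
print(df)
```

Because `sub()` is vectorized, the first expression should also work inside a dplyr pipeline, for example as `mutate(new_id = sub(".*_", "", id))`.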