Question
I have a large data frame with columns that hold character strings of numbers, such as "1, 2, 3, 4". I want to add a new column containing the average of these numbers. I have set up the following example:
set.seed(2015)
library(dplyr)
a<-c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12")
df<-data.frame(a)
df$a <- as.character(df$a)
Now I can use strsplit to split the string and return the mean for a given row, where [[1]] selects the first row.
mean(as.numeric(strsplit((df$a), split=", ")[[1]]))
[1] 2.5
The problem is that when I try to do this inside a data frame pipeline and reference the row number, I get an error.
> df2<- df %>%
+ mutate(index = row_number(),
+ avg = mean(as.numeric(strsplit((df$a), split=", ")[[index]])))
Error in strsplit((df$a), split = ", ")[[1:3]] :
recursive indexing failed at level 2
Can anyone explain this error and why I cannot index using a variable? If I replace index with a constant it works, so the problem seems to be the use of a variable there.
Much thanks!
Answer 1:
You could use sapply to loop through the list returned by strsplit, handling each of the list elements:
sapply(strsplit((df$a), split=", "), function(x) mean(as.numeric(x)))
# [1] 2.5 5.0 7.5
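A minimal base-R sketch of why the original attempt fails and how the per-element approach fills the new column. The error message in the question shows `[[1:3]]`, which suggests that `index` reaches `[[` as the whole column rather than one row number at a time; a vector inside `[[` triggers recursive indexing. The names `parts`, `idx` and `failed` are illustrative only, and `vapply` is swapped in for `sapply` here simply to pin the result type to one number per row.

```r
# Rebuild the example data from the question.
df <- data.frame(a = c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12"),
                 stringsAsFactors = FALSE)

# One character vector per row.
parts <- strsplit(df$a, split = ", ")

# Indexing with the full vector 1:3 means parts[[1]][[2]][[3]],
# which cannot work on a plain character vector, so it errors.
idx <- seq_len(nrow(df))
failed <- tryCatch({ parts[[idx]]; FALSE }, error = function(e) TRUE)

# Apply the mean to each list element instead of indexing by row number.
df$avg <- vapply(parts, function(x) mean(as.numeric(x)), numeric(1))
print(df)
```

The same per-element expression can be placed inside mutate, referring to the column as `a` rather than `df$a`.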
Answer 2:
Try:
library(dplyr)
library(splitstackshape)
df %>%
mutate(index = row_number()) %>%
cSplit("a", direction = "long") %>%
group_by(index) %>%
summarise(mean = mean(a))
Which gives:
#Source: local data table [3 x 2]
#
# index mean
#1 1 2.5
#2 2 5.0
#3 3 7.5
Or as per @Ananda's suggestion:
rowMeans(cSplit(df, "a"), na.rm = TRUE)
# [1] 2.5 5.0 7.5
If you want to keep the result in a data frame you could do:
df %>% mutate(mean = rowMeans(cSplit(., "a"), na.rm = TRUE))
Which gives:
# a mean
#1 1, 2, 3, 4 2.5
#2 2, 4, 6, 8 5.0
#3 3, 6, 9, 12 7.5
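The na.rm argument matters once rows hold different numbers of values. The sketch below illustrates this in base R only, assuming a wide split pads shorter rows with NA; the ragged input and the names `parts`, `width` and `wide` are made up for illustration and are not part of the original data.

```r
# Hypothetical ragged input: rows with 4, 2 and 3 values.
a <- c("1, 2, 3, 4", "2, 4", "3, 6, 9")

# Split each string and convert the pieces to numbers.
parts <- lapply(strsplit(a, ", ", fixed = TRUE), as.numeric)

# Pad every row with NA up to the longest row, then bind into a matrix
# with one row per input string.
width <- max(lengths(parts))
wide <- t(vapply(parts,
                 function(x) c(x, rep(NA_real_, width - length(x))),
                 numeric(width)))

# Without na.rm the padded rows yield NA; with it, only real values count.
means_with_na <- rowMeans(wide)
means_clean <- rowMeans(wide, na.rm = TRUE)
print(means_clean)  # 2.5 3.0 6.0
```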
Answer 3:
library(data.table)
cols <- paste0("a",1:4)
setDT(df)[, (cols) := tstrsplit(a, ",", fixed=TRUE, type.convert=TRUE)
][, .(Mean = rowMeans(.SD)), .SDcols = cols]
Mean
1: 2.5
2: 5.0
3: 7.5
Alternatively,
rowMeans(setDT(tstrsplit(df$a, ",", fixed=TRUE, type.convert=TRUE)))
# [1] 2.5 5.0 7.5
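If the goal is to keep the mean as a new column without also adding the helper columns a1 to a4, one possible variation is sketched below. It assumes the data.table package is installed; the table name `dt` and column name `avg` are illustrative, and splitting on ", " rather than "," is a choice made here so that no leading spaces remain before conversion.

```r
library(data.table)

# Rebuild the example data as a data.table.
dt <- data.table(a = c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12"))

# tstrsplit returns one list element per position (first values, second
# values, ...); as.data.table turns that list into columns for rowMeans.
dt[, avg := rowMeans(as.data.table(
  tstrsplit(a, ", ", fixed = TRUE, type.convert = TRUE)
))]
print(dt)
```

Because `:=` assigns by reference, `dt` itself gains the avg column and no reassignment is needed.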
Source: https://stackoverflow.com/questions/30857740/r-split-string-into-numeric-and-return-the-mean-as-a-new-column-in-a-data-frame