I have a large data frame with columns that are character strings of numbers such as "1, 2, 3, 4". I wish to add a new column that is the average of these numbers. I have
Try:
library(dplyr)
library(splitstackshape)
df %>%
mutate(index = row_number()) %>%
cSplit("a", direction = "long") %>%
group_by(index) %>%
summarise(mean = mean(a))
Which gives:
#Source: local data table [3 x 2]
#
# index mean
#1 1 2.5
#2 2 5.0
#3 3 7.5
Or as per @Ananda's suggestion:
rowMeans(cSplit(df, "a"), na.rm = TRUE)
# [1] 2.5 5.0 7.5
If you want to keep the result in a data frame you could do:
df %>% mutate(mean = rowMeans(cSplit(., "a"), na.rm = TRUE))
Which gives:
# a mean
#1 1, 2, 3, 4 2.5
#2 2, 4, 6, 8 5.0
#3 3, 6, 9, 12 7.5
You could use sapply to loop through the list returned by strsplit, handling each of the list elements:
# as.character() guards against the column having been read in as a factor
sapply(strsplit(as.character(df$a), split = ", "), function(x) mean(as.numeric(x)))
# [1] 2.5 5.0 7.5
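For anyone who wants to run the base R approach end to end, here is a minimal sketch. The sample data frame is reconstructed from the output shown in the answers above, so treat it as an assumption about the asker's data; mean_of_csv is a hypothetical helper name, not part of any package. It uses vapply instead of sapply so that each result is checked to be a single numeric value.

```r
# Sample data reconstructed from the printed output above (an assumption).
df <- data.frame(a = c("1, 2, 3, 4", "2, 4, 6, 8", "3, 6, 9, 12"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: mean of the comma-separated numbers in one string.
mean_of_csv <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]   # split on literal commas
  mean(as.numeric(trimws(parts)), na.rm = TRUE)  # drop padding, then average
}

# vapply() behaves like sapply() but enforces one numeric value per element.
df$mean <- vapply(as.character(df$a), mean_of_csv, numeric(1), USE.NAMES = FALSE)
df$mean
# [1] 2.5 5.0 7.5
```

Splitting on "," and trimming afterwards means the code does not depend on there being exactly one space after each comma.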
Using data.table:
library(data.table)
cols <- paste0("a",1:4)
setDT(df)[, (cols) := tstrsplit(a, ",", fixed=TRUE, type.convert=TRUE)
][, .(Mean = rowMeans(.SD)), .SDcols = cols]
#    Mean
# 1:  2.5
# 2:  5.0
# 3:  7.5
Alternatively,
rowMeans(setDT(tstrsplit(df$a, ",", fixed=TRUE, type.convert=TRUE)))
# [1] 2.5 5.0 7.5