Question
I would like to compute running sums of the columns of my data frame, one row at a time, conditional on another column that holds a binary variable.
So for each row, I would like to compute the sum of the entire column above it for all rows where the binary variable in the corresponding row has the same value.
Here is an example:
dummy var1 var2
1 x1 y1
0 x2 y2
0 x3 y3
1 x4 y4
My goal is to obtain this:
dummy var1 var2
1 x1 y1
0 x2 y2
0 x3+x2 y3+y2
1 x4+x1 y4+y1
I have asked this question previously for a simplified version (Adding columns sums in dataframe row wise) where I just add all of the values above without the condition. Is there a way to incorporate this condition?
Answer 1:
data.table::rleid will give you the grouping you want. If you convert your data frame to a data.table, it's like this:
(Note: this assumes that your text is accurate and your example incorrect: it groups by consecutive equal values in the dummy column.)
library(data.table)
setDT(your_data)
your_data[, id := rleid(dummy)][
, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id
]
If you need to do this to a bunch of columns, set the id as above, define your vector of columns, and then:
cols = c("var1", "var2", "var3", ...)
your_data[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]
If you just want to group by the dummy column, ignoring consecutiveness, then your question is an exact duplicate of this one, and you can do it like this:
setDT(your_data)
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]
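The difference between the two groupings above (consecutive runs versus all rows with the same dummy value) can be seen in a small base R sketch. This is only an illustration under stated assumptions: the data values are made up, and `rle()` is used here as a stand-in for `data.table::rleid`, not as the answer's own method.

```r
# Hypothetical example data (values are made up for illustration)
dat <- data.frame(dummy = c(1, 0, 0, 1),
                  var1  = c(1, 2, 10, 20),
                  var2  = c(10, 20, 50, 3))

# Run ids: a new id starts whenever dummy changes (base R stand-in for rleid)
runs   <- rle(dat$dummy)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Cumulative sums within consecutive runs of equal dummy values
by_run <- sapply(dat[-1], function(x) ave(x, run_id, FUN = cumsum))

# Cumulative sums within each dummy value, regardless of position
by_value <- sapply(dat[-1], function(x) ave(x, dat$dummy, FUN = cumsum))
```

The two results differ only in the last row: grouping by run treats the second `dummy == 1` row as the start of a new group, while grouping by value adds the first row to it.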
Answer 2:
You can use Reduce:
fun <- function(x) Reduce(function(x, y) paste0(y, "+", x), x, accumulate = TRUE)
sapply(dat[-1], function(x) ave(x, dat[, 1], FUN = fun))
var1 var2
[1,] "x1" "y1"
[2,] "x2" "y2"
[3,] "x3+x2" "y3+y2"
[4,] "x4+x1" "y4+y1"
If these were numeric values, you could do:
#Example data
dat2=data.frame(dummy=dat[,1],var1=c(1,2,10,20),var2=c(10,20,50,3))
What to use:
sapply(dat2[-1],function(x)ave(x,dat2[,1],FUN=cumsum))
var1 var2
[1,] 1 10
[2,] 2 20
[3,] 12 70
[4,] 21 13
Answer 3:
Some good answers here already. This is a solution using dplyr:
library(dplyr)
data.frame(dummy = c(1L,0L,0L,1L), var1 = c(1L,2L,4L,6L), var2 = c(100L,20L,30L,400L)) %>%
  group_by(dummy) %>%
  mutate(across(everything(), cumsum))  # funs() is deprecated in current dplyr
# A tibble: 4 x 3
# Groups: dummy [2]
dummy var1 var2
<dbl> <dbl> <dbl>
1 1.00 1.00 100
2 0 2.00 20.0
3 0 6.00 50.0
4 1.00 7.00 500
Answer 4:
Well I don't think you could do this using a simple function, at least not from my experience. So I suggest writing a function as follows:
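sum_new_df <- function(df) {
  # assumes the first column is the binary column and is named "dummy"
  new_df <- df[, -1, drop = FALSE]
  for (i in seq_len(max(nrow(df) - 1, 0))) {
    for (j in (i + 1):nrow(df)) {
      if (df$dummy[i] == df$dummy[j]) {
        # add the earlier row i to the running total of the later row j
        new_df[j, ] <- new_df[j, ] + df[i, -1, drop = FALSE]
      }
    }
  }
  new_df  # return the accumulated values (without the dummy column)
}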
This function only accumulates the values of rows with the same dummy value in order of increasing row number. So in a large data.frame, the values build up like a pyramid.
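The accumulation described above can be checked on a small example. The following is a minimal, self-contained sketch, assuming made-up data and hypothetical names (`df`, `loop_result`, `ave_result`): it computes, for each row, the sum of all rows up to and including it that share its dummy value, and compares that with a vectorised `ave()`/`cumsum()` version.

```r
# Hypothetical data: first column is the binary dummy, the rest are numeric
df <- data.frame(dummy = c(1, 0, 0, 1, 0),
                 var1  = c(1, 2, 10, 20, 5),
                 var2  = c(10, 20, 50, 3, 1))

# Explicit loop: row j gets the sum of rows 1..j that share its dummy value
loop_result <- df[, -1]
for (j in seq_len(nrow(df))) {
  same <- df$dummy[seq_len(j)] == df$dummy[j]
  loop_result[j, ] <- colSums(df[seq_len(j), -1][same, , drop = FALSE])
}

# Vectorised equivalent using ave() and cumsum()
ave_result <- as.data.frame(
  lapply(df[-1], function(x) ave(x, df$dummy, FUN = cumsum))
)

# Compare the two approaches
agree <- isTRUE(all.equal(loop_result, ave_result, check.attributes = FALSE))
```

The loop makes the growing totals explicit, but it rescans earlier rows for every row, so the vectorised form is likely the better choice for large data frames.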
Source: https://stackoverflow.com/questions/48603902/adding-columns-sums-in-dataframe-row-wise-conditional-on-a-dummy