Adding columns sums in dataframe row wise conditional on a dummy

Submitted by 不想你离开。 on 2019-12-13 01:27:51

Question


I would like to compute running sums of the columns of my data frame, one row at a time, conditional on another column that holds a binary variable.

So for each row, I would like to add to its value the sum of all the rows above it in the same column, counting only the rows where the binary variable has the same value as in the current row.

Here is an example:

dummy var1  var2
1     x1     y1
0     x2     y2
0     x3     y3
1     x4     y4

My goal is to obtain this:

dummy var1     var2
1     x1       y1
0     x2       y2
0     x3+x2    y3+y2
1     x4+x1    y4+y1

I previously asked a simplified version of this question (Adding columns sums in dataframe row wise), where I just add all of the values above without the condition. Is there a way to incorporate this condition?


Answer 1:


data.table::rleid will give you the grouping you want. If you convert your data frame to a data.table, it's like this:

(Note: this assumes that your text is accurate and your example incorrect: it groups by consecutive equal values in the dummy column.)

library(data.table)
setDT(your_data)
your_data[, id := rleid(dummy)][
  , c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id
]
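
To see how the two readings of the question differ, here is a minimal base-R sketch. It assumes hypothetical numeric values in place of the x/y placeholders, and it imitates rleid() with rle() so that it runs without data.table; the names example_df, run_id, by_run and by_dummy are made up for illustration.

```r
# Hypothetical numeric stand-in for the question's x/y placeholders
example_df <- data.frame(dummy = c(1, 0, 0, 1),
                         var1  = c(1, 2, 10, 20),
                         var2  = c(10, 20, 50, 3))

# Base-R imitation of data.table::rleid(): one id per run of equal values
runs   <- rle(example_df$dummy)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id    # 1 2 2 3: the two rows with dummy == 1 fall into different runs

# Running sum within consecutive runs (grouping by the run id)
by_run <- ave(example_df$var1, run_id, FUN = cumsum)

# Running sum within each dummy value, ignoring consecutiveness
by_dummy <- ave(example_df$var1, example_df$dummy, FUN = cumsum)

by_run    # 1 2 12 20
by_dummy  # 1 2 12 21
```

Only the last row differs: grouping by runs leaves it at 20, while grouping by the dummy value adds the first row and gives 21, which is what the example in the question asks for.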

If you need to do this to a bunch of columns, set the id as above, define your vector of columns, and then:

cols = c("var1", "var2", "var3", ...)
your_data[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]

If you just want to group by the dummy column, ignoring consecutiveness, then your question is an exact duplicate of this one, and you can do it like this:

setDT(your_data)
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]



Answer 2:


You can use Reduce:

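# build the symbolic running sum, newest term first
fun <- function(x) Reduce(function(x, y) paste0(y, "+", x), x, accumulate = TRUE)
# apply it to every column except the dummy, grouped by the dummy
sapply(dat[-1], function(x) ave(x, dat[, 1], FUN = fun))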
     var1    var2   
[1,] "x1"    "y1"   
[2,] "x2"    "y2"   
[3,] "x3+x2" "y3+y2"
[4,] "x4+x1" "y4+y1"
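
As a rough illustration of what Reduce() with accumulate = TRUE does inside each group, here is a small sketch on a made-up character vector (vals and steps are hypothetical names, and "x7" is an extra value that does not appear in the question):

```r
# Hypothetical symbolic values for the rows that share one dummy value
vals <- c("x1", "x4", "x7")

# Each step prepends the next value to the string accumulated so far;
# accumulate = TRUE keeps every intermediate result, not just the last one
steps <- Reduce(function(acc, nxt) paste0(nxt, "+", acc), vals, accumulate = TRUE)
steps
# [1] "x1"       "x4+x1"    "x7+x4+x1"
```

ave() then writes these per-group results back into the original row positions, which is why the output above stays in the row order of dat.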

If these were numeric values, you could do:

#Example data
dat2=data.frame(dummy=dat[,1],var1=c(1,2,10,20),var2=c(10,20,50,3))

What to use:

sapply(dat2[-1],function(x)ave(x,dat2[,1],FUN=cumsum))
     var1 var2
[1,]    1   10
[2,]    2   20
[3,]   12   70
[4,]   21   13
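
Note that sapply() returns a matrix without the dummy column. If a data frame is wanted instead, one possible variation (a sketch, not part of the original answer; value_cols and result are made-up names) is to assign the cumulative sums back column by column:

```r
# Same example data as above, written out so the snippet runs on its own
dat2 <- data.frame(dummy = c(1, 0, 0, 1),
                   var1  = c(1, 2, 10, 20),
                   var2  = c(10, 20, 50, 3))

# Every column except the dummy gets a running sum
value_cols <- setdiff(names(dat2), "dummy")

result <- dat2
result[value_cols] <- lapply(dat2[value_cols],
                             function(x) ave(x, dat2$dummy, FUN = cumsum))
result
```

This keeps the dummy column and the original row order, and it works for any number of value columns.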



Answer 3:


Some good answers here already. This is a solution using dplyr:

library(dplyr)

data.frame(dummy = c(1L,0L,0L,1L), var1 = c(1L,2L,4L,6L), var2 = c(100L,20L,30L,400L)) %>%
    group_by(dummy) %>%
    mutate_all(cumsum)  # funs() is deprecated in current dplyr; pass the function directly

# A tibble: 4 x 3
# Groups:   dummy [2]
  dummy  var1  var2
  <dbl> <dbl> <dbl>
1  1.00  1.00 100  
2  0     2.00  20.0
3  0     6.00  50.0
4  1.00  7.00 500  



Answer 4:


Well I don't think you could do this using a simple function, at least not from my experience. So I suggest writing a function as follows:

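sum_new_df <- function(df) {
    # work on the value columns only; the first column is the dummy
    new_df <- df[, -1, drop = FALSE]
    n <- nrow(df)
    if (n < 2) return(new_df)
    for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
            if (df$dummy[i] == df$dummy[j]) {
                # add the original row i to each later row with the same dummy
                new_df[j, ] <- new_df[j, ] + df[i, -1, drop = FALSE]
            }
        }
    }
    new_df
}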

This function only adds up the values of rows with the same dummy in order of increasing row number. So in a large data.frame, the values build up like a pyramid.



Source: https://stackoverflow.com/questions/48603902/adding-columns-sums-in-dataframe-row-wise-conditional-on-a-dummy
