Sum of rows when a condition is met: data.frame in R

Submitted by 空扰寡人 on 2020-06-29 03:40:44

Question


df
  var1 var2
1    a    1
2    b    2
3    a    3
4    c    6
5    d   88
6    b    0

df2 <- data.frame(var1=c("k","b","a","k","k","b"),var2=c(14,78,5,6,88,0))
list <- list(df,df2)

for(i in list){
   if(any(i[ ,1] == i[ ,1])){
      cumsum(.)
   }
}

I have a list of data.frames and I want to iterate over them. When the same letter appears more than once in the first column, the values in the second column should be summed, and I want this new row to be in my data.frame. I completely messed up the if statement. Can somebody help me, please?

EDIT: the result should look like

df
  var1 var2
1 a    4
2 b    2
3 c    6
4 d    88

and for df2

  var1 var2
1    k  108
2    b   78
3    a    5

In my real problem, the list consists of 10 data.frames, not just two.


Answer 1:


In base R:

sapply(split(df$var2,df$var1), sum)

 a  b  c  d 
 4  2  6 88 

Or, to do it on each element of a list of data frames:

lapply(list, function(x) sapply(split(x$var2,x$var1), sum))

[[1]]
 a  b  c  d 
 4  2  6 88 

[[2]]
  a   b   k 
  5  78 108 
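The calls above return named vectors sorted alphabetically by group, whereas the question asks for data frames with the groups in order of first appearance (k, b, a for df2). A minimal sketch of one way to get that in base R follows; the helper name sum_by_var1 and the list name dfs are hypothetical, chosen here only for illustration.

```r
# Example data taken from the question
df  <- data.frame(var1 = c("a", "b", "a", "c", "d", "b"),
                  var2 = c(1, 2, 3, 6, 88, 0))
df2 <- data.frame(var1 = c("k", "b", "a", "k", "k", "b"),
                  var2 = c(14, 78, 5, 6, 88, 0))
dfs <- list(df, df2)  # hypothetical name, avoids masking base::list

# Hypothetical helper: sum var2 per var1, keeping first-appearance order
sum_by_var1 <- function(x) {
  keys   <- as.character(x$var1)
  groups <- factor(keys, levels = unique(keys))  # levels in order of appearance
  sums   <- sapply(split(x$var2, groups), sum)   # one sum per level
  data.frame(var1 = names(sums), var2 = unname(sums))
}

result <- lapply(dfs, sum_by_var1)
result[[2]]
```

Setting the factor levels explicitly is what controls the row order here; without it, split() falls back to alphabetical levels.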



Answer 2:


Something like this?

library(dplyr)  # provides %>% and mutate()

df <- tibble::tribble(
~var1, ~var2,
    "a",    1,
    "b",    2,
    "a",    3,
    "c",    6,
    "d",   88,
    "b",    0)

df2 <- data.frame(var1=c("k","b","a","k","k","b"),var2=c(14,78,5,6,88,0))

df3 <- cbind(df,df2)
colnames(df3) <- c("df1var1","df1var2","df2var1","df2var2")
df3 %>% mutate(sum = ifelse(df1var1 == df2var1, df1var2 + df2var2, NA))
  df1var1 df1var2 df2var1 df2var2 sum
1       a       1       k      14  NA
2       b       2       b      78  80
3       a       3       a       5   8
4       c       6       k       6  NA
5       d      88       k      88  NA
6       b       0       b       0   0



Answer 3:


It was a bit difficult to understand, but now that you have shown what the result should look like, I think this is what you are looking for: group the data frame and then summarise.

library(tidyverse)

df2 <- data.frame(var1=c("k","b","a","k","k","b"),var2=c(14,78,5,6,88,0))

df <- tibble::tribble(
  ~var1, ~var2,
  "a",    1,
  "b",    2,
  "a",    3,
  "c",    6,
  "d",   88,
  "b",    0
)


df %>% 
  group_by(var1) %>% 
  summarise(sum = sum(var2))
#> # A tibble: 4 x 2
#>   var1    sum
#>   <chr> <dbl>
#> 1 a         4
#> 2 b         2
#> 3 c         6
#> 4 d        88
df2 %>% 
  group_by(var1) %>% 
  summarise(sum = sum(var2))
#> # A tibble: 3 x 2
#>   var1    sum
#>   <chr> <dbl>
#> 1 a         5
#> 2 b        78
#> 3 k       108

Created on 2020-06-10 by the reprex package (v0.3.0)

And in base R you could do:

aggregate(df$var2, by=list(df$var1), FUN=sum)[2]
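Note that the trailing [2] in the call above keeps only the column of sums and drops the group labels. If the labels are wanted too, as in the expected result from the question, the formula interface of aggregate is one option. A small sketch, rebuilding df from the question's data:

```r
# Example data taken from the question
df <- data.frame(var1 = c("a", "b", "a", "c", "d", "b"),
                 var2 = c(1, 2, 3, 6, 88, 0))

# Formula interface: sum var2 within each level of var1;
# the result keeps both original column names
res <- aggregate(var2 ~ var1, data = df, FUN = sum)
res
#   var1 var2
# 1    a    4
# 2    b    2
# 3    c    6
# 4    d   88
```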

It took me a while to understand that you want to do this starting with a list of data frames. In that case you could define a function the tidyverse way and apply it with purrr::map:

dflist <- list(df, df2)
df_sum <- function(df){
  df %>% 
    as.data.frame() %>% 
    group_by(var1) %>% 
    summarise(sum = sum(var2))
}

purrr::map(dflist, df_sum)
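With 10 data frames, as in the real problem, it may help to keep track of which result belongs to which input. A sketch under the assumption that the list elements are given names (the names df and df2 and the column name "source" are illustrative, not from the question); dplyr::bind_rows then stacks the per-group sums into one table:

```r
library(dplyr)

# Example data taken from the question
df  <- data.frame(var1 = c("a", "b", "a", "c", "d", "b"),
                  var2 = c(1, 2, 3, 6, 88, 0),
                  stringsAsFactors = FALSE)
df2 <- data.frame(var1 = c("k", "b", "a", "k", "k", "b"),
                  var2 = c(14, 78, 5, 6, 88, 0),
                  stringsAsFactors = FALSE)

# Same grouping function as in the answer
df_sum <- function(df) {
  df %>%
    group_by(var1) %>%
    summarise(sum = sum(var2))
}

# Assumption: naming the list elements so results can be traced back
dflist  <- list(df = df, df2 = df2)
results <- purrr::map(dflist, df_sum)

# One combined table; "source" holds the list element name
combined <- bind_rows(results, .id = "source")
combined
```

Whether a single combined table or a list of separate tables is more convenient depends on what happens to the sums afterwards.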


Source: https://stackoverflow.com/questions/62305258/sum-of-rows-when-condition-is-met-data-frame-in-r
