how to perform sum and count on dataframe in R

|▌冷眼眸甩不掉的悲伤 提交于 2020-01-15 03:58:08

问题


I have a dataframe with ID, type, and the area I would like to perform two operations at once

ID         Type         Area     
1           Aa          0.02    
2           Ag          0.12    
2           Ag          0.14    
2           Ag          0.80    
2           Bm          0.20    
2           Xm          0.13    

The expected outcome is

ID          Type       count      area     
1           Aa           1        0.02    
2           Ag           3        1.06 (sum)    
2           Bm           1        0.20    
2           Xm           1        0.13 

I have up to 100-150 ID and type, count and basal area varies for each type with the same ID, what would be the best approach to perform sum and count same time and keep type and ID in dataframe?

Thanks


回答1:


1) Base R -- aggregate Counts are just the sum of a constant column of ones so using DF shown reproducibly in the Note at the end we add such a column and aggregate using sum. No packages are used.

aggregate(cbind(Count, Area) ~ ID + Type, transform(DF, Count = 1), sum)

giving:

  ID Type Count Area
1  1   Aa     1 0.02
2  2   Ag     3 1.06
3  2   Bm     1 0.20
4  2   Xm     1 0.13

2) Base R -- by An approach using only base R that does not rely on the trick of adding a column of ones is to use by. The by call produces a list of class by and the do.call("rbind", ...) converts that to a data frame.

do.call("rbind", by(DF, DF[1:2], with, 
  data.frame(ID = ID[1], Type = Type[1], Count = length(ID), Area = sum(Area))))

giving:

  ID Type Count Area
1  1   Aa     1 0.02
2  2   Ag     3 1.06
3  2   Bm     1 0.20
4  2   Xm     1 0.13

3) sqldf SQL allows the separate and simultaneous application of count and sum.

library(sqldf)
sqldf("select ID, Type, count(*) as Count, sum(Area) as Area
  from DF
  group by 1, 2")

giving:

  ID Type Count Area
1  1   Aa     1 0.02
2  2   Ag     3 1.06
3  2   Bm     1 0.20
4  2   Xm     1 0.13

4) data.table The data.table package can also be used.

library(data.table)

DT <- as.data.table(DF)
DT[, .(Count = .N, Area = sum(Area)), by = "ID,Type"]

giving:

   ID Type Count Area
1:  1   Aa     1 0.02
2:  2   Ag     3 1.06
3:  2   Bm     1 0.20
4:  2   Xm     1 0.13

Note

Lines <- "ID         Type         Area     
1           Aa          0.02    
2           Ag          0.12    
2           Ag          0.14    
2           Ag          0.80    
2           Bm          0.20    
2           Xm          0.13 "

DF <- read.table(text = Lines, header = TRUE)



回答2:


We can use dplyr. Packages are used

library(dplyr)
df1 %>%
   group_by(ID, Type) %>%
   summarise(count = n(), Area = sum(Area))
# A tibble: 4 x 4
# Groups:   ID [2]
#     ID Type  count  Area
#  <int> <chr> <int> <dbl>
#1     1 Aa        1  0.02
#2     2 Ag        3  1.06
#3     2 Bm        1  0.2 
#4     2 Xm        1  0.13

or with by from base R - Note that base R includes some packages as well...

by(df1['Area'], df1[1:2], FUN = function(x) cbind(count = nrow(x), Area = sum(x)))

data

df1 <- structure(list(ID = c(1L, 2L, 2L, 2L, 2L, 2L), Type = c("Aa", 
"Ag", "Ag", "Ag", "Bm", "Xm"), Area = c(0.02, 0.12, 0.14, 0.8, 
0.2, 0.13)), class = "data.frame", row.names = c(NA, -6L))



回答3:


Another possibility in plyr:

library(plyr)
ddply(DF, .(ID,Type), summarize, Count=length(Area), Area=sum(Area))

#   ID Type Count  Area
# 1  1   Aa     1  0.02
# 2  2   Ag     3  1.06
# 3  2   Bm     1  0.20
# 4  2   Xm     1  0.13



回答4:


If your data is large, I recommend data.table:

library(data.table)
setDT(df)[, .(Area=sum(Area), Count=.N), .(ID, Type)]


来源:https://stackoverflow.com/questions/57328395/how-to-perform-sum-and-count-on-dataframe-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!