How to integrate properties defined on multiple rows using a data.frame or data.table long-format approach

时光总嘲笑我的痴心妄想 submitted on 2019-12-13 22:39:24

Question


I have recently started using the data.table package in R, and I find it super-convenient for transforming and aggregating data. One thing I haven't figured out is how to transform data whose properties are defined across multiple rows. Do I need to reshape the data.frame/data.table into a wide format first?

Say you have the following data table:

library(data.table)

dt <- data.table(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))

and for each group you want to calculate the relative frequency of each subgroup, c1/(c1+c2+c3), as well as other properties that are functions of c1, c2 and c3 (where c1, c2 and c3 are the counts associated with f1, f2 and f3).
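Written out for a group $g$ with subgroup counts $c_{g,k}$ (the symbols $p$ and $c_{g,k}$ are introduced here only for illustration), the relative frequency is each count divided by the group total; for group a in the table above this gives:

```latex
p_{g,k} = \frac{c_{g,k}}{\sum_{j=1}^{3} c_{g,j}},
\qquad
p_{a,1} = \frac{3}{3 + 4 + 5} = \frac{3}{12} = 0.25
```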

I can see how to transform the data table into a wide format and then apply the calculation. Is there any way to calculate this directly in the long format (ideally using data.table)?
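A minimal sketch contrasting the two routes, assuming every group has exactly one row per subgroup (a group without an f1 row would silently drop out of the long-route result). The column names `rel.f1` and `f1.over.f2` are hypothetical, chosen only for illustration. The wide route uses data.table's `dcast()`; the long route picks individual subgroup counts with logical indexing inside `by = group`, so no reshaping is needed:

```r
library(data.table)

dt <- data.table(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))

# Wide route: one row per group, one column per subgroup (f1, f2, f3)
wide <- dcast(dt, group ~ subg, value.var = "counts")
wide[, rel.f1 := f1 / (f1 + f2 + f3)]

# Long route: inside each group, counts[subg == "f1"] plays the role of c1,
# so any function of c1, c2, c3 can be written without reshaping
long <- dt[, .(rel.f1     = counts[subg == "f1"] / sum(counts),
               f1.over.f2 = counts[subg == "f1"] / counts[subg == "f2"]),
           by = group]
long
```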

In general, the group and the subgroup could each be defined by multiple factors.
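With data.table, a group defined by several factors is handled by passing a list of columns to `by`. In this sketch the `site` column and its values are hypothetical, added only to show a two-factor grouping:

```r
library(data.table)

# Hypothetical data: a group is identified by the pair (site, group)
dtm <- data.table(site   = c("x", "x", "x", "x", "y", "y"),
                  group  = c("a", "a", "b", "b", "a", "a"),
                  subg   = c("f1", "f2", "f1", "f2", "f1", "f2"),
                  counts = c(3, 1, 2, 2, 5, 5))

# Relative frequency of each subgroup within every (site, group) combination
dtm[, rel := counts / sum(counts), by = .(site, group)]
dtm
```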


Answer 1:


If I understand the OP correctly, you want something like this:

# share of rows per subgroup within each group (ignores the counts column)
dt[, {bigN = .N; .SD[, .N / bigN, by = subg]}, by = group]

or maybe (and very similarly) this:

# share of the summed counts per subgroup within each group
dt[, {counts.sum = sum(counts); .SD[, counts / counts.sum, by = subg]},
     by = group]
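A shorter variant, assuming the count-weighted relative frequency is what is wanted, avoids the nested `.SD` call: `sum(counts)` is evaluated once per group, and `:=` adds the results as new columns in place, so the table stays in long format. The column names `gcount` and `rel.count` are borrowed from the plyr answer for easy comparison:

```r
library(data.table)

dt <- data.table(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))

# Per-group total, recycled to every row of that group
dt[, gcount := sum(counts), by = group]
# Relative frequency of each subgroup (row-wise, no grouping needed)
dt[, rel.count := counts / gcount]
dt
```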



Answer 2:


If you are using a data.frame, you can use ddply from the plyr package in two steps:

library(plyr)

# gcount = sum of counts for each group
dt1 <- ddply(dt, .(group), transform, gcount = sum(counts))
> dt1
  group subg counts gcount
1     a   f1      3     12
2     a   f2      4     12
3     a   f3      5     12
4     b   f1      8     27
5     b   f2      9     27
6     b   f3     10     27

# rel.count = relative frequency of each subgroup within its group
dt2 <- ddply(dt1, .(group, subg), transform, rel.count = counts / gcount)
> dt2
  group subg counts gcount rel.count
1     a   f1      3     12 0.2500000
2     a   f2      4     12 0.3333333
3     a   f3      5     12 0.4166667
4     b   f1      8     27 0.2962963
5     b   f2      9     27 0.3333333
6     b   f3     10     27 0.3703704
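For a plain data.frame without plyr, a base R sketch (a swapped-in technique, not part of the answer above) uses `ave()`, which applies `FUN` within each level of the grouping variable and returns a vector aligned with the original rows. The object name `df` is hypothetical:

```r
df <- data.frame(group  = c("a", "a", "a", "b", "b", "b"),
                 subg   = c("f1", "f2", "f3", "f1", "f2", "f3"),
                 counts = c(3, 4, 5, 8, 9, 10))

# Group total repeated on every row of its group
df$gcount <- ave(df$counts, df$group, FUN = sum)
# Relative frequency: a plain element-wise division, no grouping needed
df$rel.count <- df$counts / df$gcount
df
```

The second step does not need to be split by `(group, subg)`, because once `gcount` is on every row the division is row-wise.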


Source: https://stackoverflow.com/questions/18110933/how-to-integrate-properties-defined-on-multiple-rows-using-a-data-frame-or-data
