Summing across rows of a data.table for specific columns

I have a large data table (from the package data.table) with over 60 columns (the first three corresponding to factors and the remaining to response variables, in this case

2 Answers

天涯浪人

2020-12-15 07:44
An alternative (data.table) approach would be to store your data in long form. Version 1.8.11 of data.table has fast melt and dcast methods.
```
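library(data.table)  # recent data.table versions export melt() themselves, so reshape2 is not needed
# Reshape to long form: columns 1:3 are ids, every other column becomes a Species/value pair
mt <- melt(test, id.vars = 1:3, variable.name = 'Species')

# Mean abundance per Zone/quadrat/Species, then the total over species within each Zone/quadrat
abundance <- mt[, list(abundance = mean(value)), by = list(Zone, quadrat, Species)][,
                sumAbundance := sum(abundance), by = list(Zone, quadrat)]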
```
Working in long format takes a slight change in thinking, but it may end up being more memory-efficient: less internal copying is involved, and within every "by" group you reference a single element (the value column) rather than multiple elements.
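To make the long-format idea concrete, here is a minimal sketch on a made-up table. The Site column, the sp1 to sp3 column names, and all values are assumptions for illustration, not the OP's data; it also assumes a current data.table, where melt is exported by the package itself.

```r
library(data.table)

# Hypothetical toy data: three id columns followed by three species columns.
test <- data.table(
  Zone    = c("A", "A", "B", "B"),
  quadrat = c(1, 1, 1, 2),
  Site    = c("s1", "s2", "s1", "s1"),
  sp1     = c(1, 3, 5, 7),
  sp2     = c(2, 4, 6, 8),
  sp3     = c(0, 2, 1, 1)
)

# Long form: one row per (Zone, quadrat, Site, Species) combination.
mt <- melt(test, id.vars = 1:3, variable.name = "Species")

# Mean abundance per species within each Zone/quadrat.
abundance <- mt[, list(abundance = mean(value)), by = list(Zone, quadrat, Species)]

# Total of those means within each Zone/quadrat, added by reference.
abundance[, sumAbundance := sum(abundance), by = list(Zone, quadrat)]

print(abundance)
```

With this toy data, Zone A / quadrat 1 has species means of 2, 3 and 1, so sumAbundance is 6 on each of its three rows.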
被撕碎了的回忆

2020-12-15 08:00
[ Edited 2020-02-15 to reflect current state of data.table ] In recent versions of data.table, rowSums(Abundance[, 4:6]) works as the OP originally expected. Here are some alternatives:
```
Abundance[, SumAbundance := rowSums(.SD), .SDcols = 4:6]
```
Also, I didn't check, but I suspect this will be faster, since it does not convert to a matrix as rowSums does:
```
Abundance[, SumAbundance := Reduce(`+`, .SD), .SDcols = 4:6]
```
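As a quick check of the two alternatives, here is a small sketch on a made-up table (the column names and values are assumptions, not the OP's data). It also shows one practical difference: both propagate NA by default, but only rowSums offers na.rm, whereas Reduce with `+` has no such switch.

```r
library(data.table)

# Hypothetical toy table; columns 4:6 are the response variables.
Abundance <- data.table(
  Zone    = c("A", "B"),
  quadrat = c(1, 2),
  Site    = c("s1", "s2"),
  sp1     = c(1, 4),
  sp2     = c(2, NA),
  sp3     = c(3, 6)
)

# Row sums via rowSums (converts .SD to a matrix internally).
Abundance[, viaRowSums := rowSums(.SD), .SDcols = 4:6]

# Row sums via Reduce (adds the columns pairwise, no matrix conversion).
Abundance[, viaReduce := Reduce(`+`, .SD), .SDcols = 4:6]

# rowSums can skip missing values; Reduce(`+`, ...) cannot without extra work.
Abundance[, viaRowSumsNaRm := rowSums(.SD, na.rm = TRUE), .SDcols = 4:6]

print(Abundance)
```

The .SDcols = 4:6 selection keeps pointing at sp1 to sp3 because the new columns are appended after them.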