问题
Let's say I have the following data.table
> DT
# A B C D E N
# 1: J t X D N 0.07898388
# 2: U z U L A 0.46906049
# 3: H a Z F S 0.50826435
# ---
# 9998: X b R L X 0.49879990
# 9999: Z r U J J 0.63233668
# 10000: C b M K U 0.47796539
Now I need to group by a pair of columns and calculate sum N. That's easy to do when you know column names in advance:
> DT[, sum(N), by=.(A,B)]
# A B V1
# 1: J t 6.556897
# 2: U z 9.060844
# 3: H a 4.293426
# ---
# 674: V z 11.439100
# 675: M x 1.736050
# 676: U k 3.676197
But I must do that in a function, which receives a vector of column indices to group by.
> f <- function(columns = 1:2) {
DT[, sum(N), by=columns]
}
> f(1:2)
Error in `[.data.table`(DT, , sum(N), by = columns) :
The items in the 'by' or 'keyby' list are length (2). Each must be same
length as rows in x or number of rows returned by i (10000).
I also tried:
> f(list("A", "B"))
Error in `[.data.table`(DT, , sum(N), by = list(columns)) :
column or expression 1 of 'by' or 'keyby' is type list. Do not quote column
names. Usage: DT[,sum(colC),by=list(colA,month(colB))]
How do I make this to work?
回答1:
Here's how I would approach this:
f <- function(columns) {
Get <- if (!is.numeric(columns)) match(columns, names(DT)) else columns
columns <- names(DT)[Get]
DT[, sum(N), by = columns]
}
The first line (Get..) keeps "columns" as numeric if it's already numeric or it converts it from characters to numeric if they are not.
Test it out with some sample data:
set.seed(1)
DT <- data.table(
A = sample(letters[1:3], 20, TRUE),
B = sample(letters[1:5], 20, TRUE),
C = sample(LETTERS[1:2], 20, TRUE),
N = rnorm(20)
)
## Should work with either column number or name
f(1)
f("A")
f(c(1, 3))
f(c("A", "C"))
来源:https://stackoverflow.com/questions/26999291/r-data-table-group-by-column-numbers-and-sum-a-column