问题
What is the best way to do component-wise matrix addition if the number of matrices to be summed is not known in advance? More generally, is there a good way to perform matrix (or multi-dimensional array) operations in the context of data.table? I use data.table
for its efficiency at sorting and grouping data by several fixed variables, or categories, each comprising a different number of observations.
For example:
- Find the outer product of vector components given in each observation (row) of the data, returning a matrix for each row.
- Sum the resulting matrices component-wise over all rows of each grouping of data categories.
Here illustrated with 2x2 matrices and only one category:
library(data.table)
# example data, number of rows differs by category t
N <- 5
dt <- data.table(t = rep(c("a", "b"), each = 3, len = N),
x1 = rep(1:2, len = N), x2 = rep(3:5, len = N),
y1 = rep(1:3, len = N), y2 = rep(2:5, len = N))
setkey(dt, t)
> dt
t x1 x2 y1 y2
1: a 1 3 1 2
2: a 2 4 2 3
3: a 1 5 3 4
4: b 2 3 1 5
5: b 1 4 2 2
I attempted a function to compute matrix sum on outer product, %o%
mat_sum <- function(x1, x2, y1, y2){
x <- c(x1, x2) # x vector
y <- c(y1, y2) # y vector
xy <- x %o% y # outer product (i.e. 2x2 matrix)
sum(xy) # <<< THIS RETURNS A SINGLE VALUE, NOT WHAT I WANT.
}
which, of course, does not work because sum
adds up all the elements across the arrays.
I saw this answer using Reduce('+', .list)
but that seems to require already having a list
of all the matrices to be added. I haven't figured out how to do that within data.table
, so instead I've got a cumbersome work-around:
# extract each outer product component first...
mat_comps <- function(x1, x2, y1, y2){
x <- c(x1, x2) # x vector
y <- c(y1, y2) # y vector
xy <- x %o% y # outer product (i.e. 2x2 matrix)
xy11 <- xy[1,1]
xy21 <- xy[2,1]
xy12 <- xy[1,2]
xy22 <- xy[2,2]
return(c(xy11, xy21, xy12, xy22))
}
# ...then running this function on dt,
# taking extra step (making column 'n') to apply it row-by-row...
dt[, n := 1:nrow(dt)]
dt[, c("xy11", "xy21", "xy12", "xy22") := as.list(mat_comps(x1, x2, y1, y2)),
by = n]
# ...then sum them individually, now grouping by t
s <- dt[, list(s11 = sum(xy11),
s21 = sum(xy21),
s12 = sum(xy12),
s22 = sum(xy22)),
by = key(dt)]
> s
t s11 s21 s12 s22
1: a 8 26 12 38
2: b 4 11 12 23
and that gives the summed components, which can finally be converted back to matrices.
回答1:
In general, data.table
is designed to work with columns. The more you transform your problem to col-wise operations, the more you can get out of data.table
.
Here's an attempt at accomplishing this operation col-wise. Probably there are better ways. This is intended more as a template, to provide an idea on approaching the problem (even though I understand it may not be possible in all cases).
xcols <- grep("^x", names(dt))
ycols <- grep("^y", names(dt))
combs <- CJ(ycols, xcols)
len <- seq_len(nrow(combs))
cols = paste("V", len, sep="")
for (i in len) {
c1 = combs$V2[i]
c2 = combs$V1[i]
set(dt, i=NULL, j=cols[i], value = dt[[c1]] * dt[[c2]])
}
# t x1 x2 y1 y2 V1 V2 V3 V4
# 1: a 1 3 1 2 1 3 2 6
# 2: a 2 4 2 3 4 8 6 12
# 3: a 1 5 3 4 3 15 4 20
# 4: b 2 3 1 5 2 3 10 15
# 5: b 1 4 2 2 2 8 2 8
This basically applies the outer product col-wise. Now it's just a matter of aggregating it.
dt[, lapply(.SD, sum), by=t, .SDcols=cols]
# t V1 V2 V3 V4
# 1: a 8 26 12 38
# 2: b 4 11 12 23
HTH
Edit: Modified cols, c1, c2
a bit to get the output with the correct order for V2
and V3
.
回答2:
EDIT: For not only 2 elements in "x"s and "y"s, a modified function could be:
ff2 = function(x_ls, y_ls)
{
combs_ls = lapply(seq_along(x_ls[[1]]),
function(i) list(sapply(x_ls, "[[", i),
sapply(y_ls, "[[", i)))
rowSums(sapply(combs_ls, function(x) as.vector(do.call(outer, x))))
}
where, "x_ls" and "y_ls" are lists of the respective vectors.
Using it:
dt[, as.list(ff2(list(x1, x2), list(y1, y2))), by = t]
# t V1 V2 V3 V4
#1: a 8 26 12 38
#2: b 4 11 12 23
And on other "data.frames/tables":
set.seed(101)
DF = data.frame(group = rep(letters[1:3], c(4, 2, 3)),
x1 = sample(1:20, 9, T), x2 = sample(1:20, 9, T),
x3 = sample(1:20, 9, T), x4 = sample(1:20, 9, T),
y1 = sample(1:20, 9, T), y2 = sample(1:20, 9, T),
y3 = sample(1:20, 9, T), y4 = sample(1:20, 9, T))
DT = as.data.table(DF)
DT[, as.list(ff2(list(x1, x2, x3, x4),
list(y1, y2, y3, y4))), by = group]
# group V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
#1: a 338 661 457 378 551 616 652 468 460 773 536 519 416 766 442 532
#2: b 108 261 171 99 29 77 43 29 154 386 238 146 161 313 287 121
#3: c 345 351 432 293 401 421 425 475 492 558 621 502 510 408 479 492
I don't know, though, how would one in "data.table" not state explicitly which columns to use inside the function; i.e. how you could do the equivalent of:
do.call(rbind, lapply(split(DF[-1], DF$group),
function(x)
do.call(ff2, c(list(x[grep("^x", names(x))]),
list(x[grep("^y", names(x))])))))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
#a 338 661 457 378 551 616 652 468 460 773 536 519 416 766 442 532
#b 108 261 171 99 29 77 43 29 154 386 238 146 161 313 287 121
#c 345 351 432 293 401 421 425 475 492 558 621 502 510 408 479 492
OLD ANSWER:
Perhaps you could define your function like:
ff1 = function(x1, x2, y1, y2)
rowSums(sapply(seq_along(x1),
function(i) as.vector(c(x1[i], x2[i]) %o% c(y1[i], y2[i]))))
dt[, as.list(ff1(x1, x2, y1, y2)), by = list(t)]
# t V1 V2 V3 V4
#1: a 8 26 12 38
#2: b 4 11 12 23
来源:https://stackoverflow.com/questions/24295482/matrix-operations-and-component-wise-addition-using-data-table