For each row return the column name of the largest value

前端未结

关注

 8  2417

礼貌的吻别 2020-11-21 07:06

I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier

8条回答

轮回少年 (楼主)

2020-11-21 07:42

If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!

Here I've generated data of your dimensions (26746 * 18).

Data

set.seed(45)
DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))

`data.table` answer:

require(data.table)
DT <- data.table(value=unlist(DF, use.names=FALSE), 
            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

Benchmarking:

# data.table solution
system.time({
DT <- data.table(value=unlist(DF, use.names=FALSE), 
            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
})
#   user  system elapsed 
#  0.174   0.029   0.227 

# apply solution from @thelatemail
system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
#   user  system elapsed 
#  2.322   0.036   2.602 

identical(t1, t2)
# [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

DT <- data.table(value=unlist(DF, use.names=FALSE), 
            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid)), rowid, mult="last"]

0 讨论(0)

查看其它8个回答

For each row return the column name of the largest value

Data

data.table answer:

Benchmarking:

Edit: if any of the max ids is okay, then:

`data.table` answer: