I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier
If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!
Here I've generated data of your dimensions (26746 * 18).
set.seed(45)
DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))
data.table answer:require(data.table)
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
# data.table solution
system.time({
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
})
# user system elapsed
# 0.174 0.029 0.227
# apply solution from @thelatemail
system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
# user system elapsed
# 2.322 0.036 2.602
identical(t1, t2)
# [1] TRUE
It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid)), rowid, mult="last"]