I would like to return the count of the unique values for every column in a table. For example, if I have the table:
Testdata <- data.frame(var_1 = c("a
I tried all of the solutions above, and two of them did not work for me: the one using `aggregate` and the `tidyr` one. I think `data.table` is a good choice:
library(data.table)
setDT(Testdata)[, lapply(.SD, uniqueN), .SDcols = c("var_1", "var_2", "var_3")]
# var_1 var_2 var_3
# 1: 1 1 3
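For comparison, here is a minimal base R sketch that returns the same counts without needing `data.table`. The `Testdata` values are hypothetical, because the original definition above is cut off. They were picked only so that the counts match the output shown (1, 1, 3).

```r
# Hypothetical stand-in for Testdata (the original definition is truncated),
# chosen so the unique counts match the output above: 1, 1, 3
Testdata <- data.frame(
  var_1 = c("a", "a", "a"),
  var_2 = c("b", "b", "b"),
  var_3 = c("c", "d", "e")
)

# vapply() applies the function to each column and checks that every
# result is a single integer, returning a named integer vector
counts <- vapply(Testdata, function(x) length(unique(x)), integer(1))
counts
# var_1 var_2 var_3
#     1     1     3
```

Compared with `sapply()`, `vapply()` fails loudly if a column ever returns something other than one integer, instead of silently changing the result's shape.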
I also benchmarked them against each other:
library(microbenchmark)
Mycomp <- microbenchmark(
  apply     = apply(Testdata, 2, function(x) length(unique(x))),
  lapply    = lapply(Testdata, function(x) length(unique(x))),
  sapply    = sapply(Testdata, function(x) length(unique(x))),
  # aggregate() version left out because it did not work for me:
  # base  = aggregate(values ~ ind, unique(stack(Testdata)), length),
  datatable = setDT(Testdata)[, lapply(.SD, uniqueN), .SDcols = c("var_1", "var_2", "var_3")],
  times = 50
)
print(Mycomp)
# Unit: microseconds
#       expr     min      lq     mean   median      uq     max neval cld
#      apply 163.315 176.678 192.0435 181.7915 192.047 608.859    50   b
#     lapply 138.217 147.339 157.9684 153.0640 165.829 254.145    50   a
#     sapply 160.338 169.124 178.1486 174.3965 185.548 203.419    50   b
#  datatable 667.937 684.650 698.1306 696.0160 703.390 874.073    50   c
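These timings come from a three-row table, so they likely reflect fixed per-call overhead more than counting speed. The following is a minimal sketch for rerunning the comparison on a larger input. The data (`big_df`, 100,000 rows of random values) is hypothetical. The `data.table` copy is built once, outside the timed expressions, because calling `setDT()` inside the benchmark converts the shared `Testdata` by reference on its first run. After that, the other expressions are timed on a `data.table` rather than a `data.frame`. Omitting `.SDcols` makes `.SD` cover all columns.

```r
library(data.table)
library(microbenchmark)

set.seed(1)
# Hypothetical larger input: 100,000 rows, three character columns
n <- 1e5
big_df <- data.frame(
  var_1 = sample(letters, n, replace = TRUE),
  var_2 = sample(LETTERS, n, replace = TRUE),
  var_3 = sample(as.character(1:1000), n, replace = TRUE)
)
# Convert once, outside the timing loop, so big_df stays a plain data.frame
big_dt <- as.data.table(big_df)

res <- microbenchmark(
  lapply    = lapply(big_df, function(x) length(unique(x))),
  sapply    = sapply(big_df, function(x) length(unique(x))),
  datatable = big_dt[, lapply(.SD, uniqueN)],
  times = 20
)
print(res)
```

No timings are shown here because the results depend on the machine and the data. The point is only to keep the input type identical across all the expressions being compared.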