How do you summarize columns based on unique IDs without knowing IDs in R?

Posted by 一笑奈何 on 2019-12-01 02:01:00

If you only need counts for the whole data frame, you can use table(unlist(df)) (see also @goctlr's answer), and if you also want proportions, prop.table(table(unlist(df))). Getting the counts for the individual columns as well takes a bit more work.

To get the count for each column and the total count, I wrote the following function:

# some reproducible data:
set.seed(1)
x <- sample(letters[1:4], 20, replace = TRUE)
y <- sample(letters[1:5], 20, replace = TRUE)
z <- sample(letters[1:6], 20, replace = TRUE)
df <- data.frame(x,y,z)

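# the function: counts each distinct value per column, plus a row total
func <- function(x) {
  x2 <- data.frame()
  nms <- names(x)
  # all distinct values found anywhere in the data frame
  id <- sort(unique(unlist(x)))
  for(i in seq_along(id)) {
    for(j in seq_along(nms)) {
      # number of times value id[i] occurs in column j
      x2[i,j] <- sum(x[,j] %in% id[i])
    }
  }
  names(x2) <- nms
  x2$total <- rowSums(x2)
  x2 <- cbind(id,x2)
  # side effect: stores the result as 'dat' in the global environment
  assign("dat", x2, envir = .GlobalEnv)
}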

Executing the function with func(df) will give you a data frame dat in your global environment:

> dat
  id x y z total
1  a 4 4 3    11
2  b 5 5 2    12
3  c 5 4 4    13
4  d 6 4 5    15
5  e 0 3 5     8
6  f 0 0 1     1
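
The same per-column count can also be sketched without explicit loops and without writing to the global environment. This is not part of the original answer: count_by_column and the toy data frame are hypothetical names used only for illustration. The idea is to convert each column to a factor that shares one set of levels, so that table() reports 0 for values a column does not contain.

```r
# Hypothetical helper (not from the original answer): per-column counts of each value.
count_by_column <- function(df) {
  # Collect all distinct values across columns so every column shares the same levels.
  ids <- sort(unique(unlist(lapply(df, as.character))))
  # table() on a factor with fixed levels returns 0 for values absent from a column.
  counts <- do.call(cbind, lapply(df, function(col) {
    as.vector(table(factor(col, levels = ids)))
  }))
  out <- data.frame(id = ids, counts, stringsAsFactors = FALSE)
  out$total <- rowSums(counts)
  out  # returned to the caller instead of being assigned globally
}

# Small deterministic example (assumed data, chosen for illustration)
toy <- data.frame(x = c("a", "a", "b"),
                  y = c("b", "c", "c"),
                  stringsAsFactors = FALSE)
res <- count_by_column(toy)
print(res)
```

Returning the result lets you pick the variable name yourself, for example dat <- count_by_column(df).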

After that, you can calculate the percentages (each count as a share of the grand total) with, for example, the dplyr package:

library(dplyr)
dat <- dat %>% mutate(xperc=round(100*x/sum(total),1),
                      yperc=round(100*y/sum(total),1),
                      zperc=round(100*z/sum(total),1),
                      perc=round(100*total/sum(total),1))

which results in:

> dat
  id x y z total xperc yperc zperc perc
1  a 4 4 3    11   6.7   6.7   5.0 18.3
2  b 5 5 2    12   8.3   8.3   3.3 20.0
3  c 5 4 4    13   8.3   6.7   6.7 21.7
4  d 6 4 5    15  10.0   6.7   8.3 25.0
5  e 0 3 5     8   0.0   5.0   8.3 13.3
6  f 0 0 1     1   0.0   0.0   1.7  1.7
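
If you would rather not depend on dplyr, the same percentages can be sketched in base R. This is an alternative to the code above, not something the original answer proposes. The counts are copied from the dat output shown earlier, and cell_perc is a hypothetical name. It relies on prop.table() dividing every cell of a matrix by the sum of all cells when no margin is given.

```r
# Counts copied from the 'dat' output shown above
dat <- data.frame(id = letters[1:6],
                  x = c(4, 5, 5, 6, 0, 0),
                  y = c(4, 5, 4, 4, 3, 0),
                  z = c(3, 2, 4, 5, 5, 1))
dat$total <- rowSums(dat[, c("x", "y", "z")])

# prop.table() without a margin divides each cell by the grand total
cell_perc <- round(100 * prop.table(as.matrix(dat[, c("x", "y", "z")])), 1)
colnames(cell_perc) <- paste0(colnames(cell_perc), "perc")

# Append the per-cell percentages and the per-row percentage of the grand total
dat <- cbind(dat, cell_perc,
             perc = round(100 * dat$total / sum(dat$total), 1))
print(dat)
```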

For a summary of counts over the whole data frame, you can unlist the data frame and then call the table function:

table(unlist(df))

To get each value's proportion of the total count, save the result and pass it to the prop.table function (multiply by 100 for percentages):

tout <- table(unlist(df))
prop.table(tout)