Extend contigency table with proportions (percentages)

后端 未结 6 1200
悲哀的现实
悲哀的现实 2020-11-29 02:39

I have a contingency table of counts, and I want to extend it with corresponding proportions of each group.

Some sample data (tips data set from gg

相关标签:
6条回答
  • 2020-11-29 02:49

    I am not 100% certain, but I think this does what you want using prop.table. See mostly the last 3 lines. The rest of the code is just creating fake data.

    set.seed(1234)
    
    total_bill <- rnorm(50, 25, 3)
    tip <- 0.15 * total_bill + rnorm(50, 0, 1)
    sex <- rbinom(50, 1, 0.5)
    smoker <- rbinom(50, 1, 0.3)
    day <- ceiling(runif(50, 0,7))
    time <- ceiling(runif(50, 0,3))
    size <- 1 + rpois(50, 2)
    my.data <- as.data.frame(cbind(total_bill, tip, sex, smoker, day, time, size))
    my.data
    
    my.table <- table(my.data$smoker)
    
    my.prop <- prop.table(my.table)
    
    cbind(my.table, my.prop)
    
    0 讨论(0)
  • 2020-11-29 02:53

    Here is another example using the lapply and table functions in base R.

    freqList = lapply(select_if(tips, is.factor), 
                  function(x) {
                      df = data.frame(table(x))
    
                      df = data.frame(fct = df[, 1], 
                                      n = sapply(df[, 2], function(y) {
                                          round(y / nrow(dat), 2)
                                        }
                                    )
                                )
                      return(df) 
                        }
                    )
    

    Use print(freqList) to see the proportion tables (percent of frequencies) for each column/feature/variable (depending on your tradecraft) that is labeled as a factor.

    0 讨论(0)
  • 2020-11-29 03:03

    If it's conciseness you're after, you might like:

    prop.table(table(tips$smoker))
    

    and then scale by 100 and round if you like. Or more like your exact output:

    tbl <- table(tips$smoker)
    cbind(tbl,prop.table(tbl))
    

    If you wanted to do this for multiple columns, there are lots of different directions you could go depending on what your tastes tell you is clean looking output, but here's one option:

    tblFun <- function(x){
        tbl <- table(x)
        res <- cbind(tbl,round(prop.table(tbl)*100,2))
        colnames(res) <- c('Count','Percentage')
        res
    }
    
    do.call(rbind,lapply(tips[3:6],tblFun))
           Count Percentage
    Female    87      35.66
    Male     157      64.34
    No       151      61.89
    Yes       93      38.11
    Fri       19       7.79
    Sat       87      35.66
    Sun       76      31.15
    Thur      62      25.41
    Dinner   176      72.13
    Lunch     68      27.87
    

    If you don't like stack the different tables on top of each other, you can ditch the do.call and leave them in a list.

    0 讨论(0)
  • 2020-11-29 03:04

    I made this for when doing aggregate functions and similar

    per.fun <- function(x) {
        if(length(x)>1){
            denom <- length(x);
            num <- sum(x);
            percentage <- num/denom;
            percentage*100
            }
            else NA
        }
    
    0 讨论(0)
  • 2020-11-29 03:05

    Here's a tidyverse version:

    library(tidyverse)
    data(diamonds)
    
    (as.data.frame(table(diamonds$cut)) %>% rename(Count=1,Freq=2) %>% mutate(Perc=100*Freq/sum(Freq)))
    

    Or if you want a handy function:

    getPercentages <- function(df, colName) {
      df.cnt <- df %>% select({{colName}}) %>% 
        table() %>%
        as.data.frame() %>% 
        rename({{colName}} :=1, Freq=2) %>% 
        mutate(Perc=100*Freq/sum(Freq))
    }
    

    Now you can do:

    diamonds %>% getPercentages(cut)
    

    or this:

    df=diamonds %>% group_by(cut) %>% group_modify(~.x %>% getPercentages(clarity))
    ggplot(df,aes(x=clarity,y=Perc))+geom_col()+facet_wrap(~cut)
    
    0 讨论(0)
  • 2020-11-29 03:06

    Your code doesn't seem so ugly to me...
    however, an alternative (not much better) could be e.g. :

    df <- data.frame(table(yn))
    colnames(df) <- c('Smoker','Freq')
    df$Perc <- df$Freq / sum(df$Freq) * 100
    
    ------------------
      Smoker Freq Perc
    1     No   19 47.5
    2    Yes   21 52.5
    
    0 讨论(0)
提交回复
热议问题