Create summary table of categorical variables of different lengths

Submitted by 一曲冷凌霜 on 2019-11-30 04:54:55

Question


In SPSS it is fairly easy to create a summary table of categorical variables using "Custom Tables":

How can I do this in R?

General and expandable solutions are preferred, and solutions using the Plyr and/or Reshape2 packages, because I am trying to learn those.

Example Data: (mtcars is in the R installation)

library(plyr)  # colwise() comes from plyr
df <- colwise(as.factor)(mtcars[, 8:11])

P.S.

Please note, my goal is to get everything in one table like in the picture. I have been struggling for many hours, but my attempts have been so poor that posting the code probably won't make the question any clearer.


Answer 1:


One way to get the output, but not the formatting:

library(plyr)
ldply(mtcars[,8:11],function(x) t(rbind(names(table(x)),table(x),paste0(prop.table(table(x))*100,"%"))))
    .id 1  2       3
1    vs 0 18  56.25%
2    vs 1 14  43.75%
3    am 0 19 59.375%
4    am 1 13 40.625%
5  gear 3 15 46.875%
6  gear 4 12   37.5%
7  gear 5  5 15.625%
8  carb 1  7 21.875%
9  carb 2 10  31.25%
10 carb 3  3  9.375%
11 carb 4 10  31.25%
12 carb 6  1  3.125%
13 carb 8  1  3.125%



Answer 2:


A base R solution using lapply() and do.call() with rbind() to stitch together the pieces:

x <- lapply(mtcars[, c("vs", "am", "gear", "carb")], table)

neat.table <- function(x, name){
  xx <- data.frame(x)
  names(xx) <- c("Value", "Count")
  xx$Fraction <- with(xx, Count/sum(Count))
  data.frame(Variable = name, xx)
}

do.call(rbind, lapply(seq_along(x), function(i)neat.table(x[i], names(x[i]))))

Results in:

   Variable Value Count Fraction
1        vs     0    18  0.56250
2        vs     1    14  0.43750
3        am     0    19  0.59375
4        am     1    13  0.40625
5      gear     3    15  0.46875
6      gear     4    12  0.37500
7      gear     5     5  0.15625
8      carb     1     7  0.21875
9      carb     2    10  0.31250
10     carb     3     3  0.09375
11     carb     4    10  0.31250
12     carb     6     1  0.03125
13     carb     8     1  0.03125

The rest is formatting.
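
As a rough idea of what that formatting step could look like, here is a minimal base R sketch. It rebuilds the same long table, then blanks repeated variable names and prints the fraction as a percentage. The names summary_df and formatted are only illustrative, and the one-decimal rounding is an assumption, not something the SPSS output prescribes.

```r
# Rebuild the long summary table (same content as the result above).
vars <- c("vs", "am", "gear", "carb")
summary_df <- do.call(rbind, lapply(vars, function(v) {
  tab <- table(mtcars[[v]])
  data.frame(Variable = v,
             Value    = names(tab),
             Count    = as.vector(tab),
             Fraction = as.vector(prop.table(tab)),
             stringsAsFactors = FALSE)
}))

# Formatting: show each variable name only on its first row,
# and turn the fraction into a percentage string.
formatted <- data.frame(
  Variable = ifelse(duplicated(summary_df$Variable), "", summary_df$Variable),
  Levels   = summary_df$Value,
  Count    = summary_df$Count,
  Percent  = sprintf("%.1f%%", 100 * summary_df$Fraction),
  stringsAsFactors = FALSE
)
names(formatted)[4] <- "Column N %"

print(formatted, row.names = FALSE)
```

Because the percentages become character strings, this step is best kept last, after any further calculations on the numeric columns.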




Answer 3:


Here's my solution. It ain't pretty, which is why I put a bag over its head (wrap it in a function). I also add another variable to demonstrate that it's general (I hope).

prettyTable <- function(x) {

  tbl <- apply(x, 2, function(m) {
    marc <- sort(unique(m))
    cnt <- matrix(table(m), ncol = 1)
    out <- cbind(marc, cnt)
    out <- out[order(marc), ] # do sorting
    out <- cbind(out, round(prop.table(out, 2)[, 2] * 100, 2))
  })

  x2 <- do.call("rbind", tbl)

  spaces <- unlist(lapply(apply(x, 2, unique), length))
  space.names <- names(spaces)
  spc <- rep("", sum(spaces))
  ind <- cumsum(spaces)
  ind <- abs(spaces - ind)+1
  spc[ind] <- space.names

  out <- cbind(spc, x2)
  out <- as.data.frame(out)

  names(out) <- c("Variable", "Levels", "Count", "Column N %")
  out
}

prettyTable(x = mtcars[, c(2, 8:11)])

   Variable Levels Count Column N %
1       cyl      4    11      34.38
2                6     7      21.88
3                8    14      43.75
4        vs      0    18      56.25
5                1    14      43.75
6        am      0    19      59.38
7                1    13      40.62
8      gear      3    15      46.88
9                4    12       37.5
10               5     5      15.62
11     carb      1     7      21.88
12               2    10      31.25
13               3     3       9.38
14               4    10      31.25
15               6     1       3.12
16               8     1       3.12

Using the googleVis package, you can make a handy HTML table.

library(googleVis)
plot(gvisTable(prettyTable(x = mtcars[, c(2, 8:11)])))




Answer 4:


You may find the following code snippet useful. It uses the base functions table, margin.table, and prop.table and does not require any other packages. Note, however, that it collects the results in a named list, one element per variable (these could be combined into a single matrix with rbind):

dat <- table(mtcars[,8:11])
result <- list()
for(m in 1:length(dim(dat))){
    martab <- margin.table(dat, margin=m)
    result[[m]] <- cbind(Freq=martab, Prop=prop.table(martab))
}
names(result) <- names(dimnames(dat))

> result
$vs
  Freq   Prop
0   18 0.5625
1   14 0.4375

$am
  Freq    Prop
0   19 0.59375
1   13 0.40625

$gear
  Freq    Prop
3   15 0.46875
4   12 0.37500
5    5 0.15625

$carb
  Freq    Prop
1    7 0.21875
2   10 0.31250
3    3 0.09375
4   10 0.31250
6    1 0.03125
8    1 0.03125
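
The rbind step mentioned above could look roughly like the sketch below. It recomputes the same list and then stacks the pieces into one data frame, adding the variable name and level as ordinary columns so they are not lost. The name combined and the Variable and Level column labels are assumptions made for this illustration.

```r
# Same marginal tables as above, built with lapply instead of a for loop.
dat <- table(mtcars[, 8:11])
result <- lapply(seq_along(dim(dat)), function(m) {
  martab <- margin.table(dat, margin = m)
  cbind(Freq = martab, Prop = prop.table(martab))
})
names(result) <- names(dimnames(dat))

# Stack the per-variable matrices into a single data frame.
combined <- do.call(rbind, lapply(names(result), function(v) {
  m <- result[[v]]
  data.frame(Variable = v,
             Level    = rownames(m),
             Freq     = as.vector(m[, "Freq"]),
             Prop     = as.vector(m[, "Prop"]),
             stringsAsFactors = FALSE)
}))
rownames(combined) <- NULL

combined
```

A data frame is used here rather than a plain matrix so that the character columns can sit next to the numeric ones without coercing everything to text.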



Answer 5:


Here is a solution using the freq function of the questionr package (shameless self-promotion, sorry):

R> lapply(df, freq)
$vs
    n    %
0  18 56.2
1  14 43.8
NA  0  0.0

$am
    n    %
0  19 59.4
1  13 40.6
NA  0  0.0

$gear
    n    %
3  15 46.9
4  12 37.5
5   5 15.6
NA  0  0.0

$carb
    n    %
1   7 21.9
2  10 31.2
3   3  9.4
4  10 31.2
6   1  3.1
8   1  3.1
NA  0  0.0



Answer 6:


Unfortunately there seems to be no R package yet that can generate nice output like SPSS. Most table-generating functions define their own special formats, which gets you into trouble if you want to export the result or work on it in another way.
But I'm sure R is capable of that, so I started writing my own functions. I'm happy to share the result (work in progress, but it gets the job done) with you:

The following function cross-tabulates every factor variable in a data.frame against the factor variable given as "variable", returning the frequency (or the column percentage with calc="perc") for each of its levels.
The most important thing may be that the output is a simple, user-friendly data.frame. So, unlike with many other functions, it's no problem to export the results and work with them in any way you want.

I realize that there is much potential for further improvement, e.g. adding an option to choose between row and column percentages.

contitable <- function( survey_data, variable, calc="freq" ){    

  # Check which variables are not given as factor
  # and exclude them from the given data.frame
  survey_data_factor_test <- as.logical( sapply( survey_data, FUN=is.factor) )
  survey_data <- subset( survey_data, select=which( survey_data_factor_test ) )

  # Inform the user about deleted variables
  # is that proper use of printing to console during a function call??
  # for now it works just fine...
  flush.console()        
  writeLines( paste( "\n ", sum( !survey_data_factor_test, na.rm=TRUE),
            "non-factor variable(s) were excluded\n" ) )

  variable_levels <- levels(survey_data[ , variable ])    
  variable_levels_length <- length( variable_levels )    

  # Initializing the data.frame which will gather the results    
  result <- data.frame( "Variable", "Levels", t(rep( 1, each=variable_levels_length ) ) )    
  result_column_names <- paste( variable, variable_levels, sep="." )    
  names(result) <- c("Variable", "Levels", result_column_names )    

  for(column in 1:length( names(survey_data) ) ){       

      column_levels_length <- length( levels( survey_data[ , column ] ) )
      result_block <- as.data.frame( rep( names(survey_data)[column], each=column_levels_length ) )
      result_block <- cbind( result_block, as.data.frame( levels( survey_data[,column] ) ) )
      names(result_block) <- c( "Variable", "Levels" )

      results <- table( survey_data[ , column ], survey_data[ , variable ] )

      if( calc=="perc" ){ 
        results <- apply( results, MARGIN=2, FUN=function(x){ x/sum(x) }) 
        results <- round( results*100, 1 )
      }

      results <- unclass(results)
      results <- as.data.frame( results )
      names( results ) <- result_column_names
      rownames(results) <- NULL

      result_block <- cbind( result_block, results) 
      result <- rbind( result, result_block ) 
}    
result <- result[-1,]        
return( result )    
}
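
The row versus column percentage choice mentioned above as a possible improvement might be sketched with prop.table(), whose margin argument selects the direction of normalisation. The helper crosstab_percent and its arguments are hypothetical and not part of contitable; this is only one way the option could be approached.

```r
# Hypothetical helper: percentages of a two-way table, by row or by column.
crosstab_percent <- function(rows, cols, by = c("column", "row"), digits = 1) {
  by <- match.arg(by)
  counts <- table(rows, cols)
  # margin = 1 normalises within rows, margin = 2 within columns
  margin <- if (by == "row") 1 else 2
  round(100 * prop.table(counts, margin = margin), digits)
}

# Example: gear (rows) against am (columns) from mtcars
col_pct <- crosstab_percent(mtcars$gear, mtcars$am, by = "column")
row_pct <- crosstab_percent(mtcars$gear, mtcars$am, by = "row")

col_pct
row_pct
```

If a plain data.frame is preferred, as in the function above, the result can be passed through as.data.frame.matrix().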


Source: https://stackoverflow.com/questions/15111629/create-summary-table-of-categorical-variables-of-different-lengths
