I would like to use stargazer to produce summary statistics for each category of a grouping variable. I could do it in separate tables, but I'd like it all in one – if that is not unreasonably challenging for this package.
For example
library(stargazer)
stargazer(ToothGrowth, type = "text")
#>
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 60 18.813 7.649 4.200 33.900
#> dose 60 1.167 0.629 0.500 2.000
#> -----------------------------------------
provides summary statistics for the continuous variables in ToothGrowth. I would like to split that summary by the categorical variable supp, also in ToothGrowth.
Two mock-ups of the desired output (the formula syntax in the calls is hypothetical):
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> OJ len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> VC len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> --------------------------------------------------
#>
stargazer(ToothGrowth ~ supp, type = "text")
#>
#> ==================================================
#> Statistic N Mean St. Dev. Min Max
#> --------------------------------------------------
#> len
#> _by OJ 30 20.663 6.606 8.200 30.900
#> _by VC 30 16.963 8.266 4.200 33.900
#> _tot 60 18.813 7.649 4.200 33.900
#>
#> dose
#> _by OJ 30 1.167 0.634 0.500 2.000
#> _by VC 30 1.167 0.634 0.500 2.000
#> _tot 60 1.167 0.629 0.500 2.000
#> --------------------------------------------------
Solution
library(stargazer)
library(dplyr)
library(tidyr)
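ToothGrowth %>%
  group_by(supp) %>%
  mutate(id = 1:n()) %>%                   # row index within each supp group
  ungroup() %>%
  gather(temp, val, len, dose) %>%         # long format: one row per value
  unite(temp1, supp, temp, sep = '_') %>%  # combined key, e.g. "OJ_len"
  spread(temp1, val) %>%                   # one column per group/variable pair
  select(-id) %>%                          # drop the helper index
  as.data.frame() %>%                      # stargazer needs a plain data frame
  stargazer(type = 'text')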
Result
=========================================
Statistic N Mean St. Dev. Min Max
-----------------------------------------
OJ_dose 30 1.167 0.634 0.500 2.000
OJ_len 30 20.663 6.606 8.200 30.900
VC_dose 30 1.167 0.634 0.500 2.000
VC_len 30 16.963 8.266 4.200 33.900
-----------------------------------------
Explanation
This gets rid of the problem mentioned by the OP in a comment to the original answer, "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I saw to do that with stargazer was to create a new data frame that had variables for each group's observations using a gather(), unite(), spread() strategy. The only trick to it is to avoid duplicate identifiers by creating unique identifiers by group and dropping that variable before calling stargazer().
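For readers on a recent tidyr, the same reshaping can be sketched with pivot_longer() and pivot_wider(), which supersede gather() and spread(). This is a swapped-in variant, not the code from the answer above; it assumes tidyr 1.1.0 or later (for names_sort), and the names wide, variable and value are chosen here purely for illustration.

```r
library(stargazer)
library(dplyr)
library(tidyr)

wide <- ToothGrowth %>%
  group_by(supp) %>%
  mutate(id = row_number()) %>%   # row index within each supp group
  ungroup() %>%
  # long format: one row per (supp, id, variable) combination
  pivot_longer(c(len, dose), names_to = "variable", values_to = "value") %>%
  # one column per supp/variable pair, joined with "_" (the default separator)
  pivot_wider(id_cols = id, names_from = c(supp, variable),
              values_from = value, names_sort = TRUE) %>%
  select(-id) %>%                 # drop the helper index
  as.data.frame()                 # stargazer needs a plain data frame

stargazer(wide, type = "text")
```

The resulting data frame has 30 rows and the four columns OJ_dose, OJ_len, VC_dose and VC_len, so stargazer summarises each group/variable pair on its own row.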
Three possible solutions: one using reporttools and xtable, one using tidyverse tools along with stargazer, and a third using base R.
First,
I suggest taking a look at reporttools. It means leaving stargazer behind, but I think it is worth a look:
# install.packages("reporttools") # run this once to install the package
library(reporttools)
vars <- ToothGrowth[, c('len', 'dose')]
group <- ToothGrowth[, c('supp')]
## display default statistics for the continuous variables, grouped by supp
tableContinuous(vars = vars, group = group, prec = 1, cap = "Table of 'len','dose' by 'supp' ", lab = "tab: descr stat")
% latex table generated in R 3.3.3 by xtable 1.8-2 package
\begingroup\footnotesize
\begin{longtable}{llrrrrrrrrrr}
\textbf{Variable} & \textbf{Levels} & $\mathbf{n}$ & \textbf{Min} & $\mathbf{q_1}$ & $\mathbf{\widetilde{x}}$ & $\mathbf{\bar{x}}$ & $\mathbf{q_3}$ & \textbf{Max} & $\mathbf{s}$ & \textbf{IQR} & \textbf{\#NA} \\
\hline
len & OJ & 30 & 8.2 & 15.5 & 22.7 & 20.7 & 25.7 & 30.9 & 6.6 & 10.2 & 0 \\
& VC & 30 & 4.2 & 11.2 & 16.5 & 17.0 & 23.1 & 33.9 & 8.3 & 11.9 & 0 \\
\hline
& all & 60 & 4.2 & 13.1 & 19.2 & 18.8 & 25.3 & 33.9 & 7.6 & 12.2 & 0 \\
\hline
dose & OJ & 30 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
& VC & 30 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
\hline
& all & 60 & 0.5 & 0.5 & 1.0 & 1.2 & 2.0 & 2.0 & 0.6 & 1.5 & 0 \\
\hline
\hline
\caption{Table of 'len','dose' by 'supp' }
\label{tab: descr stat}
\end{longtable}
\endgroup
Compiled with LaTeX, this gives a nicely formatted table.

Second,
using tidyverse tools along with stargazer, inspired by another SO answer:
# install.packages(c("tidyverse"), dependencies = TRUE)
library(stargazer); library(dplyr); library(purrr)
ToothGrowth %>% split(.$supp) %>% walk(~ stargazer(., type = "text"))
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#>
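The two tables above carry no indication of which group they belong to. One possible way to label them, sketched here as an addition to the answer, is to swap walk() for purrr's iwalk(), which also passes each list element's name, and to hand that name to stargazer's title argument. The object name by_supp is introduced only for this example.

```r
library(stargazer)
library(purrr)

# split() returns a named list with one data frame per level of supp
by_supp <- split(ToothGrowth, ToothGrowth$supp)

# iwalk() passes each element as .x and its name ("OJ" or "VC") as .y
iwalk(by_supp, ~ stargazer(.x, type = "text", title = paste("supp =", .y)))
```

This still prints one table per group; it only adds a caption line above each.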
Third,
a base R-only approach:
by(ToothGrowth, ToothGrowth$supp, stargazer, type = 'text')
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 20.663 6.606 8.200 30.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#>
#> =========================================
#> Statistic N Mean St. Dev. Min Max
#> -----------------------------------------
#> len 30 16.963 8.266 4.200 33.900
#> dose 30 1.167 0.634 0.500 2.000
#> -----------------------------------------
#> ToothGrowth$supp: OJ
#> [1] ""
#> [2] "========================================="
#> [3] "Statistic N Mean St. Dev. Min Max "
#> [4] "-----------------------------------------"
#> [5] "len 30 20.663 6.606 8.200 30.900"
#> [6] "dose 30 1.167 0.634 0.500 2.000 "
#> [7] "-----------------------------------------"
#> ---------------------------------------------------------------
#> ToothGrowth$supp: VC
#> [1] ""
#> [2] "========================================="
#> [3] "Statistic N Mean St. Dev. Min Max "
#> [4] "-----------------------------------------"
#> [5] "len 30 16.963 8.266 4.200 33.900"
#> [6] "dose 30 1.167 0.634 0.500 2.000 "
#> [7] "-----------------------------------------"
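The by() call above still yields one table per group. If a single table is the goal, one alternative sketch is to compute the statistics with base R and let stargazer print the resulting data frame verbatim via summary = FALSE. The names vars and stats and the column labels are assumptions made for this illustration, not part of the original answer.

```r
library(stargazer)

vars <- c("len", "dose")

# One row per (group, variable) pair, computed with base R only
stats <- do.call(rbind, lapply(split(ToothGrowth, ToothGrowth$supp), function(d) {
  data.frame(
    Group     = as.character(d$supp[1]),
    Statistic = vars,
    N         = sapply(d[vars], function(x) sum(!is.na(x))),
    Mean      = sapply(d[vars], mean),
    St.Dev    = sapply(d[vars], sd),
    Min       = sapply(d[vars], min),
    Max       = sapply(d[vars], max)
  )
}))

# summary = FALSE prints the data frame as given instead of summarising it
stargazer(stats, type = "text", summary = FALSE, rownames = FALSE, digits = 3)
```

Because the statistics are computed by hand, extra columns such as a median or quartiles can be added to the data.frame() call as needed.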
invisible(lapply(split(ToothGrowth, ToothGrowth$supp), stargazer))
would do, but if you want a separate \subsection{} in between, you should probably use something like
invisible(lapply(levels(ToothGrowth$supp), function(sg) {
  cat("\\subsection{add your text here}\n")
  # stargazer prints the table itself, so no print() is needed
  stargazer(subset(ToothGrowth, supp == sg))
}))
You can simply use subset() with stargazer. Also, make sure your data is a plain data frame (convert it with as.data.frame() if needed); otherwise stargazer may not produce any output.
library(stargazer)
# Descriptive statistics for Income of Org 1
stargazer(subset(mydata, org == 1),
          title = "Income for Org 1", type = "html", out = "stat_org1.html")
Source: https://stackoverflow.com/questions/25389683/obtaining-separate-summary-statistics-by-categorical-variable-with-stargazer-pac