How to exclude certain observations while generating summary statistics without creating a new data frame in R

Question

My problem is:

I have a large number of numeric variables for which I need to generate summary statistics. Some of the observations are coded "-99", which means the participant does not know the answer to the survey question.

While calculating means for such variables, I want to exclude the "-99" observations. Since I have a lot of variables, it would be quite onerous to use "subset".

Does anyone know an easier way?

PS: I know that for factors, the Summarize(df, exclude = "") command in the FSA package could work. I am just not sure whether there is an equivalent for numeric variables.

Answer 1:

Just make yourself a simple wrapper function around summary:

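set.seed(1)
x <- rnorm(100)
# Recode 10 randomly chosen values as -99 ("does not know")
x[sample(seq_along(x), 10)] <- -99
# Wrapper: summarize x after dropping the -99 codes
summary2 <- function(x) summary(x[x != -99])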

Compare results:

> summary(x)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-99.00000  -0.70810  -0.04209  -9.79400   0.59810   2.40200

> summary2(x)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2.21500 -0.52640  0.07445  0.11770  0.67230  2.40200
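
Since the question involves many variables, the same idea can probably be extended to a whole data frame with sapply, so that no filtered copy of the data is created. The following is only a minimal sketch under stated assumptions: the data frame df, its columns q1, q2, q3, and the helper mean_excl are hypothetical names made up for illustration, and -99 is assumed to be the only missing-value code.

```r
# Hypothetical survey data; q1, q2, q3 are made-up column names.
df <- data.frame(
  q1 = c(1, 2, -99, 4, 5),
  q2 = c(-99, 10, 20, 30, -99),
  q3 = c(3, 3, 3, -99, 3)
)

# Hypothetical helper: mean of a vector after dropping the missing-value code.
# na.rm = TRUE also drops any genuine NA values.
mean_excl <- function(x, code = -99) mean(x[x != code], na.rm = TRUE)

# Apply the helper to every column; df itself is left unchanged.
means <- sapply(df, mean_excl)

# Alternative: turn -99 into NA inside the call and rely on na.rm.
means_na <- sapply(df, function(x) {
  mean(replace(x, x %in% -99, NA), na.rm = TRUE)
})

print(means)
```

Both variants should give the same per-column means. If only some columns are numeric, one option is to restrict the call, for example sapply(df[sapply(df, is.numeric)], mean_excl).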

Source: https://stackoverflow.com/questions/21891409/how-to-exclude-certain-observations-while-generating-summary-statistics-without

Tags

statistics

summary
