frequency table with several variables in R

Submitted by ☆樱花仙子☆ on 2019-11-29 00:11:33

Using plyr:

require(plyr)
ddply(d1, .(ExamenYear), summarize,
      All=length(ExamenYear),
      participated=sum(participated=="yes"),
      ofwhichFemale=sum(StudentGender=="F"),
      ofWhichPassed=sum(passed=="yes"))

Which gives:

  ExamenYear All participated ofwhichFemale ofWhichPassed
1       2007   3            2             2             2
2       2008   4            3             2             3
3       2009   3            3             0             2

The plyr package is great for this sort of thing. First, load the package:

library(plyr)

Then we use the ddply function:

ddply(d1, "ExamenYear", summarise,
      All           = length(passed),  # any column works here: we only need the row count
      participated  = sum(participated == "yes"),
      ofwhichFemale = sum(StudentGender == "F"),
      ofwhichpassed = sum(passed == "yes"))

ddply takes a data frame as input and returns a data frame. Here it splits the input data frame by ExamenYear, computes a few summary statistics on each subset, and combines the per-year results into a single data frame. Notice that inside ddply's summarise call we don't need the $ notation to refer to columns.
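Conceptually, this is the split-apply-combine pattern. As a rough base-R sketch (not plyr's actual implementation), the same result can be built by hand with split(), lapply() and do.call(rbind, ...). The d1 below is a hypothetical stand-in for the question's data: the rows are invented, chosen only so that the per-year counts match the output tables on this page.

```r
# Hypothetical toy data: invented rows whose per-year counts match the tables on this page.
d1 <- data.frame(
  ExamenYear    = rep(c(2007, 2008, 2009), times = c(3, 4, 3)),
  participated  = c("yes", "yes", "no",  "yes", "yes", "yes", "no",  "yes", "yes", "yes"),
  StudentGender = c("F",   "F",   "M",   "F",   "F",   "M",   "M",   "M",   "M",   "M"),
  passed        = c("yes", "yes", "no",  "yes", "yes", "yes", "no",  "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Split: one sub-data-frame per ExamenYear.
pieces <- split(d1, d1$ExamenYear)

# Apply: reduce each piece to a one-row data frame of summary statistics.
summaries <- lapply(pieces, function(sub) {
  data.frame(
    ExamenYear    = sub$ExamenYear[1],
    All           = nrow(sub),
    participated  = sum(sub$participated == "yes"),
    ofwhichFemale = sum(sub$StudentGender == "F"),
    ofwhichpassed = sum(sub$passed == "yes")
  )
})

# Combine: stack the one-row results back into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)
```

Inside the function every column has to be written as sub$..., which is exactly the bookkeeping that ddply and summarise take care of for you.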

A couple of modifications to your code would have made it easier to read and a worthy competitor to the ddply solutions: use with() to cut down on the number of d1$ calls, and index the tables by character name ("yes", "F") so the code documents itself:

with(d1, cbind(All           = table(ExamenYear),
               participated  = table(ExamenYear, participated)[, "yes"],
               ofwhichFemale = table(ExamenYear, StudentGender)[, "F"],
               ofwhichpassed = table(ExamenYear, passed)[, "yes"]))

     All participated ofwhichFemale ofwhichpassed
2007   3            2             2             2
2008   4            3             2             3
2009   3            3             0             2

I would expect this to be much faster than the ddply solution, although that will only be apparent if you are working on larger datasets.
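One caveat with indexing by name: if a value never appears anywhere in a column (say nobody passed in any year), table() creates no "yes" column, and [, "yes"] stops with an error. A minimal sketch with a hypothetical three-row data frame d2, showing the failure and the usual fix of declaring the levels up front with factor():

```r
# Hypothetical toy data in which nobody passed, so "yes" never occurs in `passed`.
d2 <- data.frame(
  ExamenYear = c(2007, 2007, 2008),
  passed     = c("no", "no", "no"),
  stringsAsFactors = FALSE
)

# Indexing a column that table() never created fails; capture the error message.
res <- tryCatch(with(d2, table(ExamenYear, passed)[, "yes"]),
                error = function(e) conditionMessage(e))
print(res)  # "subscript out of bounds"

# Declaring the full set of levels keeps a "yes" column, filled with zero counts.
passed_f <- factor(d2$passed, levels = c("no", "yes"))
ofwhichpassed <- table(d2$ExamenYear, passed_f)[, "yes"]
print(ofwhichpassed)
```

With the question's data this does not bite, because every indexed value ("yes", "F") occurs in at least one year; the zero for 2009 in ofwhichFemale is simply that year's entry in the "F" column.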

You may also want to take a look at plyr's successor, dplyr.

It uses a ggplot-like chained syntax and provides fast performance by implementing key pieces in C++.

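library(dplyr)

# %>% passes the result of each step on to the next one
d1 %>%
  group_by(ExamenYear) %>%
  summarise(ALL           = n(),  # number of rows in each ExamenYear group
            participated  = sum(participated == "yes"),
            ofwhichFemale = sum(StudentGender == "F"),
            ofWhichPassed = sum(passed == "yes"))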
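All of the answers above count matches with sum(column == "yes"). The comparison produces a logical vector, and sum() treats TRUE as 1 and FALSE as 0, so the result is the number of matching rows. If the real data contains missing values, though, a single NA makes the whole count NA. A small sketch with a hypothetical vector:

```r
# Hypothetical column with one missing value.
passed <- c("yes", "no", "yes", NA)

# The comparison gives TRUE FALSE TRUE NA; sum() counts the TRUEs.
n_plain <- sum(passed == "yes")                # NA: the missing value propagates
n_narm  <- sum(passed == "yes", na.rm = TRUE)  # 2: NA comparisons are dropped
print(c(n_plain = n_plain, n_narm = n_narm))
```

Inside ddply or summarise the same fix applies, e.g. ofWhichPassed = sum(passed == "yes", na.rm = TRUE).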