I have a dataframe and I would like to count the number of rows within each group. I regularly use the aggregate function to sum data as follows:
Following @Joshua's suggestion, here's one way you might count the number of observations in your df dataframe where Year = 2007 and Month = Nov (assuming they are columns):
nrow(df[df$Year == 2007 & df$Month == "Nov", ])
and with aggregate, following @GregSnow:
aggregate(x ~ Year + Month, data = df, FUN = length)
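To make the two approaches concrete, here is a minimal sketch on made-up data. The data frame df and its values are assumptions for illustration only; the column names Year, Month and x follow the answer above.

```r
# Hypothetical sample data (values invented for illustration)
df <- data.frame(
  Year  = c(2007, 2007, 2007, 2008, 2008),
  Month = c("Nov", "Nov", "Dec", "Nov", "Nov"),
  x     = c(10, 20, 30, 40, 50)
)

# Number of rows for one specific Year/Month combination
n_nov_2007 <- nrow(df[df$Year == 2007 & df$Month == "Nov", ])

# Row counts for every Year/Month combination at once;
# the count ends up in the column named x
counts <- aggregate(x ~ Year + Month, data = df, FUN = length)
print(counts)
```

The row filter answers a single question, while aggregate returns one row per group, which is usually what you want when you need the counts for all groups.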
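You can use the by function, for example by(df1$Year, df1$Month, count), which produces a list holding the counts for each group. The count function here is presumably the one from the plyr package, since base R does not provide one.
The output will look like this: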
df1$Month: Feb
x freq
1 2012 1
2 2013 1
3 2014 5
---------------------------------------------------------------
df1$Month: Jan
x freq
1 2012 5
2 2013 2
---------------------------------------------------------------
df1$Month: Mar
x freq
1 2012 1
2 2013 3
3 2014 2
The simplest option with aggregate is the length function, which gives you the length of the vector in each subset. Sometimes a slightly more robust choice is function(x) sum(!is.na(x)), which counts only the non-missing values.
For my aggregations I usually end up wanting to see the mean and "how big is this group" (a.k.a. length). So this is my handy snippet for those occasions:
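The difference between the two only shows up when the data contain missing values. The sketch below uses made-up data (df and its values are assumptions). One detail worth knowing: the formula interface of aggregate drops rows with NA by default (na.action = na.omit), so na.action = na.pass is set here to let the counting function see them.

```r
# Hypothetical sample data with one missing value in x
df <- data.frame(
  Year  = c(2012, 2012, 2012, 2013, 2013),
  Month = c("Jan", "Jan", "Feb", "Jan", "Jan"),
  x     = c(1.5, NA, 2.0, 3.1, 4.2)
)

# Counts every row in the group, including rows where x is NA
n_rows <- aggregate(x ~ Year + Month, data = df,
                    FUN = length, na.action = na.pass)

# Counts only the rows where x is not missing
n_obs <- aggregate(x ~ Year + Month, data = df,
                   FUN = function(v) sum(!is.na(v)), na.action = na.pass)

print(n_rows)
print(n_obs)
```

For the 2012/Jan group the first call counts both rows, while the second counts only the row with a non-missing x.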
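# Mean and group size for each combination of the two grouping columns
agg.mean <- aggregate(columnToMean ~ columnToAggregateOn1 + columnToAggregateOn2, yourDataFrame, FUN = mean)
agg.count <- aggregate(columnToMean ~ columnToAggregateOn1 + columnToAggregateOn2, yourDataFrame, FUN = length)
# Both results list the groups in the same order, so the counts can be bound on directly
aggcount <- agg.count$columnToMean
agg <- cbind(aggcount, agg.mean)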
Create a new variable Count with a value of 1 for each row:
df1["Count"] <- 1
Then aggregate the data frame, summing the Count column within each group:
df2 <- aggregate(df1[c("Count")], by=list(Year=df1$Year, Month=df1$Month), FUN=sum, na.rm=TRUE)
There are plenty of wonderful answers here already, but I wanted to throw in one more option for those who want to add a new column to the original dataset containing the number of times each row's Year/Month combination occurs.
df1$counts <- sapply(X = paste(df1$Year, df1$Month),
FUN = function(x) { sum(paste(df1$Year, df1$Month) == x) })
The same could be accomplished by combining any of the above answers with the merge() function.
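As a rough sketch of the merge() route, assuming the counts come from the aggregate approach shown earlier in the thread (the sample data, the column x and the name df1_counted are made up for illustration):

```r
# Hypothetical sample data (values invented for illustration)
df1 <- data.frame(
  Year  = c(2012, 2012, 2013, 2013, 2013),
  Month = c("Jan", "Jan", "Jan", "Feb", "Jan"),
  x     = c(1, 2, 3, 4, 5)
)

# One row per Year/Month group, with the group size in column x
counts <- aggregate(x ~ Year + Month, data = df1, FUN = length)

# Rename the count column so it does not clash with x in df1
names(counts)[names(counts) == "x"] <- "counts"

# Attach the group size to every original row
df1_counted <- merge(df1, counts, by = c("Year", "Month"))
print(df1_counted)
```

Note that merge() sorts the result by the key columns by default, so the row order of df1_counted may differ from that of df1.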