Matching and Adding Factor Counts in R Data Frames

Question

My question stems from this and this previously asked question, but I think it is sufficiently different from them. Imagine that I have a minimal dataset (bird) where every row represents an observation of birds at a given time and place, as follows:

id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null

What I ideally want is to convert this dataset into a form like this, with the nulls removed:

id,tot_obs,country_tot
A,12,2
B,3,1
C,5,2

I know that I can get a count of factors using:

table(bird$country)

but is there a smarter, perhaps one-line, way of removing the nulls, adding up the total counts, finding the number of countries for each id, and then reshaping the result into this form? If there is a package that does this, I am open to that suggestion as well. Thanks!

Answer 1:

Load the data with stringsAsFactors=FALSE so that id and country are read as character vectors rather than factors:

df <- read.csv(header=TRUE, text="id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors=FALSE)

# check to see if columns are factors
sapply(df, class)
#          id         obs     country 
# "character"   "integer" "character"

Remove all rows where country is "null":

df <- df[df$country != "null", ]
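As an alternative to the string comparison above, the placeholder can be turned into a real missing value at load time. The sketch below assumes that "null" is the only marker used for missing data in the file; the name df2 is made up for this illustration so that it does not overwrite df.

```r
# Read the same data, but treat the literal string "null" as NA.
# na.strings is a standard read.csv argument; "null" being the only
# missing-value marker is an assumption about this dataset.
df2 <- read.csv(header = TRUE, stringsAsFactors = FALSE, na.strings = "null",
                text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null")

# Drop the rows whose country is missing.
df2 <- df2[!is.na(df2$country), ]
```

Either way, the row for id D is gone before the summary step.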

Then you can use ddply from the plyr package with summarise to get the desired result:

library(plyr)
ddply(df, .(id), summarise, tot_obs=sum(obs), tot_country=length(unique(country)))
#   id tot_obs tot_country
# 1  A      12           2
# 2  B       3           1
# 3  C       5           2
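If installing plyr is not an option, the same split-apply-combine idea can be sketched in base R. This is only an illustration of the approach, not part of the original answer; the name res is made up here, and the data is reloaded so that the block runs on its own.

```r
# Reload and filter the example data so this block is self-contained.
df <- read.csv(header = TRUE, stringsAsFactors = FALSE,
               text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null")
df <- df[df$country != "null", ]

# Split the rows by id, summarise each piece, then bind the pieces together.
res <- do.call(rbind, lapply(split(df, df$id), function(g) {
  data.frame(id          = g$id[1],
             tot_obs     = sum(g$obs),               # total observations per id
             tot_country = length(unique(g$country)), # distinct countries per id
             stringsAsFactors = FALSE)
}))
rownames(res) <- NULL  # drop the group names that rbind uses as row names
res
```

Because id is a character column here, split() only creates groups for ids that still have rows after filtering, so no empty group appears for D.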

Source: https://stackoverflow.com/questions/15535719/matching-and-adding-factor-counts-in-r-data-frames

Tags

dataframe

multiple-columns