Question
My question stems from two previously asked questions, but I think it is sufficiently different from them. Imagine that I have a minimal dataset (bird) in which every row represents an observation of birds at a given time and place, as follows:
id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null
Ideally, I want to convert this dataset into the following form, with the null rows removed:
id,tot_obs,country_tot
A,12,2
B,3,1
C,5,2
I know that I can get a count of each factor level using:
table(bird$country)
However, is there a smarter, perhaps one-line, way to remove the nulls, sum the observations, count the distinct countries, and reshape the result into this form? If a package does this, I am open to that suggestion as well. Thanks!
Answer 1:
Load the data with stringsAsFactors=FALSE:
df <- read.csv(header=TRUE, text="id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors=FALSE)
# check to see if columns are factors
sapply(df, class)
# id obs country
# "character" "integer" "character"
Remove all rows where country is the string "null":
df <- df[df$country != "null", ]
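As an alternative sketch of the two steps above, read.csv can be told to treat the literal string "null" as a missing value through its na.strings argument, after which the rows can be dropped with is.na. The names bird and bird_clean are illustrative only and are not part of the original answer; either route should leave the same six rows.

```r
# Sketch: mark the literal string "null" as NA while reading,
# then keep only the rows whose country is not missing.
bird <- read.csv(header = TRUE, stringsAsFactors = FALSE, na.strings = "null",
                 text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null")

# Rows with a missing country are dropped; bird itself is left unchanged
bird_clean <- bird[!is.na(bird$country), ]
nrow(bird_clean)
# [1] 6
```

Note that setting na.strings replaces its default value, so with this call only "null" is converted to NA in the character columns.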
Then you can use the plyr package with summarise to get the desired result as follows:
library(plyr)
ddply(df, .(id), summarise, tot_obs=sum(obs), tot_country=length(unique(country)))
# id tot_obs tot_country
# 1 A 12 2
# 2 B 3 1
# 3 C 5 2
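If installing a package is not an option, the same summary can probably be produced in base R with tapply. This is a minimal sketch rather than the answer's own method; the names bird and result are assumptions, and the second column is named country_tot to match the layout requested in the question.

```r
# Sketch: per-id totals and distinct-country counts using base R only
bird <- read.csv(header = TRUE, stringsAsFactors = FALSE,
                 text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null")

# Drop the rows whose country is the literal string "null"
bird <- bird[bird$country != "null", ]

# Sum of observations per id
tot_obs <- tapply(bird$obs, bird$id, sum)
# Number of distinct countries per id
country_tot <- tapply(bird$country, bird$id, function(x) length(unique(x)))

# Both tapply results are ordered by the sorted ids, so they line up row by row
result <- data.frame(id = names(tot_obs),
                     tot_obs = as.vector(tot_obs),
                     country_tot = as.vector(country_tot),
                     stringsAsFactors = FALSE)
result
```

Because each statistic comes from a separate tapply call, this approach gets more verbose as the number of summary columns grows, which is where ddply with summarise is more convenient.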
Source: https://stackoverflow.com/questions/15535719/matching-and-adding-factor-counts-in-r-data-frames