A seemingly easy problem is keeping me very busy.
I have a data frame:
> df1
Name Score
1 Ben 1
2 Ben 2
3 John 1
4 John 2
5 John 3
I would like to create a summary of the table like this:
> df2
Name Score_1 Score_2 Score_3
1 Ben 1 1 0
2 John 1 1 1
So df2 must (i) only show unique "Names" and (ii) create columns from the unique factors in "Score" and (iii) count the number of times a person received said score.
I have tried:
df2 <- ddply(df1, c("Name"), summarise
,Score_1 = sum(df1$Score == 1)
,Score_2 = sum(df1$Score == 2)
,Score_3 = sum(df1$Score == 3))
which produces:
Name Score_1 Score_2 Score_3
1 Ben 2 2 1
2 John 2 2 1
So my attempt incorrectly counts all occurrences instead of counting per group.
EDIT:
As per the comments, I also tried reshape (possibly just doing it wrong):
> reshape(df1, idvar = "Name", timevar = "Score", direction = "wide")
Name
1 Ben
3 John
For a start, the "Score" column is missing, but worse than that, from my research on reshape, I am not convinced that I am going to get a count of each factor, which is the whole point.
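You only need a slight modification to your code. Inside summarise, refer to Score rather than df1$Score: df1$Score always points at the whole column, so every group receives the overall count, whereas a bare Score is evaluated within each group. Writing .(Name) instead of c("Name") is optional, since ddply accepts both forms:
library(plyr)
ddply(df1, .(Name), summarise,
Score_1 = sum(Score == 1),
Score_2 = sum(Score == 2),
Score_3 = sum(Score == 3))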
gives:
Name Score_1 Score_2 Score_3
1 Ben 1 1 0
2 John 1 1 1
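To see why the original attempt returned the same totals for every name, here is a minimal base-R sketch. It assumes that ddply conceptually splits df1 by Name and evaluates the summary on each piece; the split/sapply pairing is only an illustration of that idea, not plyr's actual implementation, and the names pieces, whole_column and per_group are hypothetical.

```r
# Example data from the question
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# One piece per name (an illustrative stand-in for ddply's grouping step)
pieces <- split(df1, df1$Name)

# df1$Score refers to the full column, so every group gets the overall count
whole_column <- sapply(pieces, function(piece) sum(df1$Score == 1))

# piece$Score refers only to the rows of the current group
per_group <- sapply(pieces, function(piece) sum(piece$Score == 1))
```

Here whole_column is 2 for both names, matching the Score_1 column of the incorrect output, while per_group is 1 for both.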
Other possibilities include:
1. table(df1), as @alexis_laz mentioned in the comments. This gives:
> table(df1)
Score
Name 1 2 3
Ben 1 1 0
John 1 1 1
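Note that table() returns a contingency table rather than a data frame shaped like df2. If a data frame is needed, one possible conversion is sketched below in base R; the intermediate names tab and wide are hypothetical, and this is just one way to do it.

```r
# Example data from the question
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John"),
                  Score = c(1, 2, 1, 2, 3))

# Cross-tabulate Name against Score
tab <- table(df1)

# Keep the wide layout: one row per name, one column per score
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("Score_", names(wide))

# Move the names from the row names into a regular column
df2 <- data.frame(Name = rownames(wide), wide)
rownames(df2) <- NULL
```

as.data.frame.matrix is used here because a plain as.data.frame() on a table produces the long format (one row per Name/Score combination) instead.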
2. The dcast function of the reshape2 package (or data.table, which has the same dcast function):
library(reshape2) # or library(data.table)
dcast(df1, Name ~ paste0("Score_", Score), fun.aggregate = length)
gives:
Name Score_1 Score_2 Score_3
1 Ben 1 1 0
2 John 1 1 1
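We can use dplyr/tidyr. Note that this marks each row with n = 1 rather than counting, so it assumes each Name/Score pair appears at most once: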
library(dplyr)
library(tidyr)
df1 %>%
group_by(Name) %>%
mutate(n=1, Score= paste('Score', Score, sep='_')) %>%
spread(Score, n, fill=0)
# Name Score_1 Score_2 Score_3
# (chr) (dbl) (dbl) (dbl)
#1 Ben 1 1 0
#2 John 1 1 1
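If the same Name/Score pair can occur more than once, a variant that counts before spreading may be safer. The following is a sketch under that assumption; the extra ("Ben", 1) row is hypothetical and is added only to show a count greater than one.

```r
library(dplyr)
library(tidyr)

# Data from the question plus one hypothetical repeated row ("Ben", 1)
df1 <- data.frame(Name  = c("Ben", "Ben", "John", "John", "John", "Ben"),
                  Score = c(1, 2, 1, 2, 3, 1))

df2 <- df1 %>%
  count(Name, Score) %>%                       # one row per Name/Score pair, with its count in n
  mutate(Score = paste0("Score_", Score)) %>%  # build the new column names
  spread(Score, n, fill = 0)                   # widen; missing combinations become 0
```

With the repeated row, Ben's Score_1 is 2, while combinations that never occur, such as Ben with a score of 3, are filled with 0.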
Source: https://stackoverflow.com/questions/35126517/create-columns-from-factors-and-count