Using tidyr spread function to create columns with binary value

Submitted by 馋奶兔 on 2019-11-26 17:24:30

Question


I am aware of the spread function in the tidyr package, but I have not been able to get the result I need with it. I have a data.frame with two columns, defined below. I need to turn the Subject column into one binary column per subject, holding 1 if the student has that subject and 0 otherwise.

Below is the data.frame:

studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
         Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

> studentInfo
  StudentID Subject
1         1   Maths
2         1 Science
3         1 English
4         2   Maths
5         3 History
6         3 History

And the output I am expecting is:

  StudentID Maths Science English History
1         1     1       1       1       0
2         2     1       0       0       0
3         3     0       0       0       1

How can I do this with the spread function, or with any other function? Thanks.


Answer 1:


Using reshape2, we can dcast from long to wide.

As you only want a binary outcome, we can deduplicate the data with unique() first, so that length counts each StudentID/Subject pair at most once:

library(reshape2)

si <- unique(studentInfo)
dcast(si, formula = StudentID ~ Subject, fun.aggregate = length)

#  StudentID English History Maths Science
#1         1       1       0     1       1
#2         2       0       0     1       0
#3         3       0       1     0       0

Another approach, using tidyr and dplyr:

library(tidyr)
library(dplyr)

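studentInfo %>%
  mutate(yesno = 1) %>%              # indicator value for every observed pair
  distinct %>%                       # drop repeated StudentID/Subject rows
  spread(Subject, yesno, fill = 0)   # one column per subject, 0 where absent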

#  StudentID English History Maths Science
#1         1       1       0     1       1
#2         2       0       0     1       0
#3         3       0       1     0       0

Although I'm not a fan (yet) of tidyr syntax...




Answer 2:


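We can use table from base R. table counts each StudentID/Subject pair, != 0 turns the counts into TRUE/FALSE, and the unary + converts those logicals to 1/0: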

+(table(studentInfo)!=0)
#            Subject
#StudentID English History Maths Science
#        1       1       0     1       1
#        2       0       0     1       0
#        3       0       1     0       0
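
The result above is a table object rather than a data.frame with a StudentID column, which is what the question asks for. A minimal sketch of one way to convert it, assuming the same studentInfo data; the names tab and wide are illustrative only:

```r
# Rebuild the example data so the snippet is self-contained
studentInfo <- data.frame(StudentID = c(1, 1, 1, 2, 3, 3),
                          Subject = c("Maths", "Science", "English",
                                      "Maths", "History", "History"))

# 1/0 indicator table, as in the answer above
tab <- +(table(studentInfo) != 0)

# as.data.frame.matrix keeps the wide layout (one column per subject)
wide <- as.data.frame.matrix(tab)

# Move the row names (the student ids) into a regular column
wide <- cbind(StudentID = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL

wide
```

Note that the subject columns come out in alphabetical order, as in the table output, not in order of first appearance.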



Answer 3:


Using tidyr:

library(tidyr)
studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
                          Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

pivot_wider(studentInfo,
            names_from = "Subject", 
            values_from = 'Subject', 
            values_fill = list(Subject=0),
            values_fn = list(Subject = ~+(as.logical(length(.)))))
#> # A tibble: 3 x 5
#>   StudentID Maths Science English History
#>       <dbl> <int>   <int>   <int>   <int>
#> 1         1     1       1       1       0
#> 2         2     1       0       0       0
#> 3         3     0       0       0       1

Created on 2019-09-19 by the reprex package (v0.3.0)
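
If the values_fn expression feels dense, the same result can probably be reached by combining the distinct-then-indicator idea from Answer 1 with pivot_wider. This is only a sketch under that assumption, not the answerer's code; the column name present and the result name wide are made up for illustration:

```r
library(dplyr)
library(tidyr)

studentInfo <- data.frame(StudentID = c(1, 1, 1, 2, 3, 3),
                          Subject = c("Maths", "Science", "English",
                                      "Maths", "History", "History"))

wide <- studentInfo %>%
  distinct() %>%                      # drop the repeated (3, History) row
  mutate(present = 1L) %>%            # hypothetical indicator column
  pivot_wider(names_from = Subject,
              values_from = present,
              values_fill = list(present = 0L))  # 0 where a pair is absent

wide
```

Because duplicates are removed before reshaping, no values_fn is needed, and the subject columns keep their order of first appearance (Maths, Science, English, History).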



Source: https://stackoverflow.com/questions/35663580/using-tidyr-spread-function-to-create-columns-with-binary-value
