Question
I am trying to convert the following format:
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"))
     movie actor1 actor2
1  Titanic    Leo   Kate
2 Departed   Jack    Leo
to binary response variables:
     movie Leo Kate Jack
1  Titanic   1    1    0
2 Departed   1    0    1
I tried the solution described in "Convert row data to binary columns", but I could only get it to work for two variables, not three.
I would really appreciate a clean way to do this.
Answer 1:
An updated tidyr-based option is to convert to long-shape, use complete to fill in missing combinations of movies and actors, and then just convert a logical is.na test to a numeric value. Then reshape back to wide.
library(tidyr)
mydata %>%
  pivot_longer(starts_with("actor"), names_to = "acted") %>%
  complete(movie, value) %>%
  dplyr::mutate(acted = as.numeric(!is.na(acted))) %>%
  pivot_wider(names_from = value, values_from = acted)
#> # A tibble: 2 x 4
#> movie Jack Leo Kate
#> <fct> <dbl> <dbl> <dbl>
#> 1 Departed 1 1 0
#> 2 Titanic 0 1 1
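To see why the is.na test works, it can help to look at the intermediate long table right after complete. The following is a minimal sketch on the question's data; stringsAsFactors = FALSE and the names long and missing_pairs are assumptions introduced here for illustration.

```r
library(tidyr)

# Example data from the question; stringsAsFactors = FALSE is an assumption
# made here so the actor columns are plain character vectors.
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# Long shape: one row per movie/actor pair, then add the pairs that never occur.
long <- mydata %>%
  pivot_longer(starts_with("actor"), names_to = "acted") %>%
  complete(movie, value)

# Pairs added by complete() have acted = NA, so !is.na(acted) marks
# exactly the pairs that were present in the original data.
missing_pairs <- long[is.na(long$acted), c("movie", "value")]
print(missing_pairs)
```

The rows in missing_pairs are the cells that end up as 0 in the wide result.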
Answer 2:
How much spice is too much? Here is a solution via tidyr:
library(dplyr)
library(tidyr)
mydata %>%
  gather(actor, name, starts_with("actor")) %>%
  mutate(present = 1) %>%
  select(-actor) %>%
  spread(name, present, fill = 0)
movie Jack Kate Leo
1 Departed 1 0 1
2 Titanic 0 1 1
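gather and spread are superseded in current tidyr, so the same idea can be sketched with pivot_longer and pivot_wider. This is a hedged sketch, not part of the original answer: it assumes a tidyr version in which values_fill accepts a single value, and the name result is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question; stringsAsFactors = FALSE is an assumption.
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

result <- mydata %>%
  # Stack the actor columns: one row per movie/actor pair.
  pivot_longer(starts_with("actor"), names_to = "actor", values_to = "name") %>%
  # Flag every observed pair, then drop the now-unneeded source column name.
  mutate(present = 1) %>%
  select(-actor) %>%
  # One column per actor; pairs that never occur are filled with 0.
  pivot_wider(names_from = name, values_from = present, values_fill = 0)

print(result)
```

Unlike spread, pivot_wider should keep rows and columns in order of first appearance rather than sorting them, which is closer to the layout requested in the question.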
Answer 3:
One way to reshape your data.frame is with the reshape2 package, using melt and dcast. For example:
library(reshape2)
long.mydata <- melt(mydata, id.vars = "movie")
wide.mydata <- dcast(long.mydata, movie ~ value, function(x) 1, fill = 0)
Pay attention to the fun.aggregate and fill parameters in dcast, which control what goes to fill in the interior after casting.
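The effect of these two parameters is easiest to see when a movie/actor pair occurs more than once. The sketch below duplicates one row of the long data purely for illustration (the duplicate is artificial and not part of the question's data); stringsAsFactors = FALSE and the names dup, counts and binary are likewise assumptions.

```r
library(reshape2)

# Example data from the question; stringsAsFactors = FALSE is an assumption.
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

long.mydata <- melt(mydata, id.vars = "movie")

# Artificial duplicate of the first row (Titanic / Leo), for illustration only.
dup <- rbind(long.mydata, long.mydata[1, ])

# fun.aggregate = length counts the rows in each cell; empty cells get `fill`.
counts <- dcast(dup, movie ~ value, fun.aggregate = length, fill = 0)

# A constant aggregate returns 1 for any non-empty cell, giving presence/absence.
binary <- dcast(dup, movie ~ value, fun.aggregate = function(x) 1, fill = 0)

print(counts)
print(binary)
```

With length the duplicated pair is counted twice, while the constant function keeps the result strictly binary.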
Answer 4:
Since they say variety is the spice of life, here's an approach in base R using table:
table(cbind(mydata[1],
            actor = unlist(mydata[-1], use.names = FALSE)))
# actor
# movie Jack Leo Kate
# Departed 1 1 0
# Titanic 0 1 1
The above output is a matrix of class table. To get a data.frame, use as.data.frame.matrix.
as.data.frame.matrix(table(
  cbind(mydata[1], actor = unlist(mydata[-1], use.names = FALSE))))
# Jack Leo Kate
# Departed 1 1 0
# Titanic 0 1 1
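Note that as.data.frame.matrix keeps the movie titles as row names rather than as a movie column. If a regular column is wanted, as in the question's target layout, one possible sketch is the following; stringsAsFactors = FALSE and the names tab, wide and result are assumptions introduced here.

```r
# Example data from the question; stringsAsFactors = FALSE is an assumption.
mydata <- data.frame(movie  = c("Titanic", "Departed"),
                     actor1 = c("Leo", "Jack"),
                     actor2 = c("Kate", "Leo"),
                     stringsAsFactors = FALSE)

# Cross-tabulate movies against the stacked actor columns.
tab  <- table(cbind(mydata[1],
                    actor = unlist(mydata[-1], use.names = FALSE)))
wide <- as.data.frame.matrix(tab)

# Move the row names into an ordinary column and reset the row names.
result <- cbind(movie = rownames(wide), wide)
rownames(result) <- NULL

print(result)
```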
Answer 5:
The reshape2 package also has the recast function, which melts and casts in a single step.
The code:
library(reshape2)
recast(mydata, id.var = 'movie', movie ~ value, fun.aggregate = length)
The result:
movie Jack Kate Leo
1 Departed 1 0 1
2 Titanic 0 1 1
Source: https://stackoverflow.com/questions/18474896/reshape-multiple-categorical-variables-to-binary-response-variables