Question
I am trying to use pivot_wider on my data. The data looks like:
       dates yes_no
1 2017-01-01      0
2 2017-01-02      1
3 2017-01-03      0
4 2017-01-04      1
5 2017-01-05      1
The output I am trying to get is:
       dates yes_no 2017-01-02_1 2017-01-04_1 2017-01-05_1
1 2017-01-01      0            0            0            0
2 2017-01-02      1            1            0            0
3 2017-01-03      0            0            0            0
4 2017-01-04      1            0            1            0
5 2017-01-05      1            0            0            1
That is, a new column should be created only for the rows where the yes_no column contains a 1.
This doesn't work for me:
library(dplyr)
library(tidyr)
d %>%
  mutate(value_for_one_hot = 1) %>%
  pivot_wider(names_from = dates, values_from = value_for_one_hot,
              names_prefix = "date_", values_fill = list(value_for_one_hot = 0))
Data:
d <- data.frame(
  dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
  yes_no = c(0, 1, 0, 1, 1)
)
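A likely reason the attempt fails, based on tidyr's documented default for id_cols (every column not used in names_from or values_from): once dates is consumed as names_from, yes_no is the only column left to identify rows. The result therefore collapses to one row per distinct yes_no value, loses the dates column, and gets a column for every date rather than only the flagged ones. The sketch below just re-runs the attempt on the data above to check this; it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

d <- data.frame(
  dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
  yes_no = c(0, 1, 0, 1, 1)
)

# The original attempt, unchanged
attempt <- d %>%
  mutate(value_for_one_hot = 1) %>%
  pivot_wider(names_from = dates, values_from = value_for_one_hot,
              names_prefix = "date_", values_fill = list(value_for_one_hot = 0))

# Only yes_no identifies rows, so there is one row per distinct yes_no value
nrow(attempt)
# yes_no plus a "date_" column for each of the five dates; no dates column survives
names(attempt)
```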
Answer 1:
Create a duplicate of the yes_no column and a new column holding the target column names, then do a normal spread or pivot_wider:
library(dplyr)
library(tidyr)
d %>%
  mutate(yes_no_dup = yes_no,
         cols = if_else(yes_no == 1, paste0(dates, "_1"), NA_character_)) %>%
  spread(cols, yes_no_dup, fill = 0) %>%
  select(-`<NA>`)
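Since the question asks about pivot_wider, here is a sketch of the same idea using pivot_wider instead of spread (assuming tidyr 1.0 or later). Rather than relying on how an NA key gets named, rows with yes_no == 0 receive a placeholder key, "none" (an arbitrary name chosen for this sketch), and that column is dropped afterwards.

```r
library(dplyr)
library(tidyr)

d <- data.frame(
  dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
  yes_no = c(0, 1, 0, 1, 1)
)

out <- d %>%
  mutate(
    yes_no_dup = yes_no,
    # flagged rows get "<date>_1" as their new column name;
    # the rest share the hypothetical placeholder key "none"
    cols = if_else(yes_no == 1, paste0(dates, "_1"), "none")
  ) %>%
  # dates and yes_no stay as id columns, so each row keeps its own line
  pivot_wider(names_from = cols, values_from = yes_no_dup,
              values_fill = list(yes_no_dup = 0)) %>%
  # drop the placeholder column that collected the yes_no == 0 rows
  select(-none)

out
```

This assumes at least one row has yes_no == 0 (otherwise there is no none column to drop) and that no date is literally named "none".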
Answer 2:
Here's a data.table approach that does not actually reshape the data.
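library(data.table)
setDT(d)
ind <- d[["yes_no"]] != 0
# one new column per flagged date, named as in the expected output
cols <- paste0(as.character(d[["dates"]])[ind], "_1")
d[, (cols) := 0]                           # add the new columns, all zero
d[ind, (cols) := as.data.frame(diag(.N))]  # .N = number of flagged rows; the k-th flagged row gets a 1 in column k
## also valid
# set(d, which(ind), cols, as.data.frame(diag(length(cols))))
d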
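The same "add indicator columns without reshaping" idea also works in base R without data.table. A minimal sketch, assuming the d data frame from the question: outer() builds the one-hot matrix directly, where entry [i, j] is 1 exactly when row i is the j-th row with yes_no == 1, playing the role of diag(.N) in the answer above.

```r
d <- data.frame(
  dates = c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"),
  yes_no = c(0, 1, 0, 1, 1)
)

flagged <- which(d$yes_no != 0)  # row positions that contain a 1

# Compare every row index with every flagged position; the logical
# matrix is turned into 0/1 by multiplying by 1
onehot <- outer(seq_len(nrow(d)), flagged, "==") * 1
colnames(onehot) <- paste0(as.character(d$dates[flagged]), "_1")

# cbind() on a data frame keeps the matrix column names as they are
out <- cbind(d, onehot)
out
```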
Source: https://stackoverflow.com/questions/59088378/pivot-wider-based-on-condition-of-a-0-or-1