Create numerically encoded dummy variables efficiently in R?

Submitted by 。_饼干妹妹 on 2021-02-04 19:58:55

Question


How can we transform data of the form

df <- structure(list(customer_number = c(3, 3, 1, 1, 3),
                     item = c("milkshake", "burger", "apple", "burger", "water")),
                row.names = c(NA, -5L), class = "data.frame")


#   customer_number      item
# 1               3 milkshake
# 2               3    burger
# 3               1     apple
# 4               1    burger
# 5               3     water

into numerically encoded dummy variables (one row per customer, with a 0/1 column for each item), like this:


data.frame(customer_number=c(1,3),
           item_milkshake=c(0,1),
           item_burger=c(1,1),
           item_apple=c(1,0),
           item_water=c(0,1))

#   customer_number item_milkshake item_burger item_apple item_water
# 1               1              0           1          1          0
# 2               3              1           1          0          1

Answer 1:


We can create a dummy column with the value 1 for every purchase and then reshape the data to wide format, filling in 0 wherever a customer did not buy an item.

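library(dplyr)

df %>%
  mutate(n = 1) %>%              # dummy column: every observed purchase gets a 1
  arrange(customer_number) %>%   # order rows by customer (this also sets the item column order)
  tidyr::pivot_wider(names_from = item, values_from = n,
                     values_fill = list(n = 0),  # customer/item pairs that never occur become 0
                     names_prefix = "item_")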

# A tibble: 2 x 5
#  customer_number item_apple item_burger item_milkshake item_water
#            <dbl>      <dbl>       <dbl>          <dbl>      <dbl>
#1               1          1           1              0          0
#2               3          0           1              1          1
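For comparison, the same "write a 1 for every observed customer/item pair, 0 everywhere else" idea can be sketched in base R without dplyr or tidyr. This is a minimal sketch that is not part of the original answer: it assumes customer_number and item are the only identifying columns, and the names customers, items, wide and result are illustrative. It pre-allocates an all-zero matrix and uses two-column matrix indexing to set the observed cells to 1.

```r
# Same data as in the question
df <- data.frame(customer_number = c(3, 3, 1, 1, 3),
                 item = c("milkshake", "burger", "apple", "burger", "water"))

customers <- sort(unique(df$customer_number))
items <- sort(unique(df$item))

# Start from an all-zero customer x item matrix with prefixed column names
wide <- matrix(0, nrow = length(customers), ncol = length(items),
               dimnames = list(NULL, paste0("item_", items)))

# Each row of the cbind() matrix is a (row, column) position to set to 1;
# a repeated purchase just sets the same cell to 1 again
wide[cbind(match(df$customer_number, customers), match(df$item, items))] <- 1

result <- data.frame(customer_number = customers, wide)
result
```

Because cells are assigned rather than summed, a customer who buys the same item twice still gets a 1. With pivot_wider(), duplicated customer/item pairs would instead need to be removed first (for example with distinct()), otherwise the values are not uniquely identified.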



Answer 2:


If you prefer base R functions, here is a simple solution using the table() function:

# Create the dataset
df <- structure(list(customer_number = c(3, 3, 1, 1, 3),
                     item = c("milkshake", "burger", "apple", "burger", "water")),
                row.names = c(NA, -5L), class = "data.frame")

res <- as.matrix(table(df$customer_number, df$item))  # purchase counts per customer x item
res[res > 0] <- 1  # collapse counts to 0/1 dummy values
res

    apple burger milkshake water
  1     1      1         0     0
  3     0      1         1     1
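The res[res > 0] <- 1 step is needed because table() counts occurrences rather than recording presence. The sketch below uses a hypothetical extra row (customer 1 buying a second burger, which is not in the question's data) to show the difference, and (counts > 0) * 1 as an equivalent way to obtain the 0/1 encoding; the names df2, counts and dummies are illustrative.

```r
# Hypothetical variant of the question's data: the last row (customer 1
# buying a second burger) is added purely for illustration
df2 <- data.frame(customer_number = c(3, 3, 1, 1, 3, 1),
                  item = c("milkshake", "burger", "apple", "burger", "water", "burger"))

# table() counts how often each (customer, item) pair occurs
counts <- table(df2$customer_number, df2$item)
counts["1", "burger"]   # 2

# Comparing with 0 gives TRUE/FALSE; multiplying by 1 turns that into 1/0,
# which has the same effect as res[res > 0] <- 1 in the answer
dummies <- (counts > 0) * 1
dummies["1", "burger"]  # 1
```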

You can then add customer_number as a separate column of the matrix:

res <- cbind(customer_number = as.numeric(rownames(res)), res)
res

  customer_number apple burger milkshake water
1               1     1      1         0     0
3               3     0      1         1     1
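If you need the exact layout from the question (a data frame with item_-prefixed columns and ordinary row names, rather than a matrix whose row names are "1" and "3"), one possible base R sketch follows. It is an assumption-based variant rather than part of the original answer, and the name out is illustrative.

```r
df <- data.frame(customer_number = c(3, 3, 1, 1, 3),
                 item = c("milkshake", "burger", "apple", "burger", "water"))

res <- table(df$customer_number, df$item)
res[res > 0] <- 1

# unclass() drops the "table" class so data.frame() treats res as a plain
# matrix; row.names = NULL replaces the "1"/"3" row names with 1, 2
out <- data.frame(customer_number = as.numeric(rownames(res)),
                  unclass(res), row.names = NULL)

# Prefix the item columns to match the layout asked for in the question
names(out)[-1] <- paste0("item_", names(out)[-1])
out
```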


Source: https://stackoverflow.com/questions/60427257/create-numerically-encoded-dummy-variables-efficiently-in-r
