R dummies package weird column names when knitted via .Rmd

Posted by 隐身守侯 on 2019-12-08 08:34:41

Question


I've just noticed a very weird behavior in the dummies package of R when knitted in .Rmd. Here's the reproducible example.

---
title: "Dummies Package Behavior"
author: "Kim"
date: '`r Sys.Date()`'
output:
  pdf_document:
    toc: yes
    toc_depth: '3'
---

Load the libraries

```{r}
library(tidyverse)
library(dummies)
```

Main data wrangling

```{r}
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
```

View output

```{r}
df
```

What I'm expecting to see when I view the df are nice 0-1 columns of year2016, year2017, and year2018, which is the normal behavior for the dummies package.

When you knit this R Markdown document in RStudio, the column names instead come out as C:/Users/Kim/Desktop/dummies.Rmd2016, C:/Users/Kim/Desktop/dummies.Rmd2017, and C:/Users/Kim/Desktop/dummies.Rmd2018. That is, the full path of the document is used as the prefix for the column names.

I don't understand why such behavior occurs. Obviously, I want to have column names as year2016, year2017, and year2018.


Answer 1:


The problem is not related to dplyr because we can reproduce it with data.frame(). Apparently there is a problem with assigning column labels in the dummy() function when executed as part of an R Markdown document. As noted in Luke's answer, one workaround is to use dummy.data.frame(). Another would be to use the colnames() function to rename the columns after binding the year and dummy variables with cbind(), which also enables a dplyr-based solution.

This should probably be submitted as a bug report for the dummies package.
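
One possible explanation, offered here as an assumption about the package internals rather than something confirmed in this thread: if dummy() builds its column prefix from `sys.call(1)` (the outermost call on the stack) instead of from its own call, then under knitting the outermost call would be the render call, whose first argument is the document path. The base R sketch below only illustrates that mechanism. `label_from_outer_call()` and `render_like()` are hypothetical names made up for the illustration, not functions from dummies, knitr, or rmarkdown.

```r
# Hypothetical stand-in for a function that labels its output from sys.call(1),
# i.e. the outermost call on the stack, not necessarily the call to itself.
label_from_outer_call <- function(x) {
  label <- as.character(sys.call(1))[2]  # first argument of the outermost call
  sub("^(.*\\$)", "", label)             # drop a leading "df$" if present
}

df <- data.frame(year = c(2016, 2017, 2018))

# Called directly at top level, the outermost call is this call itself.
direct <- label_from_outer_call(df$year)

# Hypothetical wrapper playing the role of a render function that takes a path.
render_like <- function(path) label_from_outer_call(df$year)
wrapped <- render_like("some/dir/report.Rmd")

print(direct)
print(wrapped)
```

Run at the top level of a console or with Rscript, `direct` picks up the variable name while `wrapped` picks up the path passed to the wrapper, which mirrors the year-versus-file-path difference described in the question. If the script is itself run through another function such as source(), both values would reflect that outer call instead.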

---
title: "Behavior of dummies package"
author: "anAuthor"
date: "12/26/2017"
output:
  html_document: default
  pdf_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# first, reproduce error with data.frame()

```{r}
library(dummies)
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy(df$year)
dummyCols <- as.data.frame(dummyCols)
dummyCols
```

# data.frame() approach to fix the error

```{r}
df <- data.frame(year = c(2016, 2017, 2018))
df
dummyCols <- dummy.data.frame(data=df,dummy.classes="ALL")
dummyCols
df <- cbind(df, dummyCols)
df
```

...and the output, first reproducing the error.

...second, using dummy.data.frame() to avoid the error.

The dplyr correction works as follows.

# dplyr approach 

```{r}
library(tidyverse)
df <- data_frame(year = c(2016, 2017, 2018))
temp <- dummy(df$year)
temp <- as_data_frame(temp)
df <- bind_cols(df, temp)
colnames(df) <- c("year", paste0("year", 2016:2018))
df
```
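
The renaming step above hardcodes 2016:2018. A minimal sketch of the same idea that derives the names from the data instead is shown here. It does not call dummy() at all: `bad` is a made-up stand-in for the mislabeled matrix, and the sketch assumes the dummy columns come back in sorted order of the distinct values, which is worth checking against real output before relying on it.

```r
df <- data.frame(year = c(2016, 2017, 2018))

# Stand-in for the mislabeled dummy() result seen when knitting (assumed shape:
# one 0/1 column per distinct year, in sorted order).
bad <- diag(3)
colnames(bad) <- paste0("C:/Users/Kim/Desktop/dummies.Rmd", c(2016, 2017, 2018))

combined <- cbind(df, as.data.frame(bad))

# Rebuild the intended names from the values actually present in the column.
levels_seen <- sort(unique(df$year))
colnames(combined) <- c("year", paste0("year", levels_seen))

print(combined)
```

This keeps the chunk valid if more years are added later, at the cost of depending on the column order assumption stated above.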




Answer 2:


I'm not sure why that interaction is happening, but this slight modification seems to get around it:

```{r}
df <- data.frame(year = c(2016, 2017, 2018))
df <- data.frame(df, dummy.data.frame(data = df, dummy.classes = "ALL"))
```

Note that data.frame from base rather than data_frame from dplyr seems to make a difference.
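
If the dependency on dummies is not essential, the same 0-1 columns can be sketched in base R with model.matrix(). This is a substitute technique, not what either answer uses, and the `year` prefix is added by hand here because model.matrix() would otherwise label the columns after the formula term.

```r
df <- data.frame(year = c(2016, 2017, 2018))

# One indicator column per level; "- 1" drops the intercept so every level is kept.
mm <- model.matrix(~ factor(year) - 1, data = df)

# Replace the default "factor(year)2016"-style labels with the intended names.
colnames(mm) <- paste0("year", levels(factor(df$year)))

result <- cbind(df, as.data.frame(mm))
print(result)
```

Because the names are set explicitly, they do not depend on how or where the chunk is evaluated.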



Source: https://stackoverflow.com/questions/47976971/r-dummies-package-weird-column-names-when-knitted-via-rmd
