mutate_if, summarize_at etc coerce data.table to data.frame

こ雲淡風輕ζ submitted on 2019-12-02 01:49:14

Question


It seems that some dplyr functions, including mutate_if, mutate_all and mutate_at, coerce data.table inputs to data.frame. That seems like strange behaviour, even though it is documented in ?mutate_all (under 'Value' it says 'data.frame'), because the same functions do not coerce tibbles to data.frames.

require(dplyr)
require(data.table)
data("iris")
dt <- as.data.table(iris)
class(dt)
#[1] "data.table" "data.frame"
class(mutate_if(dt, is.numeric, as.numeric))
#[1] "data.frame"

However, this does not happen with tibbles:

tb <- as_tibble(iris)
class(tb)
#[1] "tbl_df"     "tbl"        "data.frame"
class(mutate_if(tb, is.numeric, as.numeric))
#[1] "tbl_df"     "tbl"        "data.frame"

Is there some way to keep the data.table class, or do I need to coerce with as.data.table every time I use one of the scoped mutate functions?


Answer 1:


If you'd like to try an alternative, I recently released the table.express package, which uses many dplyr and custom verbs to build data.table expressions.

The linked vignette provides detailed explanations, but some examples:

library(data.table)
library(table.express)

data("iris")
DT <- as.data.table(iris)

# mutate_all (modification by reference does not print)
DT %>%
  mutate_sd(everything(), as.integer)

# mutate_if
DT %>%
  mutate_sd(~ is.numeric(.x), as.integer)

# mutate_at
DT %>%
  mutate_sd(contains("."), ~ .x * 1.5)

# transmute_all
DT %>%
  transmute_sd(everything(), as.integer)

# transmute_if
DT %>%
  transmute_sd(~ is.numeric(.x), as.integer)

# transmute_at
DT %>%
  transmute_sd(contains("."), as.integer)

Do note that mutate_sd modifies by reference by default, so re-define DT between examples if you like.

Also, as of version 0.3.0, you won't be able to load both table.express and dtplyr at the same time, since they define the same data.table methods for many dplyr generics.
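For comparison, here is a minimal sketch of the mutate_if-style example written in plain data.table syntax, which never leaves the data.table class. It is not necessarily the expression table.express builds internally, and num_cols is just an illustrative name.

```r
library(data.table)

dt <- as.data.table(iris)

# Names of the numeric columns (the "if" part of mutate_if)
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# Replace those columns by reference; .SD holds only the columns in .SDcols
dt[, (num_cols) := lapply(.SD, as.integer), .SDcols = num_cols]

class(dt)
```

Like mutate_sd, the := operator modifies dt by reference, so re-create dt before trying a different transformation.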




Answer 2:


There may be no satisfying answer to your question, but with these wrapper functions you wouldn't have to convert back to a data.table every time.

If you don't want to include them in each script or project, and you don't want to put them in your .Rprofile, you could even make an itty-bitty package out of them. It's surprisingly easy.

# Each wrapper masks the dplyr function of the same name, so the dplyr
# version is called explicitly with dplyr:: to avoid infinite recursion.
mutate_all <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_all(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_all(...)
  }
}
mutate_if <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_if(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_if(...)
  }
}
mutate_at <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_at(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_at(...)
  }
}
transmute_all <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_all(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_all(...)
  }
}
transmute_if <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_if(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_if(...)
  }
}
transmute_at <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_at(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_at(...)
  }
}
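Since the six wrappers differ only in the verb they call, the pattern can also be written once as a function factory. This is only a sketch: keep_dt and mutate_if_dt are hypothetical names, not part of dplyr or data.table, and the wrapped verbs get new names so that nothing in dplyr is masked.

```r
library(dplyr)
library(data.table)

# Hypothetical helper: takes a dplyr verb and returns a version of it that
# converts the result back to a data.table when the input was one.
keep_dt <- function(verb) {
  force(verb)
  function(.tbl, ...) {
    out <- verb(.tbl, ...)
    if (is.data.table(.tbl)) as.data.table(out) else out
  }
}

mutate_if_dt <- keep_dt(dplyr::mutate_if)

dt <- as.data.table(iris)
res <- mutate_if_dt(dt, is.numeric, as.integer)
class(res)
```

The other scoped verbs can be wrapped the same way, for example keep_dt(dplyr::transmute_at).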



Answer 3:


Have you tried piping the result back into data.table()?

df %>%
  mutate_if(yourmutate) %>%
  data.table()

The result will have both the data.table and data.frame classes.

Following your example:

require(dplyr)
require(data.table)
data("iris")
dt <- as.data.table(iris)
class(dt)
#[1] "data.table" "data.frame"
dt <- mutate_if(dt, is.numeric, as.numeric) %>% data.table()
class(dt)
#[1] "data.table" "data.frame"
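If the copy made by data.table() is a concern, data.table's setDT() converts a data.frame in place instead. A small sketch, assuming the dplyr result is first stored in a variable (out is just an illustrative name):

```r
library(dplyr)
library(data.table)

dt <- as.data.table(iris)

# Store the dplyr result first, then convert it by reference
out <- mutate_if(dt, is.numeric, as.numeric)
setDT(out)  # converts `out` in place, without copying the columns

class(out)
```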


Source: https://stackoverflow.com/questions/56145140/mutate-if-summarize-at-etc-coerce-data-table-to-data-frame
