Question:
It seems that some dplyr functions, including mutate_if, mutate_all, mutate_at, etc., coerce data.table inputs to data.frame. That seems like strange behaviour, even though it is documented in ?mutate_all: under 'Value' it says 'data.frame', yet tibbles are not coerced to data.frames.
require(dplyr)
require(data.table)
data("iris")
dt <- as.data.table(iris)
class(dt)
#[1] "data.table" "data.frame"
class(mutate_if(dt, is.numeric, as.numeric))
#[1] "data.frame"
However, this does not happen with tibbles:
tb <- as_tibble(iris)
class(tb)
#[1] "tbl_df" "tbl" "data.frame"
class(mutate_if(tb, is.numeric, as.numeric))
#[1] "tbl_df" "tbl" "data.frame"
Is there some way to keep the data.table class, or do I need to coerce with as.data.table every time I use one of the scoped mutate functions?
Answer 1:
If you'd like to try an alternative, I recently released the table.express package, which uses many dplyr verbs, plus some custom ones, to build data.table expressions. The linked vignette provides detailed explanations, but here are some examples:
library(data.table)
library(table.express)
data("iris")
DT <- as.data.table(iris)
# mutate_all (modification by reference does not print)
DT %>%
mutate_sd(everything(), as.integer)
# mutate_if
DT %>%
mutate_sd(~ is.numeric(.x), as.integer)
# mutate_at
DT %>%
mutate_sd(contains("."), ~ .x * 1.5)
# transmute_all
DT %>%
transmute_sd(everything(), as.integer)
# transmute_if
DT %>%
transmute_sd(~ is.numeric(.x), as.integer)
# transmute_at
DT %>%
transmute_sd(contains("."), as.integer)
Do note that mutate_sd modifies by reference by default, so re-define DT between examples if you like. Also, as of version 0.3.0, you won't be able to load both table.express and dtplyr at the same time, since they define the same data.table methods for many dplyr generics.
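For comparison, the mutate_if-style example above can also be written directly in data.table syntax, without table.express. This is a minimal sketch assuming only data.table is installed; like mutate_sd, the := assignment modifies DT by reference, so copy() is used here to keep an untouched version instead of re-defining DT.

```r
library(data.table)

data("iris")
DT <- as.data.table(iris)

# copy() makes a deep copy; the := below modifies DT in place,
# so DT_orig keeps the original double columns
DT_orig <- copy(DT)

# Rough equivalent of mutate_if(is.numeric, as.integer):
# find the numeric columns, then overwrite them by reference,
# with .SD restricted to those columns via .SDcols
num_cols <- names(DT)[vapply(DT, is.numeric, logical(1))]
DT[, (num_cols) := lapply(.SD, as.integer), .SDcols = num_cols]

class(DT)  # still "data.table" "data.frame"
str(DT)
```

Because nothing here goes through dplyr, the class question never arises; the trade-off is data.table's own syntax instead of dplyr verbs.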
Answer 2:
There may be no satisfying answer to your question, but these wrapper functions would save you from converting back to a data.table every time. And if you don't want to include them in each script or project, and don't want to put them in your .Rprofile, you could even make an itty-bitty package out of them. It's surprisingly easy.
library(dplyr)
library(data.table)

# Each wrapper masks the dplyr verb of the same name, so the inner call
# must be namespaced (dplyr::); otherwise the wrapper calls itself forever.
mutate_all <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_all(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_all(...)
  }
}

mutate_if <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_if(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_if(...)
  }
}

mutate_at <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_at(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_at(...)
  }
}

transmute_all <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_all(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_all(...)
  }
}

transmute_if <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_if(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_if(...)
  }
}

transmute_at <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_at(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_at(...)
  }
}
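The six wrappers above all follow one pattern, so a single higher-order helper could generate them. This is a minimal sketch; keep_dt() and the *_dt names are hypothetical, not part of dplyr or data.table. Giving the wrappers new names instead of masking the dplyr verbs also avoids any chance of a wrapper calling itself.

```r
library(dplyr)
library(data.table)

# Hypothetical helper: wrap a dplyr verb so that a data.table input
# comes back as a data.table; any other input is returned unchanged.
keep_dt <- function(verb) {
  force(verb)  # capture the verb now, not lazily at first call
  function(.tbl, ...) {
    out <- verb(.tbl, ...)
    if (inherits(.tbl, "data.table") && !inherits(out, "data.table")) {
      out <- as.data.table(out)
    }
    out
  }
}

mutate_all_dt    <- keep_dt(dplyr::mutate_all)
mutate_if_dt     <- keep_dt(dplyr::mutate_if)
mutate_at_dt     <- keep_dt(dplyr::mutate_at)
transmute_all_dt <- keep_dt(dplyr::transmute_all)
transmute_if_dt  <- keep_dt(dplyr::transmute_if)
transmute_at_dt  <- keep_dt(dplyr::transmute_at)

data("iris")
dt  <- as.data.table(iris)
res <- mutate_if_dt(dt, is.numeric, as.numeric)
class(res)
```

The inherits() check on the output skips the conversion when the verb already returned a data.table, so nothing is copied needlessly.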
Answer 3:
Have you tried converting the result back with data.table()?
dt %>%
  mutate_if(is.numeric, as.numeric) %>%
  data.table()
The result will then be both a data.table and a data.frame.
Following your example:
require(dplyr)
require(data.table)
data("iris")
dt <- as.data.table(iris)
class(dt)
#[1] "data.table" "data.frame"
dt <- mutate_if(dt, is.numeric, as.numeric) %>% data.table()
class(dt)
#[1] "data.table" "data.frame"
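A variant of the same idea, assuming data.table's setDT() is acceptable: instead of building a new data.table with data.table(), setDT() converts the data.frame returned by mutate_if in place, without copying its columns. This is a sketch, not the answerer's code.

```r
library(dplyr)
library(data.table)

data("iris")
dt <- as.data.table(iris)

# Run the scoped dplyr verb; depending on the dplyr version,
# the result may come back as a plain data.frame.
res <- mutate_if(dt, is.numeric, as.numeric)

# setDT() turns a data.frame into a data.table by reference,
# so no second copy of the data is made.
setDT(res)
class(res)  # "data.table" "data.frame"
```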
Source: https://stackoverflow.com/questions/56145140/mutate-if-summarize-at-etc-coerce-data-table-to-data-frame