Transposing a data frame while preserving class/data type information

怎甘沉沦 提交于 2020-01-03 15:34:23

问题


tl;dr

How can I transpose a data frame with column vectors of different classes (one column is character, the next is numeric, yet another is logical etc.) in a way that preserves the class/data type information?

Example data:

mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE)
)

Here's also a little xlsx file that features examples for both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx

Problem

A simple t() or a bit more involved routines like the one suggested in this post are great, but do not preserve the class information or the original data frame's columns. I'm also aware that class data.frame was never meant to store mixed-class information in its columns.

However, at least I would like come as close as possible to simply "inverting" the way data.frames were meant to be: centering the perspective on rows instead of columns perspective. I.e., all elements in row vectors need to belong to the same class while classes can differ across column vectors.

Context

I often work in projects where people are used to representing time series data in a horizontal ("variables-are-in-rows") instead of a vertical ("variables-are-in-columns") orientation that we are all used to in R (and which, IMHO, also makes much more sense).

What's more, they are using MS Excel extensively. And I need to both read data in "wide-format" and I to update existing Excel files by writing formulas directly from R to the file via XLConnect and/or openxlsx (as opposed to being able to do my calculations in R and then simply dumping the final result in an Excel file).

While I constantly try to tell them that using such an orientation means working against established standards across languages/tools (at least true for R and MS Excel), it is not very likely that they are going to switch. So I have to deal with that somehow.

Current approach

So I though about keeping an underlying list but make it "look and feel" like a data.frame as much as possible. It works, but is pretty involved. I thought there might be some more clever solution to this.

Function def:

transpose <- function (
  x,
  col = character(),
  rnames_or_col = c("col", "rnames")
) {
  rnames_or_col <- match.arg(rnames_or_col, c("col", "rnames"))

  ## Buffering column names //
  cnames <- if (length(col)) {
    x[[col]]
  } else {
    make.names(1:nrow(x))
  }

  ## Removing anchoring column //
  if (inherits(x, "data.table")) {
    x <- as.data.frame(x, stringsAsFactors = FALSE)
  }
  ## I don't like this part. Any suggestions on how a) build on top of existing
  ## data.table functionality b) the easiest way to make a data.table behave
  ## like a data.frame when indexing (remove operation below will yield
  ## "undesired" results from a data.frame perspective; it's fine in from
  ## data.table's perspective/paradigm of course)
  if (length(col)) {
    x <- x[ , -which(names(x) == col)]
  }

  ## Buffer classes //
  classes <- lapply(x, class)

  ## Buffer row names //
  rnames <- names(x)

  ## Listify //
  x <- lapply(as.list(x), function(row) {
    df <- do.call(data.frame, list(as.list(row), stringsAsFactors = FALSE))
    names(df) <- cnames
    df
  })
  names(x) <- rnames

  ## Actual row names or row names as first column //
  if (rnames_or_col == "col") {
    x <- lapply(x, function(ii) {
      data.frame(variable = row.names(ii), ii,
        stringsAsFactors = FALSE, row.names = NULL, check.names = FALSE)
    })
  }

  ## Class //
  class(x) <- c("df_transposed", class(x))

  x
}

Print method:

print.df_transposed <- function(object) {
  cat("df_transposed: \n")
  out <- do.call(rbind, object)
  rownames(out) <- NULL
  print(out)
}

Getter and setter methods:

"[<-.df_transposed" <- function(x, i, j, value) {
  x[[i]][ , j] <- value
  x
}

"[.df_transposed" <- function(x, i, j, drop = FALSE) {
# foo <- function(x, i, j, drop = FALSE) {
  has_i <- !missing(i)
  has_j <- !missing(j)
  cls <- class(x)
  scope <- if (has_i) {
    i
  } else {
    1:length(x)
  }
  out <- lapply(unclass(x)[scope], function(ii) {
    nms <- names(ii)

    if (has_j) {
      tmp <- ii[ , j, drop = drop]
      names(tmp) <- nms[j]
      ## --> necessary due to `check.names` missing for `[.data.frame` :-/
      tmp
    } else {
      ii
    }
  })
  class(out) <- cls
  out
}

Class function:

class2 <- function(x) {
  sapply(x, function(ii) {
    value <- if ("variable" %in% names(ii)) {
      unlist(ii[, -1])
    } else {
      unlist(ii)
    }
    class(value)
  })
}

Application

Example data:

mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE)
)

Actual transposing and print methods:

> (df_t <- transpose(mydata, col = "col0"))
df_transposed: 
  variable row1  row2 row3
1     col1    1     2    3
2     col2    a     b    c
3     col3 TRUE FALSE TRUE
> (df_t2 <- transpose(mydata, col = "col0", rnames_or_col = "rnames"))
df_transposed: 
     row1  row2 row3
col1    1     2    3
col2    a     b    c
col3 TRUE FALSE TRUE

Printing unclassed objects:

> unclass(df_t)
$col1
  variable row1 row2 row3
1     col1    1    2    3

$col2
  variable row1 row2 row3
1     col2    a    b    c

$col3
  variable row1  row2 row3
1     col3 TRUE FALSE TRUE

> unclass(df_t2)
$col1
     row1 row2 row3
col1    1    2    3

$col2
     row1 row2 row3
col2    a    b    c

$col3
     row1  row2 row3
col3 TRUE FALSE TRUE

Class query:

> class2(df_t)
       col1        col2        col3 
  "numeric" "character"   "logical"     

Indexing:

> dat_t[1, ]
df_transposed: 
  variable row1 row2 row3
1        1    1    2    3
> dat_t[, 1]
df_transposed: 
  variable
1        1
2        1
3        1
> 
> dat_t[1, 2]
df_transposed: 
     row1
col1    1
> dat_t[2, 3]
df_transposed: 
     row2
col2    b
> 
> dat_t[1:2, ]
df_transposed: 
  variable row1 row2 row3
1        1    1    2    3
2        1    a    b    c
> dat_t[, 1:3]
df_transposed: 
  variable row1  row2
1        1    1     2
2        1    a     b
3        1 TRUE FALSE
> 
> dat_t[c(1, 3), 2:4]
df_transposed: 
     row1 row2 row3
col1    1    2    3
col3    1    0    1

来源:https://stackoverflow.com/questions/35224338/transposing-a-data-frame-while-preserving-class-data-type-information

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!