Best way to plot automatically all data.table columns using ggplot2

刺人心 2021-01-07 14:08

I'm trying to make use of advanced data.table and ggplot2 features to create a simple yet powerful function that automatically

2 Answers
  • 2021-01-07 14:35

    I hope this works for you:

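    library(data.table)
    library(ggplot2)

    plotAllXYbyZ <- function(dt, x, y, z) {
      dt <- copy(dt)  # work on a copy so := does not modify the caller's table
      # make sure all columns to be melted for plotting are numeric
      dt[, (y) := lapply(.SD, function(col) as.numeric(as.character(col))), .SDcols = y]
      dts <- melt(dt, id.vars = c(x, z), measure.vars = y)
      # x and z are column positions, so look up their names for aes_string()
      ggplot(dts, aes_string(x = colnames(dt)[x], y = "value", colour = colnames(dt)[z])) +
        geom_line() + facet_wrap(~ variable)
    }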
    
    dt <- data.table(mtcars)    
    
    plotAllXYbyZ(dt, x=1, y=3:10, z=2)
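
    The function relies on melt() to turn the selected y columns into one long "variable"/"value" pair, which is what facet_wrap(~ variable) then splits into panels. A minimal sketch of that reshaping step, using a made-up toy table (the names wide, long, g, m1 and m2 are illustrative assumptions, not from the answer):

    ```r
    library(data.table)

    # Toy table, invented for illustration: one x column, one grouping column, two measures.
    wide <- data.table(x = 1:2, g = c("a", "b"), m1 = c(10, 20), m2 = c(0.1, 0.2))

    # id.vars are repeated on every row; each measure column is stacked into "value",
    # with its name recorded in the factor column "variable".
    long <- melt(wide, id.vars = c("x", "g"), measure.vars = c("m1", "m2"))

    names(long)             # "x" "g" "variable" "value"
    nrow(long)              # 4: one row per (original row, measure column) pair
    levels(long$variable)   # "m1" "m2"
    ```

    Each level of variable becomes one facet, so adding more y columns adds more panels without changing the plotting code.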
    

  • 2021-01-07 14:41

    Thanks to the comments above, here is code that produces the desired output. The figures below show the plots produced by these calls:

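        library(data.table)
        library(ggplot2)   # also provides the diamonds dataset

        # run these after defining the two functions further down
        dtDiamonds <- data.table(diamonds[1:100, ])[order(carat), cut := as.character(cut)]
        plotAllXYbyZ(dtDiamonds)
        plotAllXYbyZ(dtDiamonds, x = "carat", k = "color")
        plotAllXYbyZ(dtDiamonds, x = 1, y = c(2, 8:10), k = 3)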
    

    To do that, I had to introduce a function that converts everything to numeric. The only remaining issue was that the original dtDiamonds got modified, because ':=' updates a data.table by reference. I asked about this in a separate question: "Don't want original data.table to be modified when passed to a function". UPDATE: This issue is now resolved by using <- copy(dt) instead of <- dt.

    # Convert factors and characters to numeric; any other type is returned unchanged.
    my.as.numeric <- function(x) {
      if (is.factor(x)) {
        num <- suppressWarnings(as.numeric(as.character(x)))
        if (anyNA(num))             # factors like "red", "blue": use the integer level codes
          return(as.numeric(x))
        return(num)                 # factors like "20", "30", ...: return 20, 30, ...
      }
      if (is.character(x)) {
        num <- suppressWarnings(as.numeric(x))
        if (anyNA(num))             # strings like "red", "blue": convert via an ordered factor
          return(as.numeric(as.ordered(x)))
        return(num)                 # strings like "20", "30", ...: return 20, 30, ...
      }
      x                             # numeric (or other) input passes through as is
    }
    
    plotAllXYbyZ <- function(.dt, x = NULL, y = NULL, k = NULL) {
      dt <- copy(.dt)    # NB: If copy is not used, the original data.table will get modified!
      # column positions are accepted as well as column names
      if (is.numeric(x)) x <- names(dt)[x]
      if (is.numeric(y)) y <- names(dt)[y]
      if (is.numeric(k)) k <- names(dt)[k]

      if (is.null(x)) x <- names(dt)[1]                  # default x: the first column
      if (is.null(y)) y <- setdiff(names(dt), c(x, k))   # default y: every other column

      # to make sure all columns to be melted for plotting are numerical
      dt[, (y) := lapply(.SD, my.as.numeric), .SDcols = y]

      # with no k, the title omits the "| k" part instead of becoming empty
      ttl <- if (is.null(k)) sprintf("variable = F (%s)", x) else sprintf("variable = F (%s | %s)", x, k)
      ggplot(melt(dt, id.vars = c(x, k), measure.vars = y)) +
        geom_step(aes(.data[[x]], value, colour = variable)) +
        (if (is.null(k)) NULL else facet_wrap(k)) +      # facet only when k is given
        labs(x = x, title = ttl)
    }
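
    The reason copy() matters is that ':=' updates a data.table by reference, so a function that uses it on its argument changes the caller's table as well. A minimal sketch of the difference (the helper names add_flag_in_place and add_flag_on_copy and the column flag are hypothetical, used only for this illustration):

    ```r
    library(data.table)

    # Hypothetical helpers, not part of the answer above.
    add_flag_in_place <- function(dt) {
      dt[, flag := TRUE]     # := adds the column to the caller's table by reference
      invisible(dt)
    }

    add_flag_on_copy <- function(dt) {
      dt <- copy(dt)         # deep copy: later := calls touch only the local copy
      dt[, flag := TRUE]
      dt
    }

    original <- data.table(a = 1:3)

    res <- add_flag_on_copy(original)
    untouched_after_copy <- identical(names(original), "a")   # TRUE: original is unchanged
    names(res)                                                # "a" "flag"

    add_flag_in_place(original)
    names(original)                                           # now "a" "flag"
    ```

    The copy costs memory proportional to the table size, which is usually acceptable for a plotting helper that only needs a converted version of the data.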
    

    [Figures: plots produced by the three example calls above]
