SI prefixes in ggplot2 axis labels

我在风中等你 2020-12-03 11:24

I often plot graphs in GNU R / ggplot2 for measurements related to bytes. The built-in axis labels are either plain numbers or scientific notation, i.e. 1 megabyte = 1e6.

2 Answers
  • 2020-12-03 11:27

    Update: Recent versions of the scales package include functionality to print readable labels.

    In this case, label_bytes can be used:

    library(ggplot2)
    library(scales)
    
    bytes <- 2^seq(0,20) + rnorm(21, 4, 2)
    
    my_data <- data.frame(
        bytes=as.integer(bytes),
        time=bytes / (1e4 + rnorm(21, 100, 3)) + 8
    )
    
    ggplot(data=my_data, aes(x=bytes, y=time)) +
        geom_point() +
        geom_line() +
        scale_x_log10("Message Size [Byte]", labels=label_bytes()) +
        scale_y_continuous("Round-Trip-Time [us]")
    

    Or, if you prefer IEC units (KiB = 2^10, MiB = 2^20, ...), specify labels=label_bytes(units = "auto_binary"). The resulting plot looks very similar to the second plot in the original answer below.
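
    Because label_bytes() returns a function, it can be called directly to inspect the labels before they go on a plot. The following is a minimal sketch, assuming a version of scales that provides label_bytes() is installed; the sizes vector is made up for illustration.

    ```r
    library(scales)

    # Illustrative sizes (made up for this sketch): 2^10, 2^20 and 2^30 bytes
    sizes <- c(1024, 1024^2, 1024^3)

    # label_bytes() returns a function mapping numbers to character labels
    si_labels  <- label_bytes()(sizes)                       # SI: powers of 1000
    iec_labels <- label_bytes(units = "auto_binary")(sizes)  # IEC: powers of 1024

    print(si_labels)
    print(iec_labels)
    ```

    Comparing the two printed vectors shows which unit system a given labels= setting will put on the axis.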


    Original answer

    For bytes there is gdata::humanReadable. humanReadable supports both SI prefixes (1000 Byte = 1 KB) as well as the binary prefixes defined by the IEC (1024 Byte = 1 KiB).

    The following function, humanReadableLabs, lets you customise the parameters and takes care of NA values:

    humanReadableLabs <- function(...) {
        function(x) {
            sapply(x, function(val) {
                if (is.na(val)) {
                    return("")
                } else {
                    return(
                        humanReadable(val, ...)
                    )
                }
            })
        }
    }
    

    Now it is straightforward to change the labels to use SI prefixes and "byte" as the unit:

    library(ggplot2)
    library(gdata)
    
    bytes <- 2^seq(0,20) + rnorm(21, 4, 2)
    
    my_data <- data.frame(
        bytes=as.integer(bytes),
        time=bytes / (1e4 + rnorm(21, 100, 3)) + 8
    )
    
    # humanReadableLabs() must be defined as shown above before this point
    
    ggplot(data=my_data, aes(x=bytes, y=time)) +
        geom_point() +
        geom_line() +
        scale_x_log10("Message Size [Byte]", labels=humanReadableLabs(standard="SI")) +
        scale_y_continuous("Round-Trip-Time [us]")
    

    IEC prefixes are plotted by omitting standard="SI". Note that the breaks have to be specified as well to get legible values, because the default log10 breaks fall on powers of 10 rather than powers of 1024.
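
    One way to specify such breaks is sketched here. It uses label_bytes() from the update at the top instead of humanReadableLabs(), only to keep the example free of gdata; the breaks argument is passed the same way with either labeller. The name iec_breaks and the choice of three breaks are assumptions made for illustration.

    ```r
    library(ggplot2)
    library(scales)

    # Same kind of synthetic data as in the answer, with a fixed seed
    set.seed(1)
    bytes <- 2^seq(0, 20) + rnorm(21, 4, 2)
    my_data <- data.frame(
        bytes = as.integer(bytes),
        time  = bytes / (1e4 + rnorm(21, 100, 3)) + 8
    )

    # Breaks at 1024^0, 1024^1 and 1024^2 so IEC labels land on round values
    iec_breaks <- 1024^(0:2)

    p <- ggplot(data = my_data, aes(x = bytes, y = time)) +
        geom_point() +
        geom_line() +
        scale_x_log10("Message Size [Byte]",
                      breaks = iec_breaks,
                      labels = label_bytes(units = "auto_binary")) +
        scale_y_continuous("Round-Trip-Time [us]")
    ```

    Printing p draws the plot; more breaks (for example every power of 2^5) can be added to iec_breaks if the axis looks too sparse.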

    ggplot(data=my_data, aes(x=bytes, y=time)) +
        geom_point() +
        geom_line() +
        scale_x_log10("Message Size [Byte]", labels=humanReadableLabs()) +
        scale_y_continuous("Round-Trip-Time [us]")
    

  • 2020-12-03 11:40

    I used library("sos"); findFn("{SI prefix}") to find the sitools package.

    Construct data:

    bytes <- 2^seq(0,20) + rnorm(21, 4, 2)
    time <- bytes/(1e4 + rnorm(21, 100, 3)) + 8
    my_data <- data.frame(time, bytes)
    

    Load packages:

    library("sitools")
    library("ggplot2")    
    

    Create the plot:

    (p <- ggplot(data=my_data, aes(x=bytes, y=time)) +
         geom_point() +
         geom_line() +
         scale_x_log10("Message Size [Byte]", labels=f2si) +
         scale_y_continuous("Round-Trip-Time [us]"))
    

    I'm not sure how this compares to your function, but at least someone else went to the trouble of writing it ...

    I modified your code style a little bit -- semicolons at the ends of lines are harmless but are generally the sign of a MATLAB or C coder ...

    edit: I initially defined a generic formatting function

    si_format <- function(...) {
        function(x) f2si(x,...)
    }
    

    following the format of (e.g.) scales::comma_format, but that seems unnecessary in this case -- just part of the deeper ggplot2 magic that I don't fully understand.
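
    The reason the wrapper is unnecessary is that the labels argument accepts any function that takes the numeric breaks and returns a character vector, so f2si can be passed as is. A factory only helps when extra arguments need to be fixed in advance. A minimal base R sketch of that pattern, where suffix_format is a hypothetical helper and not part of sitools or scales:

    ```r
    # Hypothetical label factory (not from sitools or scales): fixes a unit
    # suffix in advance and returns a function usable as a `labels` argument
    suffix_format <- function(suffix = "", ...) {
        function(x) paste0(format(x, trim = TRUE, ...), suffix)
    }

    # Build a labeller once, then call it like ggplot2 would with the breaks
    label_fn <- suffix_format(" B", scientific = FALSE)
    label_fn(c(1, 1000, 1e6))
    ```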

    The OP's code gives what seems to me to be not quite the right answer: the rightmost axis tick is "1000K" rather than "1M" -- this can be fixed by changing the >1e6 test to >=1e6. On the other hand, f2si uses lower-case k -- I don't know whether K is wanted (wrapping the results in toupper() could fix this).
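
    The OP's si_vec is not reproduced here, so the boundary issue is illustrated with a hypothetical stand-in, si_label_sketch, written only for this example. With >= in the threshold tests, a value of exactly 1e6 is labelled with "M" instead of falling through to the "k" branch.

    ```r
    # Hypothetical formatter for illustration only (not the OP's si_vec)
    si_label_sketch <- function(x) {
        vapply(x, function(val) {
            if (is.na(val)) {
                ""                                   # blank label for missing breaks
            } else if (val >= 1e6) {                 # >= so that exactly 1e6 uses "M"
                paste0(format(val / 1e6, digits = 3), "M")
            } else if (val >= 1e3) {                 # >= so that exactly 1e3 uses "k"
                paste0(format(val / 1e3, digits = 3), "k")
            } else {
                format(val, digits = 3)
            }
        }, character(1))
    }

    labels_lower <- si_label_sketch(c(999, 1e3, 1e6, NA))
    labels_upper <- toupper(labels_lower)            # "k" becomes "K"
    ```

    Replacing >= with > in the first test would send 1e6 to the second branch and produce "1000k", which matches the symptom described above.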

    OP results (si_vec): [plot image not preserved]

    My results (f2si): [plot image not preserved]
