Naming conflicts in R when using attach

旧时难觅i 2021-01-16 12:28

I feel as if constantly in R, I get weird naming conflicts between attached dataframes and other objects, attaches/detaches not working as expected (just had two copies of t

2 answers
  •  失恋的感觉
    2021-01-16 12:37

    attaches/detaches (sic) not working as expected

    As mentioned by joran and BondedDust, using attach is always a bad idea, because it causes silly, obscure bugs like the ones you found.
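
    To make the kind of bug concrete, here is a minimal sketch of one common pitfall: attach puts a copy of the data frame on the search path, so later changes to the data frame are not seen through the bare column name. The data frame exam_scores and its values are made up for illustration.

```r
# Hypothetical data, for illustration only.
exam_scores <- data.frame(score = c(50, 60, 70))
attach(exam_scores)

# Modify the data frame after it has been attached.
exam_scores$score <- exam_scores$score + 10

# The bare name still resolves to the attached copy, which is now stale.
stale_score <- score
fresh_score <- exam_scores$score

detach(exam_scores)
```

    Here stale_score holds the original values while fresh_score holds the updated ones, which is easy to miss in a long interactive session.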

    naming dataframes with single letters

    Don't do this either! Give your variables meaningful names, so that your code is easier to understand when you come back to it six months later.


    If your problem is that you don't like repeatedly typing the name of a data frame to access columns, then use functions with special evaluation that avoid that need.

    For example,

    some_sample_data <- data.frame(x = 1:10, y = runif(10))
    

    Subsetting

    Repeated typing, hard work:

    some_sample_data[some_sample_data$x > 3 & some_sample_data$y > 0.5, ]
    

    Easier alternative using subset:

    subset(some_sample_data, x > 3 & y > 0.5)
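
    One difference worth knowing when swapping between the two forms: subset drops rows where the condition is NA, while plain bracket indexing keeps them as all-NA rows. A small sketch, using a made-up data frame with_missing that is not part of the example above:

```r
# Hypothetical data with one missing value, for illustration only.
with_missing <- data.frame(x = 1:4, y = c(0.2, 0.9, NA, 0.7))

# Bracket indexing: the NA in the condition produces an all-NA row.
bracket_result <- with_missing[with_missing$y > 0.5, ]

# subset(): rows where the condition is NA are dropped.
subset_result <- subset(with_missing, y > 0.5)

n_bracket <- nrow(bracket_result)
n_subset <- nrow(subset_result)
```

    With this input the bracket version returns one more row than subset, so the two are only interchangeable when the condition has no missing values.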
    

    Reordering

    Repeated typing, hard work:

    order_y <- order(some_sample_data$y)
    some_sample_data[order_y, ]
    

    Easier using arrange from plyr:

    library(plyr)  # provides arrange()
    arrange(some_sample_data, y)
    

    Transforming

    Repeated typing, hard work:

    some_sample_data$z <- some_sample_data$x + some_sample_data$y
    

    Easier using with, within or mutate (the last one from plyr):

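    # Base R: with() evaluates the expression using the data frame's columns
    some_sample_data$z <- with(some_sample_data, x + y)
    # Base R: within() returns a modified copy of the data frame
    some_sample_data <- within(some_sample_data, z <- x + y)
    # plyr: mutate() adds the column (needs library(plyr))
    some_sample_data <- mutate(some_sample_data, z = x + y)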
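
    If you would rather not depend on plyr, base R's transform is another option that was not listed above. A minimal sketch, rebuilding the sample data so the block runs on its own:

```r
# Same shape as the sample data used in this answer.
some_sample_data <- data.frame(x = 1:10, y = runif(10))

# transform() returns a copy of the data frame with the new column added.
via_transform <- transform(some_sample_data, z = x + y)

# within() gives the same column, for comparison.
via_within <- within(some_sample_data, z <- x + y)

same_z <- identical(via_transform$z, via_within$z)
```

    Like subset, transform relies on non-standard evaluation, so it is best suited to interactive work.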
    

    Modelling

    As mentioned by MrFlick, many functions, particularly modelling functions, have a data argument that lets you avoid repeating the data name.

    Repeated typing, hard work:

    lm(some_sample_data$y ~ some_sample_data$x)
    

    Using a data argument:

    lm(y ~ x, data = some_sample_data)
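
    Beyond saving typing, the data argument also gives cleaner coefficient names and makes prediction on new data straightforward. A short sketch comparing the two fits; the names model_dollar, model_data and new_points are introduced here for illustration only:

```r
set.seed(1)  # only so the sketch is reproducible
some_sample_data <- data.frame(x = 1:10, y = runif(10))

model_dollar <- lm(some_sample_data$y ~ some_sample_data$x)
model_data <- lm(y ~ x, data = some_sample_data)

# The fitted values agree, but the coefficient names differ.
dollar_names <- names(coef(model_dollar))  # "(Intercept)" "some_sample_data$x"
data_names <- names(coef(model_data))      # "(Intercept)" "x"

# predict() can match the column x in new data for the second model.
new_points <- data.frame(x = c(11, 12))
predictions <- predict(model_data, newdata = new_points)
```

    With the first form, predict has no column called some_sample_data$x to look up in new_points, so it generally cannot use the new data as intended.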
    

    You can see all the functions in the stats package that have a data argument using:

    library(sig)
    stats_sigs <- list_sigs(pkg2env(stats))
    Filter(function(fn) "data" %in% names(fn$args), stats_sigs)
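
    If the sig package is not installed, a rough base R equivalent can be sketched with formals. This is an alternative approach rather than what the code above does, and the helper name has_data_arg is made up for the example:

```r
# The stats package is attached by default in a normal R session.
stats_env <- as.environment("package:stats")
stats_names <- ls(stats_env)

# Keep the names of functions whose formal arguments include "data".
has_data_arg <- Filter(
  function(name) {
    obj <- get(name, envir = stats_env)
    is.function(obj) && "data" %in% names(formals(obj))
  },
  stats_names
)

head(has_data_arg)
```

    This only inspects the visible arguments, so functions that accept data through ... are not picked up.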
    
