R - iteratively apply a function to a list of variables

南旧 2021-01-07 08:58

My goal is to create a function that, when looped over multiple variables of a data frame, returns a new data frame containing the percentages and 95% confidence intervals.

2 answers
  •  甜味超标
    2021-01-07 09:35

    The nice thing about all the functions you're using is that they are already vectorized, except sd, which reduces a whole vector to a single number (you can vectorize it over specific arguments with Vectorize). qt is vectorized too, over both p and df. This means you can pass vectors to them without writing a single loop. I left out the parts of your function that deal with preparing the input and prettying up the output.
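    A small sketch of what "vectorized" means here. The proportions in p are made up for illustration and are not from the question: arithmetic, log(), exp() and qt() all act element-wise, while sd() collapses its input to one number, so Vectorize() is needed to map it over several vectors.

    ```r
    # Hypothetical proportions, purely for illustration
    p <- c(0.25, 0.50, 0.75)

    # log() and arithmetic act element-wise: one log-odds value per proportion
    logit <- log(p) - log(1 - p)

    # The inverse transform is element-wise too and recovers p
    back <- exp(logit) / (1 + exp(logit))

    # qt() is vectorized over df: one t quantile per degrees-of-freedom value
    tq <- qt(0.975, df = c(10, 31))

    # sd() reduces a vector to a single number; Vectorize() wraps it so that
    # it is applied to each element of a list (via mapply) instead
    sd_each <- Vectorize(sd)(list(1:3, 4:10))

    print(back)
    print(tq)
    print(sd_each)
    ```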

    t1.props <- function(var, data=mtcars) {
        N <- nrow(data)
        tab <- table(data[[var]])                  # frequency table, computed once
        levels <- names(tab)                       # category labels
        count <- unclass(tab)                      # counts
        prop <- count / N                          # proportions
        se <- sqrt(prop * (1-prop)/(N-1))          # standard errors of props.
        lprop <- log(prop) - log(1-prop)           # log-odds (logit) of props.
        lse <- se / (prop*(1-prop))                # se on the logit scale (delta method)
        stat <- qt(0.975, N-1)                     # t quantile for a 95% interval
        llower <- lprop - stat*lse                 # lower limit, logit scale
        lupper <- lprop + stat*lse                 # upper limit, logit scale
        lower <- exp(llower) / (1 + exp(llower))   # lower ci, back-transformed
        upper <- exp(lupper) / (1 + exp(lupper))   # upper ci, back-transformed
    
        data.frame(variable=var,
                   level=levels,
                   perc=100*prop,
                   lower=100*lower,
                   upper=100*upper)
    }
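    Written out, the function computes a logit-transformed, Wald-type interval. This is a reading of the code above rather than something stated in the question. For a level with count n_k out of N rows:

    ```latex
    \hat p = \frac{n_k}{N}, \qquad
    \widehat{\mathrm{se}} = \sqrt{\frac{\hat p\,(1-\hat p)}{N-1}}

    \ell = \log\frac{\hat p}{1-\hat p}, \qquad
    \mathrm{se}_\ell = \frac{\widehat{\mathrm{se}}}{\hat p\,(1-\hat p)}
    \quad\text{(delta method, since } \tfrac{d}{dp}\log\tfrac{p}{1-p} = \tfrac{1}{p(1-p)}\text{)}

    [\text{lower},\ \text{upper}] = \operatorname{logit}^{-1}\!\left(\ell \mp t_{0.975,\,N-1}\,\mathrm{se}_\ell\right),
    \qquad \operatorname{logit}^{-1}(x) = \frac{e^{x}}{1+e^{x}}
    ```

    Because the limits are back-transformed from the logit scale, they always stay inside (0, 1), and they are not symmetric around the point estimate (for cyl 4 in the output, 19.5 and 53.1 around 34.4).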
    

    So the only explicit looping happens when you apply the function to multiple variables, as follows:

    ## Apply your function to two variables
    do.call(rbind, lapply(c("cyl", "am"), t1.props))
    #   variable level   perc    lower    upper
    # 4      cyl     4 34.375 19.49961 53.11130
    # 6      cyl     6 21.875 10.34883 40.44691
    # 8      cyl     8 43.750 27.09672 61.94211
    # 0       am     0 59.375 40.94225 75.49765
    # 1       am     1 40.625 24.50235 59.05775
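    As a sanity check, the first row (cyl 4) can be reproduced by hand. This sketch swaps in base R's qlogis() and plogis(), which compute log(p/(1-p)) and exp(x)/(1+exp(x)), in place of the explicit expressions used in t1.props; the variable names are illustrative.

    ```r
    N <- nrow(mtcars)                     # 32 cars
    p <- sum(mtcars$cyl == 4) / N         # 11 four-cylinder cars, so 0.34375
    se <- sqrt(p * (1 - p) / (N - 1))     # same standard error as in t1.props
    lse <- se / (p * (1 - p))             # standard error on the logit scale
    # back-transform logit(p) -/+ t * lse to the proportion scale
    ci <- plogis(qlogis(p) + c(-1, 1) * qt(0.975, N - 1) * lse)
    print(round(100 * c(perc = p, lower = ci[1], upper = ci[2]), 5))
    ```

    The printed values should agree with the cyl 4 row of the output above.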
    

    As for the loop in your code, it doesn't matter much for efficiency, but concise code is much easier to read, and the apply functions offer a lot of simple one-line solutions.
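    To illustrate the readability point, here is a sketch (not the asker's original loop, which isn't shown here) that builds the same list of proportion tables twice: once with a preallocated for loop and once with a single lapply() call.

    ```r
    vars <- c("cyl", "am")

    # Explicit loop: preallocate a list, then fill it one element at a time
    out_loop <- vector("list", length(vars))
    for (i in seq_along(vars)) {
      out_loop[[i]] <- prop.table(table(mtcars[[vars[i]]]))
    }

    # The same result as a one-liner with lapply()
    out_apply <- lapply(vars, function(v) prop.table(table(mtcars[[v]])))

    print(identical(out_loop, out_apply))
    ```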

    I think the most important thing to change in your code is the use of assign and get. Instead, store the values in a list (or another data structure) and use setNames or the replacement function names<- (as in names(x) <- value) to name the components when needed.
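    A minimal before/after sketch of that suggestion. The tab_ prefix and the variable names are hypothetical: the commented lines show the assign()/get() pattern, and the live code keeps the same tables in one named list.

    ```r
    vars <- c("cyl", "am")

    # Pattern to avoid: one loose variable per result, looked up by name later
    # for (v in vars) assign(paste0("tab_", v), table(mtcars[[v]]))
    # get("tab_cyl")

    # Preferred: keep the results together in a list and name it with setNames()
    tabs <- setNames(lapply(vars, function(v) table(mtcars[[v]])), vars)
    print(tabs$cyl)        # look up by name, no get() needed

    # names<- does the same job on an existing list
    counts <- lapply(vars, function(v) length(unique(mtcars[[v]])))
    names(counts) <- vars
    print(unlist(counts))  # number of distinct levels per variable
    ```

    Processing every stored result is then just another lapply(tabs, ...) call.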
