R - How can I use the apply functions instead of iterating?

Question

Regress each dependent variable (dep_var) against the independent variable (ind_var)

I am trying to run linear regressions for several dependent variables against a single independent variable, one at a time.

When an observation is missing (NA), the entire row is dropped from that particular regression.

I have done this by looping over each dependent-variable column:

fit = list()
for (i in 1:2) {
    fit[[i]] = lm(mydf$Ind_Var[which(!is.na(mydf[, (2 + i)]))] ~ na.omit(mydf[, (2 + i)]))
}

How can I do this without involving other packages (let's restrict ourselves to functions like lm, the apply family, and do/do.call)?

Random Data

mydf = data.frame( 
"ID"    = rep( "A" , 25 ),
"Date"  = c( 1 : 25 ), 
"Dep_1" = c( 0.78670185, 0.15221561, NA, 0.85270392, 0.90057399, 0.75974473, 0.42026760, 0.64035871, 0.83012434, 0.04985492, 0.06619375, 0.36024745, 0.83969627, 0.45293842, 0.25272036, NA, 0.63783321, 0.42294695, 0.06726004, 0.14124547, 0.54590193, 0.99560087, 0.14255501, 0.41559977, 0.80120970) ,          
"Dep_2" = c( 0.736137983, 0.979317444, 0.901380500, 0.942325049, 0.420741297, NA, 0.243408607, 0.824064331, 0.462912557, NA, 0.710834065, 0.264922818, 0.797917063, 0.578866651, 0.955944058, 0.291149075, 0.437322581, 0.298153168, 0.579299049, 0.671718144, 0.545720702, 0.099175216, 0.808933227, 0.912825535, 0.417438973 ) ,          
"Ind_Var" = c( 75:51 )  )

My own attempt at converting the loop is:

apply( mydf[ ,-c(1:2) ] , 2 , function( x ) lm( mydf$Ind_Var[ which( !is.na( x ) ) ] ~ na.omit(x)  ) )

but this still has mydf hardcoded inside the function.

I apologize if I have used any incorrect terms.

Answer 1:

What about the following approach?

# Specify the columns that contain your predictor variables
predIdx <- c(3, 4)

# lm(y ~ x), for x being a single predictor
lapply(predIdx, function(x) lm(mydf[, ncol(mydf)] ~ mydf[, x]))

Here I assume that the response is always in the last column of the dataframe. All you need to specify manually are the column indices that contain your predictors.

If you want to exclude the NAs manually, you could use complete.cases inside the lapply function; this shouldn't be necessary, because lm drops rows with missing values by default (na.action = na.omit).
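To illustrate the point about missing values, here is a minimal sketch comparing lm's default NA handling with manual filtering via complete.cases. The data frame d and its columns y and x are made up for this example, and the sketch assumes options("na.action") is still at its default of na.omit.

```r
# Hypothetical data: y has an NA in row 3, x has an NA in row 4
d <- data.frame(
  y = c(1.0, 2.1, NA, 3.9, 5.2, 6.1),
  x = c(1, 2, 3, NA, 5, 6)
)

# Let lm() drop incomplete rows itself (assumes the default na.action, na.omit)
fit_default <- lm(y ~ x, data = d)

# Filter the rows manually with complete.cases() before fitting
keep <- complete.cases(d[, c("y", "x")])
fit_manual <- lm(y ~ x, data = d[keep, ])

# Both fits are based on the same complete rows
nobs(fit_default)
all.equal(coef(fit_default), coef(fit_manual))
```

Because the incomplete rows are dropped separately for each model, a missing value in one column does not remove that row from a regression on another column.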

I'm not sure what you mean by "having mydf hardcoded". You can wrap the above code in a function to make it more general: it then works for any dataframe df, with the predictors given in columns predIdx and the response given in column respIdx.

one_at_a_time_LM <- function(df, predIdx, respIdx) {
    lapply(predIdx, function(x) lm(df[, respIdx] ~ df[, x]))
}

one_at_a_time_LM(mydf, c(3, 4), 5)
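As a possible variation on the wrapper above, the sketch below refers to columns by name instead of by position, building each formula with reformulate and passing the data frame through lm's data argument. The helper fit_each and the small data frame toy are hypothetical names introduced only for this illustration; they are not part of the original answer.

```r
# Hypothetical helper: fit response ~ predictor once per predictor column,
# addressing columns by name rather than by index
fit_each <- function(df, predictors, response) {
  fits <- lapply(predictors, function(p) {
    # e.g. reformulate("Dep_1", response = "Ind_Var") builds Ind_Var ~ Dep_1
    lm(reformulate(p, response = response), data = df)
  })
  # Name each fitted model after its predictor column
  setNames(fits, predictors)
}

# Small made-up data set with the same column names as mydf
toy <- data.frame(
  Dep_1   = c(0.79, 0.15, NA, 0.85, 0.90, 0.76),
  Dep_2   = c(0.74, 0.98, 0.90, NA, 0.42, 0.24),
  Ind_Var = 75:70
)

fits <- fit_each(toy, c("Dep_1", "Dep_2"), "Ind_Var")
names(fits)
sapply(fits, nobs)
```

One reason to prefer this form is readability of the output: the coefficients are labelled with the column name (for example Dep_1) rather than with the indexing expression df[, x], and the result is a list named by predictor.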

Source: https://stackoverflow.com/questions/46822631/r-how-can-i-use-the-apply-functions-instead-of-iterating

Tags

linear-regression