R - How can I use the apply functions instead of iterating?

Submitted by 本秂侑毒 on 2019-12-12 05:10:05

Question


Regress each dependent variable ( dep_var ) against independent variable ( ind_var )

I am trying to perform linear regressions for multiple dependent variables against an independent variable, one at a time.

When an observation is missing (NA), the entire row is dropped from that particular regression.

I have done this by looping over each dependent-variable column:

fit = list()
for( i in 1 : 2 ) {
    fit[[i]] = lm( mydf$Ind_Var[ which( !is.na( mydf[  , (2+i) ] ) ) ] ~ na.omit( mydf[ , (2+i) ] ) )
    }

Without involving other packages (let's restrict ourselves to functions like lm, the apply family, and do/do.call), how can I do this?

Random Data

mydf = data.frame( 
"ID"    = rep( "A" , 25 ),
"Date"  = c( 1 : 25 ), 
"Dep_1" = c( 0.78670185, 0.15221561, NA, 0.85270392, 0.90057399, 0.75974473, 0.42026760, 0.64035871, 0.83012434, 0.04985492, 0.06619375, 0.36024745, 0.83969627, 0.45293842, 0.25272036, NA, 0.63783321, 0.42294695, 0.06726004, 0.14124547, 0.54590193, 0.99560087, 0.14255501, 0.41559977, 0.80120970) ,          
"Dep_2" = c( 0.736137983, 0.979317444, 0.901380500, 0.942325049, 0.420741297, NA, 0.243408607, 0.824064331, 0.462912557, NA, 0.710834065, 0.264922818, 0.797917063, 0.578866651, 0.955944058, 0.291149075, 0.437322581, 0.298153168, 0.579299049, 0.671718144, 0.545720702, 0.099175216, 0.808933227, 0.912825535, 0.417438973 ) ,          
"Ind_Var" = c( 75:51 )  )

My own attempt at converting it is:

apply( mydf[ ,-c(1:2) ] , 2 , function( x ) lm( mydf$Ind_Var[ which( !is.na( x ) ) ] ~ na.omit(x)  ) )

but this requires mydf to be hardcoded inside the function.

I apologize if I have used any incorrect terms.


Answer 1:


What about the following approach?

# Specify the columns that contain your predictor variables
predIdx <- c(3, 4)

# lm(y ~ x), for x being a single predictor
lapply(predIdx, function(x) lm(mydf[, ncol(mydf)] ~ mydf[, x]))

Here I assume that the response is always in the last column of the dataframe. All you need to specify manually are the column indices that contain your predictors.

If you want to exclude the NAs manually, you could use complete.cases inside the lapply function, but this shouldn't be necessary: with the factory-default setting options(na.action = "na.omit"), lm drops any row that has an NA in a variable used by the model.
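
To check the point about NA handling, here is a minimal sketch. It uses a small made-up data frame, toy (not the data from the question), and assumes the default na.action option has not been changed.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
    y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
    x = c(1, 2, NA, 4, 5, 6)
)

# Default behaviour: lm() drops rows with an NA in any model variable
fit_default <- lm(y ~ x, data = toy)

# Manual exclusion with complete.cases()
keep <- complete.cases(toy[, c("y", "x")])
fit_manual <- lm(y ~ x, data = toy[keep, ])

# Number of rows actually used by the default fit
nobs(fit_default)

# Compare the coefficients of the two fits
all.equal(coef(fit_default), coef(fit_manual))
```

Both fits should end up using the same five complete rows, so the explicit complete.cases step mainly serves as documentation of intent.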


I'm not sure what you mean by "having mydf hardcoded". You can wrap the code above in a function to make it more general: it then works for any data frame df, with the predictors given in columns predIdx and the response variable given in column respIdx.

one_at_a_time_LM <- function(df, predIdx, respIdx) {
    lapply(predIdx, function(x) lm(df[, respIdx] ~ df[, x]))
}

one_at_a_time_LM(mydf, c(3, 4), 5)
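
One possible refinement, sketched under assumptions rather than taken from the original answer: building each formula from column names with reformulate() and passing data = df keeps the variable names in the fitted models, and setNames() labels the list elements. The function name one_at_a_time_LM_named and the data frame toy are hypothetical.

```r
# Hypothetical variant: columns are given by name rather than by index
one_at_a_time_LM_named <- function(df, predNames, respName) {
    fits <- lapply(predNames, function(p) {
        # Builds a formula such as Ind_Var ~ Dep_1
        lm(reformulate(p, response = respName), data = df)
    })
    # Label each fit with the predictor it uses
    setNames(fits, predNames)
}

# Made-up data, for illustration only
toy <- data.frame(
    Dep_1   = c(0.8, NA, 0.3, 0.9, 0.5, 0.1),
    Dep_2   = c(0.7, 0.9, NA, 0.4, 0.2, 0.6),
    Ind_Var = c(75, 74, 73, 72, 71, 70)
)

fits <- one_at_a_time_LM_named(toy, c("Dep_1", "Dep_2"), "Ind_Var")

# One column per predictor, with rows for intercept and slope
sapply(fits, function(f) setNames(coef(f), c("intercept", "slope")))
```

Because the list is named, an individual model can be retrieved with fits$Dep_1 or fits[["Dep_2"]] instead of by position.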


Source: https://stackoverflow.com/questions/46822631/r-how-can-i-use-the-apply-functions-instead-of-iterating
