Automatically create formulas for all possible linear models

暗喜 2020-11-30 07:34

Say I have a training set in a data frame train with columns ColA, ColB, ColC, etc. One of these columns designates a bin

3 answers
  • 2020-11-30 08:13

    Say we work with this ridiculous example:

    DF <- data.frame(Class=1:10,A=1:10,B=1:10,C=1:10)
    

    Then get the names of the predictor columns, leaving out the response:

    Cols <- names(DF)
    Cols <- Cols[! Cols %in% "Class"]
    n <- length(Cols)
    

    Construct all possible combinations of column indices:

    id <- unlist(
            lapply(1:n,
                  function(i)combn(1:n,i,simplify=FALSE)
            )
          ,recursive=FALSE)
    

    Paste them together into formula strings:

    Formulas <- sapply(id,function(i)
                  paste("Class~",paste(Cols[i],collapse="+"))
                )
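
    As a possible alternative to pasting strings, base R's `reformulate()` can build formula objects directly from term labels. The following is only a sketch under the same toy setup; the names `subsets` and `FormulaObjs` are made up for illustration and are not part of the original answer.

    ```r
    # Same toy data as above, repeated so the sketch runs on its own.
    DF <- data.frame(Class = 1:10, A = 1:10, B = 1:10, C = 1:10)
    Cols <- setdiff(names(DF), "Class")

    # All non-empty subsets of the predictor names, as a flat list of
    # character vectors (combn() is applied to the names themselves here).
    subsets <- unlist(
      lapply(seq_along(Cols), function(k) combn(Cols, k, simplify = FALSE)),
      recursive = FALSE
    )

    # reformulate() turns term labels plus a response into a formula object,
    # so no as.formula() call is needed later.
    FormulaObjs <- lapply(subsets, reformulate, response = "Class")
    FormulaObjs[[1]]
    ```

    The resulting list can be passed to `lm()` in the same way as the pasted strings, just without the `as.formula()` step.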
    

    And you loop over them to apply the models.

    lapply(Formulas,function(i)
        lm(as.formula(i),data=DF))
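
    If you need to look up a particular fit afterwards, one option is to name the list by formula string. This is a minimal sketch, not part of the original answer: it uses the built-in `mtcars` data with `mpg` as the response purely as stand-in data, and `dat`, `preds`, `subsets` and `Models` are illustrative names.

    ```r
    # Stand-in data (an assumption for illustration): a few mtcars columns,
    # with mpg as the response. Substitute your own data frame and names.
    dat <- mtcars[, c("mpg", "wt", "hp", "qsec")]
    preds <- setdiff(names(dat), "mpg")

    # All non-empty subsets of the predictor names.
    subsets <- unlist(
      lapply(seq_along(preds), function(k) combn(preds, k, simplify = FALSE)),
      recursive = FALSE
    )
    Formulas <- sapply(subsets, function(v) paste("mpg ~", paste(v, collapse = " + ")))

    # Fit every model and key the list by its formula string.
    Models <- lapply(Formulas, function(f) lm(as.formula(f), data = dat))
    names(Models) <- Formulas

    # Retrieve one fit by name.
    coef(Models[["mpg ~ wt + hp"]])
    ```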
    

    Be warned though: if you have more than a handful of columns, this quickly becomes very memory-hungry and results in literally thousands of models. With n predictor columns you get 2^n - 1 different models.
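
    To get a feel for that growth, the count can be tabulated for a few values of n. This is just an arithmetic sketch; the values of n are arbitrary and `counts` is an illustrative name.

    ```r
    # Number of non-empty predictor subsets for n candidate columns.
    n <- c(3, 5, 10, 20)
    counts <- data.frame(n = n, models = 2^n - 1)
    counts

    # Cross-check for n = 10: summing the subsets of each size gives the same total.
    sum(choose(10, 1:10))  # 1023
    ```

    With 20 columns that is already more than a million `lm` fits.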

    Make very sure that is what you want. In general, this kind of model comparison is strongly advised against. Forget about any kind of inference as well when you do this.

  • 2020-11-30 08:17

    Here is an excellent blog post by Mark Heckman, detailing how to construct all possible regression models, given a set of explanatory variables and a response variable. However, as pointed out by Joris, I would strictly caution against using such an approach since (a) the number of regressions increases exponentially and (b) statistical experts don't recommend data fishing of this kind, as it is fraught with all kinds of risks.

  • 2020-11-30 08:34
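    vars <- c('a', 'b', 'c', 'd')
    library(gtools)  # combinations() is provided by gtools (formerly part of the gregmisc bundle)
    # Enumerate index combinations with repetition, then reduce each row to its distinct values
    indexes <- unique(apply(combinations(length(vars), length(vars), repeats.allowed = TRUE), 1, unique))
    # Build a one-sided formula from the selected variable names
    gen.form <- function(x) as.formula(paste('~', paste(vars[x], collapse = '+')))
    formulas <- lapply(indexes, gen.form)
    formulas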
    

    Generates:

    R> formulas
    [[1]] ~a
    [[2]] ~a + b
    [[3]] ~a + c
    [[4]] ~a + d
    [[5]] ~a + b + c
    [[6]] ~a + b + d
    [[7]] ~a + c + d
    [[8]] ~a + b + c + d
    [[9]] ~b
    [[10]] ~b + c
    [[11]] ~b + d
    [[12]] ~b + c + d
    [[13]] ~c
    [[14]] ~c + d
    [[15]] ~d
