How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?

借酒劲吻你 2020-12-18 01:09

I want to persist an lm object to a file and reload it into another program. I know I can do this by writing/reading a binary file via saveRDS/

2 answers
  •  挽巷 (original poster)
    2020-12-18 02:05

    Step 1:

    You need to control the deparsing options:

    dput(fit, control = c("quoteExpressions", "showAttributes"), file = "model.R") 
    

    You can read more on all possible options in ?.deparseOpts.
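
    To make Step 1 concrete, here is a minimal sketch. The toy data are my own assumption (the question does not show them), and I write to a temporary file rather than "model.R" so that nothing is left in the working directory.

    ```r
    # Toy data: an assumption for illustration, not taken from the question.
    dat_train <- data.frame(x = c(1, 2, 3, 4), z = c(1, 2.1, 2.9, 4))
    fit <- lm(z ~ x, data = dat_train)

    # Hypothetical file location; any writable path would do.
    model_file <- file.path(tempdir(), "model.R")
    dput(fit, control = c("quoteExpressions", "showAttributes"), file = model_file)

    # Language objects such as the call should now appear wrapped in quote().
    dumped <- readLines(model_file)
    any(grepl("quote(", dumped, fixed = TRUE))
    ```

    Opening the file in a text editor is a quick way to see what was and was not written out.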


    The "quoteExpressions" option wraps every call, expression and other language object in quote(), so that it is not evaluated when the file is later re-parsed and run. Note:

    • source parses the file and then evaluates the result;
    • the call field in your fitted "lm" object is a call:

      fit$call
      # lm(formula = z ~ x, data = dat_train)
      

    So, without "quoteExpressions", R will evaluate the lm call when source runs the parsed file. Evaluating it means fitting the linear model again, so R will look for dat_train, which will not exist in your new R session.
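
    The effect can be seen on the call alone, without any model. This is only a sketch: the call is typed by hand, and I assume a fresh session in which dat_train does not exist.

    ```r
    # The kind of call stored in a fitted model, written by hand for illustration.
    cl <- quote(lm(formula = z ~ x, data = dat_train))

    plain  <- deparse(cl)                                # text of the bare call
    quoted <- deparse(cl, control = "quoteExpressions")  # same text inside quote()

    # Re-parsing and evaluating the plain text re-runs lm(); this fails when
    # dat_train is missing (assumed here), so the error is captured.
    res_plain <- tryCatch(eval(parse(text = plain)), error = function(e) e)

    # The quoted text evaluates to the call itself; nothing is fitted.
    res_quoted <- eval(parse(text = quoted))
    is.call(res_quoted)
    ```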


    The "showAttributes" option is also mandatory, because an "lm" object carries a class attribute. You certainly don't want to discard it and export only a plain "list" object, right? Moreover, many elements of an "lm" object, such as model (the model frame), qr (the compact QR decomposition) and terms (the terms information), have attributes of their own. You want to keep them all.
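
    A small classed list is enough to show what "showAttributes" buys you. The object and its class name "demo" are hypothetical stand-ins for an "lm" object.

    ```r
    # A small classed list standing in for an "lm" object (hypothetical example).
    obj <- structure(list(a = 1), class = "demo")

    # Round trip through text with and without attributes.
    with_attr    <- eval(parse(text = deparse(obj, control = "showAttributes")))
    without_attr <- eval(parse(text = deparse(obj, control = NULL)))

    class(with_attr)     # the class survives the round trip
    class(without_attr)  # only the underlying list type is left
    ```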


    If you don't set control, the default setting:

    control = c("keepNA", "keepInteger", "showAttributes")
    

    will be used (recent R versions also include "niceNames" in this default). As you can see, there is no "quoteExpressions", so you will get into trouble.
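
    Since the default can differ between R versions, it may be safer to check it in your own session than to trust a list written down here. One way to do that, as a sketch:

    ```r
    # Read the default of the control argument from dput's own definition.
    default_control <- eval(formals(dput)$control)
    print(default_control)

    # Is "quoteExpressions" part of the default?
    "quoteExpressions" %in% default_control
    ```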

    You can also specify "keepInteger" and "keepNA", but I don't see the need for an "lm" object.

    ------

    Step 2:

    The above step will get source working correctly. You can recover your model:

    fit1 <- source("model.R")$value
    

    However, it is not yet ready for generic functions like summary and predict to work. Why?

    The critical issue is that the terms component of fit1 is no longer a "terms" object. It is not even a formula, only a bare "language" object without the "formula" class. Compare fit$terms with fit1$terms and you will see the difference. Don't be surprised: we set "quoteExpressions" earlier. While that is essential to prevent evaluation of call, it has this side effect on terms, so we need to reconstruct terms as best we can.

    Fortunately, it is sufficient to do:

    fit1$terms <- terms.formula(fit1$terms)
    

    Although this does not recover all the information in fit$terms (the variable classes, for example, are missing), the result is a valid "terms" object.
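
    To see the difference for yourself, the sketch below runs the whole round trip and inspects terms before and after the repair. The toy data and the temporary file name are assumptions for illustration.

    ```r
    # Toy data and file location are assumptions for illustration.
    dat_train <- data.frame(x = c(1, 2, 3, 4), z = c(1, 2.1, 2.9, 4))
    fit <- lm(z ~ x, data = dat_train)
    model_file <- file.path(tempdir(), "model.R")
    dput(fit, control = c("quoteExpressions", "showAttributes"), file = model_file)

    fit1 <- source(model_file)$value

    class(fit$terms)    # the original "terms" object
    class(fit1$terms)   # what came back from the file

    # Rebuild a valid "terms" object from the recovered formula.
    fit1$terms <- terms.formula(fit1$terms)
    inherits(fit1$terms, "terms")

    # Attributes of the original terms that the rebuild does not restore.
    setdiff(names(attributes(fit$terms)), names(attributes(fit1$terms)))
    ```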

    Why is a "terms" object critical? Because the methods behind generic functions such as summary and predict rely on it. You may not need to know more about this, as it is really technical, so I will stop here.

    Once this is done, we can successfully use predict (and summary, too):

    predict(fit1)  ## no `newdata` given, using model frame `fit1$model`
    #   1    2    3    4 
    #1.03 2.01 2.99 3.97 
    
    predict(fit1, dat_score)  ## with `newdata`
    #   1    2 
    #1.52 3.48 
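
    If you want to convince yourself that nothing was lost, compare the reloaded model with the original in the same session. This is a sketch under assumptions: the toy data below are mine, chosen so that the fitted values agree with the output shown above.

    ```r
    # Toy data (assumed); dat_score holds the new x values to predict at.
    dat_train <- data.frame(x = c(1, 2, 3, 4), z = c(1, 2.1, 2.9, 4))
    dat_score <- data.frame(x = c(1.5, 3.5))
    fit <- lm(z ~ x, data = dat_train)

    # Step 1: write the model as text. Step 2: read it back and repair terms.
    model_file <- file.path(tempdir(), "model.R")
    dput(fit, control = c("quoteExpressions", "showAttributes"), file = model_file)
    fit1 <- source(model_file)$value
    fit1$terms <- terms.formula(fit1$terms)

    # Predictions should agree up to the precision kept by dput.
    all.equal(predict(fit1), predict(fit))
    all.equal(predict(fit1, dat_score), predict(fit, dat_score))
    ```

    I use all.equal rather than identical because the text file stores numbers with limited precision.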
    

    -------

    Concluding remark:

    Although I have shown you how to get things working, I don't recommend doing this in general. An "lm" object becomes large when the model is fitted to a large dataset: residuals and fitted.values are long vectors, and qr and model are a large matrix and a large data frame. So think about whether you really need to store the whole object.
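
    To get a feeling for the sizes involved, you can measure the components of a fitted model. The simulated data below are purely illustrative and not from the question.

    ```r
    # Simulated data, purely illustrative.
    set.seed(1)
    n <- 1e4
    dat <- data.frame(x = rnorm(n))
    dat$z <- 1 + 2 * dat$x + rnorm(n)
    fit <- lm(z ~ x, data = dat)

    # Size in bytes of each component of the "lm" object, largest first.
    sizes <- sapply(fit, object.size)
    sort(sizes, decreasing = TRUE)

    # The coefficients alone are tiny by comparison.
    object.size(coef(fit))
    ```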
