Detecting multicollinearity, or columns that are linear combinations of others, while modelling in Python: LinAlgError

刺人心 2021-01-02 12:10

I am modelling data for a logit model with 34 dependent variables, and it keeps throwing the singular matrix error, as below:

Traceback (most recent call         


        
1 Answer
  • 2021-01-02 13:04

    Several points to this:

    You need tol > 0 to detect near-perfect collinearity, which can also cause numerical problems in later calculations. Check the number of columns of A2 to see whether a column has actually been dropped.
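
    As a rough sketch of what a positive tolerance does, the helper below uses a pivoted QR decomposition and keeps only the columns whose diagonal entry in R is above a relative threshold. The name independent_columns, the relative meaning of tol, and the simulated matrix are assumptions made for illustration; A2 just mirrors the name used above for the reduced matrix.

```python
import numpy as np
from scipy.linalg import qr


def independent_columns(A, tol=1e-8):
    """Return sorted indices of columns that a pivoted QR treats as independent.

    `tol` is relative to the largest diagonal entry of R (an assumption of
    this sketch, not a statsmodels or SciPy convention).
    """
    # Pivoted QR orders the columns so that |diag(R)| is non-increasing.
    _, R, pivots = qr(A, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    rank = int(np.sum(diag > tol * diag[0]))
    return np.sort(pivots[:rank])


rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 100))
# The last column is the sum of the two before it, so it is redundant.
A = np.column_stack([np.ones(100), x1, x2, x1 + x2])

keep = independent_columns(A, tol=1e-8)
A2 = A[:, keep]
print(A.shape[1], "->", A2.shape[1])  # confirm a column was actually dropped
```

    Comparing A.shape[1] with A2.shape[1] is the column-count check mentioned above. With tol=0 the redundant column would typically survive, because rounding leaves a tiny but non-zero diagonal entry.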

    Logit needs to do some non-linear calculations with the exog, so even if the design matrix is not very close to perfect collinearity, the transformed variables used in the log-likelihood, derivative or Hessian calculations can still run into numerical problems, such as a singular Hessian.
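
    To illustrate, the sketch below builds the logit Hessian (up to sign) as X' diag(p(1 - p)) X for a perfectly well-behaved design matrix and compares condition numbers. The data and the parameter values are made up; the point is only that the weights p(1 - p) can make the Hessian far worse conditioned than X'X itself.

```python
import numpy as np
from scipy.special import expit

x = np.linspace(-3.0, 3.0, 201)
X = np.column_stack([np.ones_like(x), x])  # constant plus one regressor


def logit_hessian(params, X):
    """Negative Hessian of the logit log-likelihood: X' diag(p (1 - p)) X."""
    p = expit(X @ params)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X


cond_design = np.linalg.cond(X.T @ X)
hessian_conds = {}
for slope in (1.0, 50.0, 500.0):  # arbitrary slopes, chosen for illustration
    H = logit_hessian(np.array([0.0, slope]), X)
    hessian_conds[slope] = np.linalg.cond(H)
    print(f"slope={slope:6.1f}  cond(X'X)={cond_design:.1f}  "
          f"cond(H)={hessian_conds[slope]:.3g}")
```

    The design matrix is the same in every iteration; only the parameters change, and the conditioning of the Hessian degrades as the fitted probabilities move towards 0 and 1.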

    (All of these are floating point problems that arise when we work near floating point precision, around 1e-15 or 1e-16. The default thresholds of matrix_rank and similar linalg functions sometimes differ, so in some edge cases one function identifies a matrix as singular and another one does not.)
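
    A small sketch of how the threshold changes the verdict, assuming simulated data with one column that is collinear up to noise of order 1e-10. With NumPy's default threshold the matrix counts as full rank; with an explicit, larger tol it does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
# Third column is x1 + x2 plus noise of order 1e-10: nearly, not exactly, collinear.
x3 = x1 + x2 + 1e-10 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

singular_values = np.linalg.svd(X, compute_uv=False)
rank_default = np.linalg.matrix_rank(X)           # NumPy's default threshold
rank_strict = np.linalg.matrix_rank(X, tol=1e-6)  # explicit absolute threshold

print("singular values:  ", singular_values)
print("rank (default tol):", rank_default)
print("rank (tol=1e-6):   ", rank_strict)
print("condition number:  ", np.linalg.cond(X))
```

    Looking at the singular values or the condition number directly is often more informative than a single rank number, since it shows how close to the threshold the matrix is.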

    The default optimization method for the discrete models, including Logit, is a simple Newton method, which is fast in reasonably nice cases but can fail in cases that are badly conditioned. You could try one of the other optimizers, which are those in scipy.optimize: method='nm' is usually very robust but slow, while method='bfgs' works well in many cases but can also run into convergence problems.

    Nevertheless, even when one of the other optimization methods succeeds, it is still necessary to inspect the results. More often than not, a failure with one method means that the model or estimation problem might not be well defined.

    A good way to check whether the problem is just bad starting values or a specification problem is to run method='nm' first, then run one of the more accurate methods like newton or bfgs using the nm estimate as starting values, and see whether it succeeds from good starting values.
