Variance Inflation Factor in Python

星月不相逢 2020-12-22 23:04

I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in Python:

a b c d
1 2 4 4
1 2 6 3
2 3 7 4
3 2 8 5
4 1 9 4
         


        
8 Answers
  •  北荒 (original poster)
    2020-12-22 23:34

    For anyone who finds this thread later (like me):

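    import numpy as np
    
    a = [1, 1, 2, 3, 4]
    b = [2, 2, 3, 2, 1]
    c = [4, 6, 7, 8, 9]
    d = [4, 3, 4, 5, 4]
    
    # Stack the variables as columns: one row per observation.
    ck = np.column_stack([a, b, c, d])
    # Correlation matrix of the columns (rowvar=False: columns are variables).
    # np.corrcoef is used because the top-level scipy alias is no longer available.
    cc = np.corrcoef(ck, rowvar=False)
    # Invert the correlation matrix; its diagonal holds the VIFs.
    VIF = np.linalg.inv(cc)
    VIF.diagonal()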
    

    This code gives

    array([22.95,  3.  , 12.95,  3.  ])
    

    [EDIT]

    In response to a comment, here is a version that uses a pandas DataFrame as much as possible (NumPy is still needed to invert the matrix).

    import pandas as pd
    import numpy as np
    
    a = [1, 1, 2, 3, 4]
    b = [2, 2, 3, 2, 1]
    c = [4, 6, 7, 8, 9]
    d = [4, 3, 4, 5, 4]
    
    df = pd.DataFrame({'a': a, 'b': b, 'c': c, 'd': d})
    # Compute the correlation matrix once and reuse it.
    df_cor = df.corr()
    # Invert it and keep the column labels on both axes.
    pd.DataFrame(np.linalg.inv(df_cor.values), index=df_cor.index, columns=df_cor.columns)
    

    The code gives

                a          b          c          d
    a   22.950000   6.453681 -16.301917  -6.453681
    b    6.453681   3.000000  -4.080441  -2.000000
    c  -16.301917  -4.080441  12.950000   4.080441
    d   -6.453681  -2.000000   4.080441   3.000000
    

    The diagonal elements of this inverse correlation matrix are the VIFs of the corresponding columns.
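
    As a cross-check on why the diagonal works: the VIF of column j is usually defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all the other columns (with an intercept). The sketch below computes that directly with NumPy least squares and compares it with the inverse-correlation diagonal. The helper name vif_by_regression is my own and not part of any library; treat this as an illustration on the same toy data rather than a reference implementation.

```python
import numpy as np


def vif_by_regression(X):
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = np.empty(n_cols)
    for j in range(n_cols):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        # R^2 = 1 - (residual sum of squares) / (total sum of squares)
        r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


a = [1, 1, 2, 3, 4]
b = [2, 2, 3, 2, 1]
c = [4, 6, 7, 8, 9]
d = [4, 3, 4, 5, 4]
X = np.column_stack([a, b, c, d])

vif_regression = vif_by_regression(X)
vif_inverse = np.linalg.inv(np.corrcoef(X, rowvar=False)).diagonal()

# Both routes should agree up to floating-point error.
print(vif_regression)
print(vif_inverse)
```

    The regression route is slower (one least-squares fit per column), so the matrix-inverse shortcut is the more convenient one in practice. Either way, the result becomes numerically unstable when columns are almost perfectly collinear.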
