I\'m trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in python:
a b c d
1 2 4 4
1 2 6 3
2 3 7 4
3 2 8 5
4 1 9 4
Example for Boston Data:
VIF is calculated by auxiliary regression, so not dependent on the actual fit.
See below:
from patsy import dmatrices
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
# Break into left and right hand side; y and X
y, X = dmatrices(formula="medv ~ crim + zn + nox + ptratio + black + rm ", data=boston, return_type="dataframe")
# For each Xi, calculate VIF
vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
# Fit X to y
result = sm.OLS(y, X).fit()