Say I fit a model in statsmodels
mod = smf.ols(\'dependent ~ first_category + second_category + other\', data=df).fit()
When I do mod
Based on a similar question for R, there are some other options that may help people. I was looking for a single number that captured the collinearity, and options include the determinant and condition number of the correlation matrix.
According to one of the R answers, determinant of the correlation matrix will "range from 0 (Perfect Collinearity) to 1 (No Collinearity)". I found the bounded range helpful.
Translated example for determinant:
import numpy as np
import pandas as pd
# Create a sample random dataframe
np.random.seed(321)
x1 = np.random.rand(100)
x2 = np.random.rand(100)
x3 = np.random.rand(100)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
# Now create a dataframe with multicollinearity
multicollinear_df = df.copy()
multicollinear_df['x3'] = multicollinear_df['x1'] + multicollinear_df['x2']
# Compute both correlation matrices
corr = np.corrcoef(df, rowvar=0)
multicollinear_corr = np.corrcoef(multicollinear_df, rowvar=0)
# Compare the determinants
print np.linalg.det(corr) . # 0.988532159861
print np.linalg.det(multicollinear_corr) . # 2.97779797328e-16
And similarly, the condition number of the covariance matrix will approach infinity with perfect linear dependence.
print np.linalg.cond(corr) . # 1.23116253259
print np.linalg.cond(multicollinear_corr) . # 6.19985218873e+15