patsy | 易学教程

Using ols function with parameters that contain numbers/spaces

Question: I am having a lot of difficulty using the statsmodels.formula.api function ols(formula, data).fit().rsquared_adj because of how my predictors are named. The predictor names contain numbers, spaces, and similar characters, which the formula parser clearly doesn't accept. I understand that I need to use something like patsy.builtins.Q. So let's say my predictor is weight.in.kg; it should then be entered as Q("weight.in.kg"). I need to take my formula from a list, and the difficulty arises in modifying every…
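
A minimal sketch, assuming a pandas DataFrame whose column names contain dots and spaces (the data and the names "weight.in.kg" and "height cm" are hypothetical): each name from the list is wrapped in Q("...") while the formula string is built, and the string is then passed to ols as usual. This assumes statsmodels parses the formula with patsy, where Q() is available inside formulas.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data whose column names contain dots and a space
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "weight.in.kg": rng.normal(70, 10, 100),
    "height cm": rng.normal(170, 8, 100),
})
df["y"] = 0.5 * df["weight.in.kg"] + 0.1 * df["height cm"] + rng.normal(0, 1, 100)

predictors = ["weight.in.kg", "height cm"]

# Wrap every name in Q("...") so patsy looks it up as a column name
# instead of parsing the dots or spaces as formula syntax
rhs = " + ".join(f'Q("{name}")' for name in predictors)
formula = f"y ~ {rhs}"

result = smf.ols(formula, data=df).fit()
print(formula)
print(result.rsquared_adj)
```

If a column name could itself contain a double quote, building each term as f"Q({name!r})" is a safer variant, because repr() produces a valid Python string literal.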

Clustered standard errors in statsmodels with categorical variables (Python)

Question: I want to run a regression in statsmodels that uses categorical variables and clustered standard errors. I have a dataset with the columns institution, treatment, year, and enrollment. Treatment is a dummy, institution is a string, and the others are numbers. I've made sure to drop any null values:

df.dropna()
reg_model = smf.ols("enroll ~ treatment + C(year) + C(institution)", df).fit(cov_type='cluster', cov_kwds={'groups': df['institution']})

I'm getting the following: ValueError: The weights…
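
A likely cause, sketched under assumptions: df.dropna() returns a new DataFrame rather than changing df in place, so the call in the question discards its result. The formula interface then drops the incomplete rows on its own, while df['institution'] keeps its original length, so the cluster groups no longer line up with the estimation sample. The sketch below uses synthetic data (six hypothetical institutions over ten years) and assigns the result of dropna() back before fitting.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel: 6 hypothetical institutions observed over 10 years
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "institution": np.repeat(["A", "B", "C", "D", "E", "F"], 10),
    "year": np.tile(np.arange(2010, 2020), 6),
    "treatment": rng.integers(0, 2, 60),
})
df["enroll"] = 100 + 5 * df["treatment"] + rng.normal(0, 3, 60)
df.loc[[3, 17], "enroll"] = np.nan  # two missing outcomes

# dropna() returns a new DataFrame; assign it back so that the rows used
# by the model and the cluster groups come from the same frame
df = df.dropna()

res = smf.ols("enroll ~ treatment + C(year) + C(institution)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["institution"]}
)
print(res.bse["treatment"])
```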

PatsyError: Number of rows mismatch between data argument and column (statsmodels)

Source: https://stackoverflow.com/questions/58740329/patsyerror-number-of-rows-mismatch-between-data-argument-and-column-statsmodel
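
This entry shows only the title, so the original cause is not visible here. One way to reproduce this exact message, sketched with hypothetical data: patsy looks up each formula name first in the data argument and then in the calling namespace, so a variable that is missing from the DataFrame but exists outside it as an array of a different length triggers the mismatch. Keeping every formula variable in the same DataFrame avoids it.

```python
import numpy as np
import pandas as pd
from patsy import PatsyError, dmatrices

df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0], "x": [0.5, 1.5, 2.5, 3.5]})

# Hypothetical outside array: 'z' is not a column of df, so patsy finds it
# in the calling namespace instead; it has 3 rows while df has 4
z = np.array([1.0, 2.0, 3.0])

message = ""
try:
    dmatrices("y ~ x + z", data=df, return_type="dataframe")
except PatsyError as err:
    message = str(err)
print(message)

# Fix: store every formula variable in the same DataFrame, with matching length
df["z"] = [1.0, 2.0, 3.0, 4.0]
y, X = dmatrices("y ~ x + z", data=df, return_type="dataframe")
print(X.shape)
```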

Patsy formula when variable has a hyphen

Question: I am trying to use the statsmodels linear regression functions with formulas. My sample data comes from a pandas DataFrame. I am having a slight problem with column names within the formula: because of downstream processes, my column names contain hyphens. For example:

+------+-------+-------+
| VOLT | B-NN  | B-IDW |
+------+-------+-------+

One of the reasons for keeping the hyphen is that it allows Python to split the string for other analysis, so I have to keep it. As you can see…
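
A minimal sketch, assuming the column names from the example in the question and made-up values: patsy reads B-NN as the term B minus the term NN, so each hyphenated name is wrapped in Q("...") inside the formula. The DataFrame columns keep their hyphens, so later string splitting on the names still works.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up values; only the column names come from the question
df = pd.DataFrame({
    "VOLT": [1.0, 2.1, 2.9, 4.2, 5.1, 5.8],
    "B-NN": [0.1, 0.4, 0.3, 0.8, 0.9, 1.1],
    "B-IDW": [2.0, 1.0, 3.0, 2.5, 4.0, 3.5],
})

# Q("...") makes patsy treat each hyphenated string as one column name
res = smf.ols('VOLT ~ Q("B-NN") + Q("B-IDW")', data=df).fit()
print(res.params)
```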

Patsy: New levels in categorical fields in test data

Question: I am trying to use Patsy (with sklearn and pandas) to create a simple regression model; the R-style formula creation is a major draw. My data contains a field called 'ship_city', which can hold any city in India. Since I am partitioning the data into train and test sets, several cities appear in only one of the sets. A code snippet is given below:

df_train_Y, df_train_X = dmatrices(formula, data=df_train, return_type='dataframe')
df_train_Y_design_info, df_train_X_design…
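
One possible approach, sketched with hypothetical data: patsy raises an error when build_design_matrices meets a category level that was not known when the training design was built. Declaring the full set of levels up front with C(ship_city, levels=...) keeps the train and test columns identical. The levels list puts the training cities first, so the reference (dropped) level is one that actually occurs in training. A city that appears only in the test set still gets an all-zero column in the training matrix, so a model fit on that matrix learns nothing specific about it.

```python
import pandas as pd
from patsy import build_design_matrices, dmatrices

# Hypothetical split: "Chennai" appears only in the test set
df_train = pd.DataFrame({
    "ship_city": ["Delhi", "Mumbai", "Pune", "Delhi"],
    "price": [10.0, 12.0, 9.0, 11.0],
})
df_test = pd.DataFrame({
    "ship_city": ["Mumbai", "Chennai"],
    "price": [12.5, 8.0],
})

# Training cities first, so the reference level is one seen during training
train_cities = sorted(df_train["ship_city"].unique())
unseen_cities = sorted(set(df_test["ship_city"]) - set(train_cities))
all_cities = train_cities + unseen_cities

formula = "price ~ C(ship_city, levels=all_cities)"
y_train, X_train = dmatrices(formula, data=df_train, return_type="dataframe")

# Reuse the training design_info so the test matrix gets the same columns
(X_test,) = build_design_matrices(
    [X_train.design_info], df_test, return_type="dataframe"
)
print(X_train.columns.tolist())
print(X_test)
```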