patsy

Using ols function with parameters that contain numbers/spaces

寵の児 posted on 2021-02-07 18:14:32
Question: I am having a lot of difficulty using the statsmodels.formula.api function ols(formula, data).fit().rsquared_adj because of the names of my predictors. They contain numbers, spaces, and similar characters, which the formula parser clearly doesn't like. I understand that I need to use something like patsy.builtins.Q. So let's say my predictor is weight.in.kg; it should be entered as Q("weight.in.kg"). I need to build my formula from a list, and the difficulty arises in modifying every
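
The excerpt breaks off before the asker's own code, so the following is only a minimal sketch of one way to wrap every name in a list with Q(). The helper names quote_term and build_formula, and the column names other than weight.in.kg, are hypothetical and do not come from the question.

```python
def quote_term(name):
    """Wrap a column name in patsy's Q() so it is read as a literal name."""
    # Escape backslashes and double quotes so the name stays a valid string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'Q("{escaped}")'


def build_formula(response, predictors):
    """Join the quoted response and predictors into an R-style formula string."""
    return f"{quote_term(response)} ~ " + " + ".join(quote_term(p) for p in predictors)


# Hypothetical column names; only weight.in.kg appears in the question.
predictors = ["weight.in.kg", "height in cm", "2nd visit"]
formula = build_formula("blood pressure", predictors)
print(formula)
# Q("blood pressure") ~ Q("weight.in.kg") + Q("height in cm") + Q("2nd visit")
```

The resulting string can then be passed as the formula argument in ols(formula, data).fit().rsquared_adj, assuming data contains columns with exactly these names.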

Clustered standard errors in statsmodels with categorical variables (Python)

让人想犯罪 __ posted on 2021-01-24 07:25:23
Question: I want to run a regression in statsmodels that uses categorical variables and clustered standard errors. I have a dataset with columns institution, treatment, year, and enrollment. Treatment is a dummy, institution is a string, and the others are numbers. I've made sure to drop any null values.

    df.dropna()
    reg_model = smf.ols("enroll ~ treatment + C(year) + C(institution)", df).fit(cov_type='cluster', cov_kwds={'groups': df['institution']})

I'm getting the following: ValueError: The weights
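
The error message is cut off, so its cause cannot be confirmed from the excerpt. One thing that is certain is that df.dropna() returns a new frame and leaves df unchanged unless the result is assigned. The sketch below, on synthetic data invented for illustration, keeps the cleaned frame and takes the cluster groups from that same frame so that they line up with the rows the model actually uses. The response is called enrollment here (the question's formula says enroll), and the integer group codes are a cautious choice for this sketch rather than a documented requirement.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the asker's data (all values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "institution": np.repeat(["a", "b", "c", "d", "e", "f"], 10),
    "year": np.tile(np.arange(2010, 2020), 6),
    "treatment": rng.integers(0, 2, 60),
})
df["enrollment"] = 100 + 5 * df["treatment"] + rng.normal(size=60)
df.loc[[3, 17], "enrollment"] = np.nan  # two missing responses

# Assign the result: dropna() does not modify df in place.
clean = df.dropna().copy()

# Integer codes for the clusters, taken from the same cleaned frame.
groups = clean["institution"].astype("category").cat.codes

model = smf.ols(
    "enrollment ~ treatment + C(year) + C(institution)", data=clean
).fit(cov_type="cluster", cov_kwds={"groups": groups})

print(int(model.nobs), len(groups))
```

The point of the sketch is only that the groups array and the estimation sample have the same length; whether that resolves the truncated ValueError above is an assumption.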

Patsy formula when variable has a hyphen

筅森魡賤 posted on 2020-07-23 07:24:26
Question: I am trying to use the statsmodels linear regression functions with formulas. My sample data comes from a pandas data frame. I am having a slight problem with column names within the formula. Because of downstream processes, I have hyphens within my column names. For example:

+------+-------+-------+
+ VOLT + B-NN  + B-IDW +
+------+-------+-------+

One of the reasons for keeping the hyphen is that it allows Python to split the string for other analysis, so I have to keep it. As you can see
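
The rest of the question is cut off, but the difficulty is presumably that an unquoted hyphen in a formula is parsed as a minus sign. A minimal sketch of one workaround, using patsy's Q() to quote the hyphenated names; the numeric values are invented for illustration and only the column names come from the question:

```python
import pandas as pd
from patsy import dmatrices

# Made-up values; the column names are the ones shown in the question.
df = pd.DataFrame({
    "VOLT": [1.0, 2.1, 2.9, 4.2],
    "B-NN": [0.5, 1.0, 1.5, 2.0],
    "B-IDW": [0.4, 1.1, 1.4, 2.2],
})

# Q("...") makes patsy look the name up literally instead of parsing B - NN.
y, X = dmatrices('VOLT ~ Q("B-NN") + Q("B-IDW")', data=df, return_type="dataframe")

print(X.columns.tolist())
```

The same formula string should also be accepted by the statsmodels formula API, since it relies on patsy-style formulas, and the original column names stay untouched for the downstream string splitting.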

Patsy: New levels in categorical fields in test data

对着背影说爱祢 posted on 2019-12-23 07:48:41
Question: I am trying to use Patsy (with sklearn and pandas) to create a simple regression model. The R-style formula creation is a major draw. My data contains a field called 'ship_city', which can hold any city in India. Since I am partitioning the data into train and test sets, there are several cities that appear in only one of the sets. A code snippet is given below:

    df_train_Y, df_train_X = dmatrices(formula, data=df_train, return_type='dataframe')
    df_train_Y_design_info, df_train_X_design
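
The snippet is truncated, so the asker's full approach is unknown. One possible way to avoid unseen-level errors, sketched under assumptions, is to fix the full set of categories before building any design matrix and then reuse the training design_info for the test set. The price column and the city values are hypothetical, and pooling cities from both partitions is a design choice of this sketch rather than something the question states.

```python
import pandas as pd
from patsy import dmatrices, build_design_matrices

# Hypothetical train/test partitions; Chennai appears only in the test set.
train = pd.DataFrame({
    "price": [10.0, 12.0, 9.0, 11.0],
    "ship_city": ["Delhi", "Mumbai", "Delhi", "Pune"],
})
test = pd.DataFrame({
    "price": [13.0, 8.0],
    "ship_city": ["Chennai", "Mumbai"],
})

# Declare every city up front so both partitions share one set of levels.
all_cities = sorted(set(train["ship_city"]) | set(test["ship_city"]))
for frame in (train, test):
    frame["ship_city"] = pd.Categorical(frame["ship_city"], categories=all_cities)

# Build the training matrices, then reuse their design_info for the test data.
y_train, X_train = dmatrices("price ~ ship_city", data=train, return_type="dataframe")
y_test, X_test = build_design_matrices(
    [y_train.design_info, X_train.design_info], test, return_type="dataframe"
)

print(X_train.columns.tolist() == X_test.columns.tolist())
```

A level that never occurs in the training rows yields an all-zero training column, so its coefficient cannot be estimated from the training data alone; the sketch only guarantees that the two matrices have matching columns.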