statsmodels

ANOVA in python using pandas dataframe with statsmodels or scipy?

我只是一个虾纸丫 submitted on 2019-12-03 11:35:37
Question: I want to use a pandas DataFrame to break down the variance in one variable. For example, if I have a column called 'Degrees', indexed by various dates, cities, and night vs. day, I want to find out what fraction of the variation in this series comes from cross-sectional city variation, how much comes from time-series variation, and how much comes from night vs. day. In Stata I would use fixed effects and look at the R^2. Hopefully my question makes sense.

How to perform a chi-squared goodness of fit test using scientific libraries in Python?

南楼画角 submitted on 2019-12-03 11:16:45
Question: Let's assume I have some data I obtained empirically:
    import numpy as np
    from scipy import stats
    size = 10000
    x = 10 * stats.expon.rvs(size=size) + 0.2 * np.random.uniform(size=size)
It is exponentially distributed (with some noise) and I want to verify this using a chi-squared goodness of fit (GoF) test. What is the simplest way of doing this using the standard scientific libraries in Python (e.g. scipy or statsmodels) with the fewest manual steps and assumptions? I can fit a model with: param = stats
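A minimal sketch of one common recipe, assuming scipy only: fit the exponential by maximum likelihood, map the data through the fitted CDF so that equal-width bins on [0, 1] become equal-probability bins, and pass the counts to `scipy.stats.chisquare` with `ddof` set to the number of fitted parameters. The bin count (20) and the seeded generator are arbitrary choices made here, not part of the question.

```python
import numpy as np
from scipy import stats

# Recreate the question's data (seeded so the run is repeatable).
rng = np.random.default_rng(0)
size = 10000
x = 10 * stats.expon.rvs(size=size, random_state=rng) + 0.2 * rng.uniform(size=size)

# 1. Fit the candidate distribution; loc and scale are both estimated.
loc, scale = stats.expon.fit(x)

# 2. Probability integral transform: under H0, u ~ Uniform(0, 1), so
#    equal-width bins on u are equal-probability bins for the fitted model.
u = stats.expon.cdf(x, loc=loc, scale=scale)
n_bins = 20
observed, _ = np.histogram(u, bins=n_bins, range=(0.0, 1.0))
expected = np.full(n_bins, size / n_bins)

# 3. Chi-squared test; ddof=2 removes one degree of freedom per fitted parameter.
chi2_stat, p_value = stats.chisquare(observed, f_exp=expected, ddof=2)
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.4f}")
```

Subtracting the fitted parameters via `ddof` is the usual approximation when the parameters are estimated from the same data; the result also depends on how many bins you choose.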

Print 'std err' value from statsmodels OLS results

给你一囗甜甜゛ submitted on 2019-12-03 10:01:41
(Sorry to ask, but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs.) I'm doing a linear regression using statsmodels, basically:
    import statsmodels.api as sm
    model = sm.OLS(y, x)
    results = model.fit()
I know that I can print out the full set of results with:
    print(results.summary())
which outputs something like:
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.952
    Model:                            OLS   Adj. R-squared:                  0.951
    Method:                 Least Squares   F-statistic:                     972.9
    Date:                Mon, 20 Jul 2015   Prob (F-statistic):              5

How to have multiple groups in Python statsmodels linear mixed effects model?

半腔热情 submitted on 2019-12-03 08:32:50
I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, i.e. two groups. I cannot figure out how to initialize the model so that I can do this. Here's the example. I have data that looks like the following (taken from here):
    subject gender scenario attitude frequency
    F1      F      1        pol      213.3
    F1      F      1        inf      204.5
    F1      F      2        pol      285.1
    F1      F      2        inf      259.7
    F1      F      3        pol      203.9
    F1      F      3        inf      286.9
    F1      F      4        pol      250.8
    F1      F      4        inf      276.8
I want to make a linear mixed effects model with two random effects: one for the subject group and one for the scenario group. I am
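As I understand the statsmodels MixedLM documentation, crossed random intercepts are expressed by treating the whole dataset as a single group and declaring each factor as a variance component through `vc_formula`. The sketch below follows that pattern on synthetic data shaped like the question's table; the number of subjects and scenarios and all effect sizes are invented, and the constant `group` column is a helper introduced here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data shaped like the question's (subject x scenario x attitude).
rng = np.random.default_rng(2)
subjects = [f"S{i}" for i in range(12)]
scenarios = list(range(1, 8))
subj_eff = dict(zip(subjects, rng.normal(scale=25.0, size=len(subjects))))
scen_eff = dict(zip(scenarios, rng.normal(scale=15.0, size=len(scenarios))))

rows = []
for s in subjects:
    for sc in scenarios:
        for att in ("pol", "inf"):
            freq = (200.0 + subj_eff[s] + scen_eff[sc]
                    + (-20.0 if att == "pol" else 0.0)
                    + rng.normal(scale=10.0))
            rows.append({"subject": s, "scenario": sc, "attitude": att, "frequency": freq})
df = pd.DataFrame(rows)

# Crossed random intercepts: every row belongs to ONE group, and each
# factor becomes its own variance component.
df["group"] = 1
vc = {"subject": "0 + C(subject)", "scenario": "0 + C(scenario)"}
model = smf.mixedlm("frequency ~ attitude", data=df, groups="group", vc_formula=vc)
result = model.fit()
print(result.summary())
```

The summary should list a separate variance estimate for `subject` and for `scenario`; fixed effects such as `gender` can be added to the formula in the usual way.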

Calculate logistic regression in python

元气小坏坏 submitted on 2019-12-03 08:32:45
I tried to calculate a logistic regression. I have the data as a CSV file. It looks like:
    node_id,second_major,gender,major_index,year,dorm,high_school,student_fac
    0,0,2,257,2007,111,2849,1
    1,0,2,271,2005,0,51195,2
    2,0,2,269,2007,0,21462,1
    3,269,1,245,2008,111,2597,1
    ..........................
This is my code:
    import pandas as pd
    import statsmodels.api as sm
    import pylab as pl
    import numpy as np
    df = pd.read_csv("Reed98.csv")
    print(df.describe())
    dummy_ranks = pd.get_dummies(df['second_major'], prefix='second_major')
    cols_to_keep = ['second_major', 'dorm', 'high_school']
    data = df[cols_to_keep]

Plotting Historical Cointegration Values between two pairs

久未见 submitted on 2019-12-03 08:27:31
Here is a sample ADF test in Python to check for cointegration between two series. However, the final result gives only a single numeric value for cointegration. How can I get the historical results of cointegration? Taken from http://www.leinenbock.com/adf-test-in-python/:
    import numpy as np
    import statsmodels.api as stat
    import statsmodels.tsa.stattools as ts

    x = np.random.normal(0, 1, 1000)
    y = np.random.normal(0, 1, 1000)

    def cointegration_test(y, x):
        result = stat.OLS(y, x).fit()
        return ts.adfuller(result.resid)

I assume you want to test for expanding cointegration? Note that you should use sm.tsa

What to use to do multiple correlation?

血红的双手。 submitted on 2019-12-03 07:17:16
I am trying to use Python to compute multiple linear regression and multiple correlation between a response array and a set of predictor arrays. I saw the very simple example for computing multiple linear regression, which is easy. But how do I compute multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible. Edit [clarification]: Considering a situation like the one described here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ I would like to compute also

Holt-Winters time series forecasting with statsmodels

帅比萌擦擦* submitted on 2019-12-03 07:03:05
I tried forecasting with a Holt-Winters model as shown below, but I keep getting a prediction that is not consistent with what I expect. I also included a plot of the result.
    Train = Airline[:130]
    Test = Airline[129:]
    from statsmodels.tsa.holtwinters import Holt
    y_hat_avg = Test.copy()
    fit1 = Holt(np.asarray(Train['Passengers'])).fit()
    y_hat_avg['Holt_Winter'] = fit1.predict(start=1, end=15)
    plt.figure(figsize=(16, 8))
    plt.plot(Train.index, Train['Passengers'], label='Train')
    plt.plot(Test.index, Test['Passengers'], label='Test')
    plt.plot(y_hat_avg.index, y_hat_avg['Holt_Winter'], label='Holt

Correct way to use ARMAResult.predict() function

狂风中的少年 submitted on 2019-12-03 06:54:43
Following on from this question, How to get constant term in AR Model with statsmodels and Python?, I'm now trying to use the ARMA model to fit the data, but again I couldn't find a way to interpret the model's result. Here is what I have done, following ARMA out-of-sample prediction with statsmodels and the ARMAResults.predict API documentation:
    # Parameters
    INPUT_DATA_POINT = 200
    P = 5
    Q = 0

    # Read data
    data = []
    f = open('stock_all.csv', 'r')
    for line in f:
        data.append(float(line.split(',')[5]))
    f.close()

    # Fit ARMA model using the first piece of data
    result = arma_model(data[:INPUT_DATA_POINT], P, Q)
    #

How to get the regression intercept using Statsmodels.api

a 夏天 submitted on 2019-12-03 06:13:13
I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library:
    import statsmodels.api as sm
It prints all the regression analysis except the intercept. But when I use:
    from pandas.stats.api import ols
My code for pandas:
    Regression = ols(y=Sorted_Data3['net_realization_rate'], x=Sorted_Data3[['Cohort_2', 'Cohort_3']])
    print(Regression)
I get the intercept, with a warning that this library will be deprecated in the future, so I am trying to use statsmodels. The warning that I get while using pandas.stats.api: Warning (from