statsmodels

Highest Posterior Density Region and Central Credible Region

安稳与你, submitted on 2019-12-03 04:40:53
Question: Given a posterior $p(\Theta|D)$ over some parameters $\Theta$, one can define the following. Highest Posterior Density Region: the Highest Posterior Density Region is the set of most probable values of $\Theta$ that, in total, constitute $100(1-\alpha)\%$ of the posterior mass. In other words, for a given $\alpha$, we look for a $p^*$ that satisfies $\int_{\Theta:\, p(\Theta|D) > p^*} p(\Theta|D)\, d\Theta = 1-\alpha$ and then obtain the Highest Posterior Density Region as the set $\{\Theta : p(\Theta|D) > p^*\}$. Central Credible Region: using the same notation as above, a Credible Region (or interval) is defined as any region $C$ satisfying $\int_{C} p(\Theta|D)\, d\Theta = 1-\alpha$. Depending
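The $p^*$ definition can be made concrete with a minimal NumPy sketch (not taken from the question or its answers). It evaluates a posterior on an evenly spaced 1-D grid, lowers the density threshold until the enclosed mass reaches $1-\alpha$, and returns the resulting region. The bimodal mixture posterior and the helper names `hpd_threshold` and `normal_pdf` are illustrative assumptions. The grid version also uses $\ge p^*$ rather than $> p^*$ so that the boundary grid point is kept.

```python
import numpy as np

def hpd_threshold(grid, density, alpha=0.05):
    """Return p* and a boolean mask of the HPD region on an evenly spaced 1-D grid.

    `density` holds (possibly unnormalised) posterior values at the points of `grid`.
    """
    probs = density / density.sum()          # grid-cell masses summing to 1 (even spacing cancels)
    order = np.argsort(density)[::-1]        # most probable grid points first
    cum_mass = np.cumsum(probs[order])
    # number of top points needed before the accumulated mass reaches 1 - alpha
    n_keep = int(np.searchsorted(cum_mass, 1 - alpha)) + 1
    p_star = density[order[n_keep - 1]]
    in_region = density >= p_star            # >= so the boundary point is included
    return p_star, in_region

def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

grid = np.linspace(-8, 8, 16001)
# Hypothetical bimodal posterior: equal mixture of N(-3, 1) and N(3, 1)
density = 0.5 * normal_pdf(grid, -3) + 0.5 * normal_pdf(grid, 3)
p_star, in_region = hpd_threshold(grid, density, alpha=0.05)
mass = (density / density.sum())[in_region].sum()
# each False -> True transition starts a new interval of the region
n_pieces = int(np.count_nonzero(np.diff(in_region.astype(int)) == 1)) + int(in_region[0])
print(p_star, mass, n_pieces)
```

With two well-separated modes the HPD region comes out as two disjoint intervals, which a single central interval cannot represent.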

Python 2.7 - statsmodels - formatting and writing summary output

混江龙づ霸主, submitted on 2019-12-03 02:54:10
I'm doing logistic regression on Mac OS X Lion, using pandas 0.11.0 for data handling and statsmodels 0.4.3 for the actual regression. I'm going to be running ~2,900 different logistic regression models and need the results written to a CSV file, formatted in a particular way. Currently, the only approach I know is print result.summary(), which prints the results to the shell as follows:

                           Logit Regression Results
==============================================================================
Dep. Variable:            death_death   No. Observations:                 9752
Model:                          Logit   Df Residuals:                     9747
Method:                           MLE   Df Model:                            4

How to iterate over columns of pandas dataframe to run regression

时光总嘲笑我的痴心妄想, submitted on 2019-12-03 01:33:06
Question: I'm sure this is simple, but as a complete newbie to Python, I'm having trouble figuring out how to iterate over variables in a pandas DataFrame and run a regression with each. Here's what I'm doing:

all_data = {}
for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']:
    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015')
prices = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})
returns = prices.pct_change()

I know I can run a regression like this: regs =

ANOVA in python using pandas dataframe with statsmodels or scipy?

↘锁芯ラ, submitted on 2019-12-03 01:09:17
I want to use a pandas DataFrame to break down the variance in one variable. For example, say I have a column called 'Degrees', indexed by various dates, cities, and night vs. day. I want to find out what fraction of the variation in this series comes from cross-sectional city variation, how much comes from time-series variation, and how much comes from night vs. day. In Stata I would use fixed effects and look at the R^2. Hopefully my question makes sense. Basically, what I want is the ANOVA breakdown of "Degrees" by the three other columns.

Answer by cphlewis: I set

Highest Posterior Density Region and Central Credible Region

ε祈祈猫儿з, submitted on 2019-12-02 18:05:42
Given a posterior $p(\Theta|D)$ over some parameters $\Theta$, one can define the following. Highest Posterior Density Region: the Highest Posterior Density Region is the set of most probable values of $\Theta$ that, in total, constitute $100(1-\alpha)\%$ of the posterior mass. In other words, for a given $\alpha$, we look for a $p^*$ that satisfies $\int_{\Theta:\, p(\Theta|D) > p^*} p(\Theta|D)\, d\Theta = 1-\alpha$ and then obtain the Highest Posterior Density Region as the set $\{\Theta : p(\Theta|D) > p^*\}$. Central Credible Region: using the same notation as above, a Credible Region (or interval) is defined as any region $C$ satisfying $\int_{C} p(\Theta|D)\, d\Theta = 1-\alpha$. Depending on the distribution, there could be many such intervals. The central credible interval is defined as a
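To compare the central interval with the HPD interval, a common sample-based sketch works from posterior draws (for example, MCMC output). It takes equal-tailed quantiles for the central interval and the shortest window of sorted draws for the HPD interval. The function names are hypothetical. The shortest-window estimate is only an HPD estimate for unimodal posteriors.

```python
import numpy as np

def central_interval(samples, alpha=0.05):
    """Equal-tailed interval: alpha/2 of the posterior draws in each tail."""
    lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def hpd_interval(samples, alpha=0.05):
    """Shortest interval containing a (1 - alpha) fraction of the draws.

    Valid as an HPD estimate only for unimodal posteriors; a multimodal
    posterior can have a disjoint HPD region.
    """
    x = np.sort(samples)
    n = len(x)
    k = int(np.ceil((1 - alpha) * n))        # draws each candidate window must contain
    widths = x[k - 1:] - x[: n - k + 1]      # width of every window of k consecutive draws
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

# Hypothetical right-skewed posterior: Gamma(shape=2, scale=1) draws
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=100_000)
c_lo, c_hi = central_interval(samples)
h_lo, h_hi = hpd_interval(samples)
print("central:", c_lo, c_hi)
print("HPD:    ", h_lo, h_hi)
```

On a right-skewed posterior like this one, the HPD interval is shorter and shifted toward the mode. For a symmetric unimodal posterior the two intervals coincide.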

What's the difference between pandas ACF and statsmodels ACF?

牧云@^-^@, submitted on 2019-12-02 17:34:25
I'm calculating the autocorrelation function for a stock's returns. To do so I tested two functions: the autocorr function built into pandas, and the acf function supplied by statsmodels.tsa. This is done in the following MWE:

import pandas as pd
from pandas_datareader import data
import matplotlib.pyplot as plt
import datetime
from dateutil.relativedelta import relativedelta
from statsmodels.tsa.stattools import acf, pacf

ticker = 'AAPL'
time_ago = datetime.datetime.today().date() - relativedelta(months=6)
ticker_data = data.get_data_yahoo(ticker, time_ago)['Adj Close'].pct_change().dropna

Why do I get only one parameter from a statsmodels OLS fit?

▼魔方 西西, submitted on 2019-12-02 15:50:58
Here is what I am doing:

$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels.api as sm
>>> statsmodels.__version__
'0.5.0'
>>> import numpy
>>> y = numpy.array([1,2,3,4,5,6,7,8,9])
>>> X = numpy.array([1,1,2,2,3,3,4,4,5])
>>> res_ols = sm.OLS(y, X).fit()
>>> res_ols.params
array([ 1.82352941])

I had expected an array with two elements: the intercept and the slope coefficient.

Answer by behzad.nouri: Try this:

X = sm.add_constant(X)
sm.OLS(y,X)

as in the documentation: An intercept is not included by default and

How to iterate over columns of pandas dataframe to run regression

橙三吉。, submitted on 2019-12-02 13:51:43
I'm sure this is simple, but as a complete newbie to Python, I'm having trouble figuring out how to iterate over variables in a pandas DataFrame and run a regression with each. Here's what I'm doing:

all_data = {}
for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']:
    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015')
prices = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})
returns = prices.pct_change()

I know I can run a regression like this:

regs = sm.OLS(returns.FIUIX, returns.FSTMX).fit()

but suppose I want to do this for each column in the

How to get test-set predictions from the 2D parameters of a WLS regression in statsmodels

坚强是说给别人听的谎言, submitted on 2019-12-02 05:51:49
I'm incrementally updating the parameters of WLS regression functions using statsmodels. I have a 10x3 dataset X that I declared like this:

X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])

This is my dataset, and I have a 10x2 endog array that looks like this: z = [[ 3.90311860e-322 2.00000000e+000] [ 0.00000000e+000 2.00000000e+000] [ 0.00000000e+000 -2.00000000e+000] [ 0.00000000e+000 2.00000000e+000] [ 0.00000000e+000 -2.00000000e+000] [ 0.00000000e+000 2.00000000e+000] [ 0.00000000e+000 2.00000000e+000] [ 0.00000000e+000 -2.00000000e+000] [ 0

ImportError: DLL load failed: when importing statsmodels [duplicate]

这一生的挚爱, submitted on 2019-12-02 01:21:45
Question: This question already has answers here: Installing scipy in Python 3.5 on 32-bit Windows 7 Machine (4 answers). Closed 4 years ago. My Python version is 3.5 on win32. I successfully installed Numpy+MKL, SciPy and statsmodels from http://www.lfd.uci.edu/~gohlke/pythonlibs/. However, when I run import statsmodels as sm I get the following error:

Traceback (most recent call last):
  File "D:\Python\Innovation\try\Try_Reg.py", line 6, in <module>
    import statsmodels as sm
  File "C:\Python35\lib
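The traceback is truncated, so the cause can't be read off here. As a first diagnostic (a hypothetical helper, not from the linked answers), the script below imports numpy, scipy, and statsmodels one at a time and reports whether the interpreter is a 32-bit or 64-bit build. A failure already at numpy or scipy, or a mix of 32-bit and 64-bit packages, would point away from statsmodels itself. That mismatch is an assumption to check, not something the question confirms.

```python
import importlib
import struct
import sys

def check_imports(names):
    """Import each module in turn; record its version string or the ImportError text."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "imported (no __version__)")
        except ImportError as exc:  # "DLL load failed" on Windows is raised as an ImportError
            report[name] = "FAILED: %s" % exc
    return report

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one
python_bits = struct.calcsize("P") * 8

if __name__ == "__main__":
    print(sys.version, "(%d-bit)" % python_bits)
    # statsmodels depends on scipy, which depends on numpy, so check them in that order
    for name, status in check_imports(["numpy", "scipy", "statsmodels"]).items():
        print(name, status)
```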