statsmodels

Getting the regression line to plot from a Pandas regression

Submitted by 时间秒杀一切 on 2019-11-29 11:22:51
I have tried both pandas' pd.ols and statsmodels' sm.ols to get a scatter plot with a regression line. I can get the scatter plot, but I can't seem to extract the parameters needed to plot the regression line. It is probably obvious that I am doing some cut-and-paste coding here :-( (using this as a guide: http://nbviewer.ipython.org/github/weecology/progbio/blob/master/ipynbs/statistics.ipynb ). My data is in a pandas DataFrame; the x column is merged2[:-1].lastqu and the y column is merged2[:-1].Units. My code to get the regression is now as follows: def fit_line2
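For reference, a minimal sketch of how the fit and the line plot could go with statsmodels. The DataFrame merged2 and the columns lastqu and Units come from the question; everything else (intercept handling, plot styling) is an assumption:

import statsmodels.api as sm
import matplotlib.pyplot as plt

x = merged2[:-1].lastqu
y = merged2[:-1].Units

X = sm.add_constant(x)              # add an intercept column
res = sm.OLS(y, X).fit()            # fit y = b0 + b1*x

plt.scatter(x, y)                   # the scatter plot
plt.plot(x, res.predict(X), 'r-')   # the fitted regression line
plt.show()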

Using describe() with weighted data — mean, standard deviation, median, quantiles

Submitted by 柔情痞子 on 2019-11-29 10:26:40
I'm fairly new to Python and pandas (coming from SAS as my workhorse analytical platform), so I apologize in advance if this has already been asked/answered. (I've searched through the documentation as well as this site for an answer and haven't been able to find anything yet.) I've got a dataframe (called resp) containing respondent-level survey data. I want to perform some basic descriptive statistics on one of the fields (called anninc, short for annual income):

resp["anninc"].describe()

Which gives me the basic stats:

count    76310.000000
mean     43455.874862
std      33154.848314
min      0
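For weighted versions of these statistics, one option is statsmodels' DescrStatsW. A sketch, assuming a hypothetical weight column resp["weight"] (no weight field is named in the question):

from statsmodels.stats.weightstats import DescrStatsW

d = DescrStatsW(resp["anninc"], weights=resp["weight"])  # "weight" is a hypothetical column name
print(d.mean)                          # weighted mean
print(d.std)                           # weighted standard deviation
print(d.quantile([0.25, 0.5, 0.75]))   # weighted quartiles; 0.5 is the weighted median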

Why would R-Squared decrease when I add an exogenous variable in OLS using python statsmodels

Submitted by 扶醉桌前 on 2019-11-29 07:32:01
If I understand the OLS model correctly, this should never be the case?

trades['const'] = 1
Y = trades['ret'] + trades['comms']
# X = trades[['potential', 'pVal', 'startVal', 'const']]
X = trades[['potential', 'pVal', 'startVal']]
from statsmodels.regression.linear_model import OLS
ols = OLS(Y, X)
res = ols.fit()
res.summary()

If I turn the const on, I get an R-squared of 0.22, and with it off I get 0.43. How is that even possible?

Josef: See the answer here: Statsmodels: Calculate fitted values and R squared. R-squared follows a different definition depending on whether there is a constant in the model or
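To see the two definitions side by side: without a constant, statsmodels reports an uncentered R-squared, measured against zero rather than against the mean of y, so the two values are not comparable. A sketch with made-up data (the numbers are illustrative only):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = 5 + x @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=100)

print(sm.OLS(y, x).fit().rsquared)                   # uncentered R-squared (no constant)
print(sm.OLS(y, sm.add_constant(x)).fit().rsquared)  # the usual centered R-squared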

Plotting confidence and prediction intervals with repeated entries

Submitted by 坚强是说给别人听的谎言 on 2019-11-29 02:34:10
I have a correlation plot for two variables: the predictor variable (temperature) on the x-axis, and the response variable (density) on the y-axis. My best-fit least-squares regression line is a 2nd-order polynomial. I would like to also plot confidence and prediction intervals. The method described in this answer seems perfect. However, my dataset (n=2340) has repeated entries for many (x, y) pairs. My resulting plot looks like this: Here is my relevant code (slightly modified from the linked answer above):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels
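One way to keep the interval curves clean with repeated (x, y) entries is to evaluate them on a sorted grid rather than on the raw data. A sketch using get_prediction, where x and y stand for the temperature and density arrays from the question (the grid size and alpha are assumptions):

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

X = sm.add_constant(np.column_stack([x, x**2]))   # 2nd-order polynomial design
res = sm.OLS(y, X).fit()

xg = np.linspace(x.min(), x.max(), 200)           # sorted grid: curves can't double back
Xg = sm.add_constant(np.column_stack([xg, xg**2]))
pred = res.get_prediction(Xg).summary_frame(alpha=0.05)

plt.scatter(x, y, s=10)
plt.plot(xg, pred["mean"], 'r-')                  # fitted curve
plt.plot(xg, pred["mean_ci_lower"], 'b--')        # confidence interval
plt.plot(xg, pred["mean_ci_upper"], 'b--')
plt.plot(xg, pred["obs_ci_lower"], 'g--')         # prediction interval
plt.plot(xg, pred["obs_ci_upper"], 'g--')
plt.show()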

python stats models - quadratic term in regression

Submitted by 给你一囗甜甜゛ on 2019-11-29 02:32:54
Question: I have the following linear regression:

import statsmodels.formula.api as sm
model = sm.ols(formula='a ~ b + c', data=data).fit()

I want to add a quadratic term for b to this model. Is there a simple way to do this with statsmodels.ols? Is there a better package I should be using to achieve this?

Answer 1: Although the solution by Alexander works, in some situations it is not very convenient. For example, each time you want to predict the outcome of the model for new values, you need to
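One way that keeps the quadratic term inside the formula itself is patsy's I() operator (a sketch using the names from the question):

import statsmodels.formula.api as smf

# I(b**2) tells patsy to evaluate b**2 as a plain Python expression,
# so the squared term is rebuilt automatically at both fit and predict time.
model = smf.ols(formula='a ~ b + I(b**2) + c', data=data).fit()
print(model.summary())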

Statsmodels: Calculate fitted values and R squared

Submitted by 雨燕双飞 on 2019-11-29 02:17:21
I am running a regression as follows (df is a pandas dataframe):

import statsmodels.api as sm
est = sm.OLS(df['p'], df[['e', 'varA', 'meanM', 'varM', 'covAM']]).fit()
est.summary()

Which gave me, among other things, an R-squared of 0.942. So then I wanted to plot the original y-values and the fitted values. For this, I sorted the original values:

orig = df['p'].values
fitted = est.fittedvalues.values
args = np.argsort(orig)

import matplotlib.pyplot as plt
plt.plot(orig[args], 'bo')
plt.plot(orig[args] - resid[args], 'ro')
plt.show()

This, however, gave me a graph where the values were completely off
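Two things worth noting in a sketch (the column names come from the question): fittedvalues can be plotted directly instead of being reconstructed as orig minus the residuals, and since this model has no constant, the 0.942 is the uncentered R-squared, which is not comparable to the usual centered one.

import numpy as np
import matplotlib.pyplot as plt

orig = df['p'].values
fitted = est.fittedvalues.values   # identical to orig - est.resid.values
args = np.argsort(orig)

plt.plot(orig[args], 'bo', label='observed')
plt.plot(fitted[args], 'ro', label='fitted')
plt.legend()
plt.show()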

Pandas rolling regression: alternatives to looping

Submitted by 一世执手 on 2019-11-28 22:40:24
I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run a rolling OLS regression in an efficient manner has been asked several times (here, for instance), but phrased a little broadly and left without a great answer, in my view. Here are my questions: How can I best mimic the basic framework of pandas' MovingOLS? The most attractive feature of this class was the ability to view multiple methods/attributes as separate time series, i.e. coefficients, r-squared, t
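Since this was asked, statsmodels (0.11+) has added a RollingOLS class that covers much of what MovingOLS exposed. A minimal sketch with made-up data; the window length is an arbitrary choice:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=500))
X = sm.add_constant(pd.DataFrame({'x': rng.normal(size=500)}))

res = RollingOLS(y, X, window=60).fit()
params = res.params        # one row of coefficients per window end (NaN before the first full window)
r2 = res.rsquared          # r-squared as a time series
tstats = res.tvalues       # t-statistics per window
print(params.dropna().head())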

Variance Inflation Factor in Python

Submitted by ℡╲_俬逩灬. on 2019-11-28 21:58:08
Question: I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in Python:

a  b  c  d
1  2  4  4
1  2  6  3
2  3  7  4
3  2  8  5
4  1  9  4

I have already done this in R using the vif function from the usdm library, which gives the following results:

a <- c(1, 1, 2, 3, 4)
b <- c(2, 2, 3, 2, 1)
c <- c(4, 6, 7, 8, 9)
d <- c(4, 3, 4, 5, 4)
df <- data.frame(a, b, c, d)
vif_df <- vif(df)
print(vif_df)

Variables    VIF
a          22.95
b           3.00
c          12.95
d           3.00

However, when I do the same in Python
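For comparison, a sketch of the statsmodels route. Note that variance_inflation_factor expects the design matrix to contain an intercept column (which R's lm() adds implicitly), and leaving it out is the usual reason the Python numbers come out different from usdm::vif:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.DataFrame({'a': [1, 1, 2, 3, 4],
                   'b': [2, 2, 3, 2, 1],
                   'c': [4, 6, 7, 8, 9],
                   'd': [4, 3, 4, 5, 4]})

X = sm.add_constant(df)                      # intercept first, then a, b, c, d
vifs = pd.Series([variance_inflation_factor(X.values, i)
                  for i in range(1, X.shape[1])], index=df.columns)
print(vifs)   # should line up with the usdm values above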

Where can I find mad (mean absolute deviation) in scipy?

Submitted by ε祈祈猫儿з on 2019-11-28 17:05:38
Question: It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers: http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473 However, I cannot find it anywhere in current versions of scipy. Of course it is possible to just copy the old code from the repository, but I would prefer to use scipy's version. Where can I find it, or has it been replaced or removed?

Answer 1: The current version of statsmodels has mad in statsmodels.robust: >>>
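Continuing that answer with a sketch. One caveat: statsmodels.robust.mad is the median absolute deviation (scaled by a normal consistency constant by default), not the mean absolute deviation, which is a numpy one-liner:

import numpy as np
from statsmodels.robust import mad

a = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

print(mad(a))        # median absolute deviation, scaled by ~1.4826
print(mad(a, c=1))   # raw (unscaled) median absolute deviation
print(np.mean(np.abs(a - a.mean())))   # plain mean absolute deviation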

What statistics module for python supports one way ANOVA with post hoc tests (Tukey, Scheffe or other)?

Submitted by 一世执手 on 2019-11-28 16:25:46
I have tried looking through multiple statistics modules for Python but can't seem to find any that support one-way ANOVA post hoc tests. One-way ANOVA can be used like:

from scipy import stats
f_value, p_value = stats.f_oneway(data1, data2, data3, data4, ...)

This is a one-way ANOVA; it returns the F value and the p-value. There is a significant difference if the p-value is below your chosen threshold. The Tukey-Kramer HSD test can be used like:

from statsmodels.stats.multicomp import pairwise_tukeyhsd
print(pairwise_tukeyhsd(Data, Group))

This is a multiple comparison. The output is like:

Multiple Comparison of Means -
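Putting the two together, a sketch with made-up groups (the group names and effect sizes are illustrative only):

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
g1 = rng.normal(0.0, 1.0, 30)
g2 = rng.normal(0.5, 1.0, 30)
g3 = rng.normal(1.0, 1.0, 30)

f_value, p_value = stats.f_oneway(g1, g2, g3)    # omnibus one-way ANOVA
print(f_value, p_value)

data = np.concatenate([g1, g2, g3])
groups = np.repeat(['g1', 'g2', 'g3'], 30)
print(pairwise_tukeyhsd(data, groups))           # Tukey HSD post hoc comparisons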