statsmodels

Print OLS regression summary to text file

╄→гoц情女王★ submitted on 2021-02-18 05:56:41
Question: I am running an OLS regression per group with pandas.stats.api.ols and a groupby, using the following code:

    from pandas.stats.api import ols
    df = pd.read_csv(r'F:\file.csv')
    result = df.groupby(['FID']).apply(lambda d: ols(y=d.loc[:, 'MEAN'], x=d.loc[:, ['Accum_Prcp', 'Accum_HDD']]))
    for i in result:
        x = pd.DataFrame({'FID': i.index, 'delete': i.values})
        frame = pd.concat([x, DataFrame(x['delete'].tolist())], axis=1, join='outer')
        del frame['delete']
        print frame

but this returns the error: AttributeError: 'OLS'
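
pandas.stats.api.ols has long since been removed from pandas; a minimal sketch of the same per-group regression with statsmodels, writing each group's summary to a text file (the output file name is an assumption, the column names are taken from the question):

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv(r'F:\file.csv')

    with open('ols_summaries.txt', 'w') as fh:        # hypothetical output file
        for fid, d in df.groupby('FID'):
            X = sm.add_constant(d[['Accum_Prcp', 'Accum_HDD']])
            res = sm.OLS(d['MEAN'], X).fit()
            fh.write('FID = {}\n'.format(fid))
            fh.write(res.summary().as_text())         # plain-text form of the summary table
            fh.write('\n\n')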

Compute a confidence interval from sample data assuming unknown distribution

让人想犯罪 __ submitted on 2021-02-12 11:32:07
Question: I have sample data for which I would like to compute a confidence interval, assuming the distribution is not normal and is unknown. Basically, it looks like the distribution is Pareto, but I don't know for sure. The answers for the normal distribution: "Compute a confidence interval from sample data" and "Correct way to obtain confidence interval with scipy". Answer 1: If you don't know the underlying distribution, then my first thought would be to use bootstrapping: https://en.wikipedia.org/wiki/Bootstrapping_
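
A minimal percentile-bootstrap sketch of a 95% confidence interval for the mean, making no distributional assumption (the sample here is simulated as a stand-in for the real data):

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.pareto(3.0, size=500)      # stand-in for the real sample

    # Resample with replacement many times and record the statistic of interest.
    boot_means = np.array([rng.choice(sample, size=sample.size, replace=True).mean()
                           for _ in range(10_000)])

    # Percentile bootstrap interval.
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print('95% CI for the mean: ({:.3f}, {:.3f})'.format(lo, hi))

Recent SciPy versions also provide scipy.stats.bootstrap, which implements the same idea directly.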

Statsmodels with partly identified model

两盒软妹~` submitted on 2021-02-11 13:19:42
Question: I am trying to run a regression where only some of the coefficients can be identified:

    data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
    df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
    z = df.pop('y')
    mod = sm.OLS(z, sm.add_constant(df))

Now I have two observations, and the only variable that changes between them is x3. So I would expect that (since I added a constant) the model would be unable to identify x1 or x2 and would omit those. It should however give me a 1
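
sm.OLS solves the least-squares problem with a Moore-Penrose pseudoinverse by default, so a rank-deficient design does not raise an error and the collinear columns are not dropped; the affected coefficients are simply not separately identified. A minimal sketch, reusing the question's data, of checking the rank deficiency yourself:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    data = np.array([[2, 1, 1, 1], [1, 1, 1, 0]])
    df = pd.DataFrame(data, columns=['y', 'x1', 'x2', 'x3'])
    z = df.pop('y')
    X = sm.add_constant(df)

    res = sm.OLS(z, X).fit()                      # fits via pinv despite the collinearity
    print(res.params)
    print(np.linalg.matrix_rank(X), X.shape[1])   # rank < number of columns: only partly identified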

statsmodels: printing the summary of more than one regression model together

怎甘沉沦 submitted on 2021-02-07 04:16:24
Question: In the Python library statsmodels you can print the regression results with print(results.summary()). How can I print the summaries of more than one regression in a single table, for better comparison? A linear regression, code taken from the statsmodels documentation:

    nsample = 100
    x = np.linspace(0, 10, 100)
    X = np.column_stack((x, x**2))
    beta = np.array([0.1, 10])
    e = np.random.normal(size=nsample)
    y = np.dot(X, beta) + e
    model = sm.OLS(y, X)
    results_noconstant = model.fit()

Then I add a
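
statsmodels ships a helper for exactly this: summary_col in statsmodels.iolib.summary2 stacks several fitted results side by side in one table. A minimal sketch continuing the question's example (the second, with-constant model is an assumption based on the truncated question text):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.iolib.summary2 import summary_col

    nsample = 100
    x = np.linspace(0, 10, nsample)
    X = np.column_stack((x, x**2))
    y = np.dot(X, np.array([0.1, 10])) + np.random.normal(size=nsample)

    results_noconstant = sm.OLS(y, X).fit()
    results_constant = sm.OLS(y, sm.add_constant(X)).fit()

    # One table, one column per model.
    print(summary_col([results_noconstant, results_constant],
                      model_names=['no constant', 'with constant'],
                      stars=True))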

ARMA.predict for out-of-sample forecast does not work with floating points?

让人想犯罪 __ submitted on 2021-02-01 04:59:20
Question: After developing my little ARMAX forecasting model for in-sample analysis, I'd like to predict some data out of sample. The time series I use for the forecast starts at 2013-01-01 and ends at 2013-12-31. Here is the data I am working with:

    hr = np.loadtxt("Data_2013_17.txt")
    index = date_range(start='2013-1-1', end='2013-12-31', freq='D')
    df = pd.DataFrame(hr, index=index)
    holidays = ['2013-1-1', '2013-3-29', '2013-4-1', '2013-5-1', '2013-5-9', '2013-5-20', '2013-10-3', '2013-12-25'
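
The old statsmodels ARMA class this question used has since been removed; with the current ARIMA implementation, out-of-sample forecasts can be requested by date once the series has a date index, and future values of any exogenous regressor must be supplied for the forecast horizon. A minimal sketch under the question's daily 2013 index (the data and the holiday dummy are simulated stand-ins):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    index = pd.date_range(start='2013-01-01', end='2013-12-31', freq='D')
    rng = np.random.default_rng(0)
    hr = pd.Series(rng.normal(size=len(index)), index=index)      # stand-in for the real series
    holiday = pd.Series(0.0, index=index)                         # simplified exogenous dummy
    holiday.loc[pd.to_datetime(['2013-01-01', '2013-12-25'])] = 1.0

    res = ARIMA(hr, exog=holiday, order=(2, 0, 1)).fit()

    # Out-of-sample prediction by date: exog must cover the forecast horizon.
    future_index = pd.date_range('2014-01-01', periods=7, freq='D')
    future_exog = pd.DataFrame({'holiday': [1.0, 0, 0, 0, 0, 0, 0]}, index=future_index)
    fcast = res.predict(start=future_index[0], end=future_index[-1], exog=future_exog)
    print(fcast)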

Plotly: How to find coefficient of trendline in plotly express?

a 夏天 submitted on 2021-01-29 15:49:25
Question: How do you find the coefficient of the trend line in Plotly Express? For example, I used the code below to chart the trend line, but now I want to know the coefficient.

    import plotly.express as px
    px.scatter(df, x='x_data', y='y_data', trendline="ols")

Answer 1: Here you need to have a look at the Plotly docs and the statsmodels ones. I think the example in the Plotly documentation should be fixed. Anyway:

    import plotly.express as px
    df = px.data.tips()
    fig = px.scatter(df, x="total_bill", y="tip",
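
Plotly Express keeps the statsmodels fit behind each OLS trendline and exposes it through px.get_trendline_results; a minimal sketch on the same tips dataset:

    import plotly.express as px

    df = px.data.tips()
    fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")

    # Each trendline's underlying statsmodels fit sits in the "px_fit_results" column.
    results = px.get_trendline_results(fig)
    ols_fit = results.px_fit_results.iloc[0]
    print(ols_fit.params)      # intercept and slope of the trend line
    print(ols_fit.summary())   # full regression summary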

Kolmogorov-Smirnov test for goodness of fit in Python

て烟熏妆下的殇ゞ submitted on 2021-01-29 05:35:27
Question: I am trying to fit distributions. The fitting is finished, but I need a measure to choose the best model. Many papers use the Kolmogorov-Smirnov (KS) test. I tried to implement it, and I am getting very low p-values. The implementation:

    # Histogram plot
    binwidth = np.arange(0, int(out_threshold1), 1)
    n1, bins1, patches = plt.hist(h1, bins=binwidth, normed=1, facecolor='#023d6b', alpha=0.5, histtype='bar')
    # Fitting
    gevfit4 = gev.fit(h1)
    pdf_gev4 = gev.pdf(lnspc, *gevfit4)
    plt
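
A minimal sketch of scoring one fitted candidate with scipy.stats.kstest (the data is simulated here; note that the standard KS p-value assumes the parameters were not estimated from the same sample):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    h1 = stats.genextreme.rvs(c=0.1, size=1000, random_state=rng)   # stand-in for the real data

    # Fit the GEV distribution, then compare the sample with the fitted CDF.
    params = stats.genextreme.fit(h1)
    statistic, pvalue = stats.kstest(h1, 'genextreme', args=params)
    print(statistic, pvalue)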

How does statsmodels encode endog variables entered as strings?

时光怂恿深爱的人放手 submitted on 2021-01-27 21:53:11
Question: I'm new to using statsmodels for statistical analyses. I'm getting the expected answers most of the time, but there are some things I don't quite understand about the way statsmodels defines endog (dependent) variables for logistic regression when they are entered as strings. An example pandas DataFrame to illustrate the issue can be defined as shown below. The yN, yA and yA2 columns represent different ways to define an endog variable: yN is a binary variable coded 0, 1; yA is a binary variable
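
The example frame is truncated above; as a hedged illustration (the column values and level names 'no'/'yes' are made up), one way to remove any ambiguity about how string levels get mapped is to encode the outcome explicitly before handing it to Logit:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        'x': rng.normal(size=200),
        'yA': rng.choice(['no', 'yes'], size=200),   # string-coded binary outcome (hypothetical levels)
    })

    # Encode the string outcome explicitly so the reference level is under your control.
    df['yN'] = (df['yA'] == 'yes').astype(int)

    res = sm.Logit(df['yN'], sm.add_constant(df[['x']])).fit(disp=0)
    print(res.params)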