regression

Constrained regression in Python

泪湿孤枕 submitted on 2019-12-24 08:54:58
Question: I have this simple regression model:

y = a + b * x + c * z + error

with a constraint on the parameters: c = b - 1. There are similar questions posted on SO (like Constrained Linear Regression in Python), but the constraints there are of the type lb <= parameter <= ub. What are the available options for handling this specific constrained linear regression problem?

Answer 1: This is how it can be done using GLM:

import statsmodels
import statsmodels.api as sm
import numpy as np

# Set the link function to identity
…
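
Whichever solver is used, it helps to note why this particular constraint is easy to handle: substituting c = b - 1 into the model reduces it to an ordinary unconstrained regression of a transformed response on a transformed predictor (a small derivation added here, not part of the original answer):

y = a + b x + c z + \varepsilon, \qquad c = b - 1
\;\Rightarrow\; y = a + b x + (b - 1) z + \varepsilon
\;\Rightarrow\; y + z = a + b\,(x + z) + \varepsilon

Regressing y + z on x + z therefore estimates a and b directly, and c is recovered as b - 1.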

Error in a bivariate logistic model in R

不羁岁月 submitted on 2019-12-24 08:04:18
Question: I have run into an unexpected error in my research. Let me show you several code chunks; I hope you can help. I have two binary variables, alco and smoke, which were generated like this:

smoke <- factor(with(df, ifelse((q34 < 2), 1, 0)))
alco  <- factor(with(df, ifelse((q47 == 1), 1, 0)))
df    <- cbind(df, smoke, alco, educ_3, smoke_14)

I tried to fit a model using the zeligverse package:

m3 <- zelig(cbind(smoke, alco) ~ fem + age + age2 + smoke_14 + ninc, model = "blogit", data = df)

which leads to the error …

Regression using Python

╄→尐↘猪︶ㄣ submitted on 2019-12-24 07:17:46
Question: I have the following variables:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n)/5
y = np.sin(x) + x/6 + np.random.randn(n)/10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

def part1_scatter():
    %matplotlib notebook
    plt.figure()
    plt.scatter(X_train, y_train, label='training data')
    plt.scatter(X_test, y_test, label='test data'…

Is there an implementation of loess in R with more than 3 parametric predictors or a trick to a similar effect?

北城余情 submitted on 2019-12-24 04:48:06
Question: Calling all experts on local regression and/or R! I have run into a limitation of the standard loess function in R and hope you have some advice. The current implementation supports only 1-4 predictors. Let me set out our application scenario to show why this quickly becomes a problem as soon as we want to employ globally fit parametric covariables. Essentially, we have a spatial distortion s(x, y) overlaid over a number of measurements z:

z_i = s(x_i, y_i) + v_{g_i}

These measurements z…
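
One commonly suggested workaround, sketched here under assumed names rather than taken from this thread, is to switch from loess() to mgcv::gam(): a two-dimensional smooth handles the spatial part while any number of globally fit parametric covariables enter alongside it. Here dat, x, y, z and the grouping factor g are hypothetical stand-ins for the quantities described above.

library(mgcv)

# s(x, y) absorbs the smooth spatial distortion s(x, y);
# the factor g enters as an ordinary, globally fit parametric term v_g
fit <- gam(z ~ s(x, y) + g, data = dat)

summary(fit)           # smooth term and parametric coefficients
plot(fit, scheme = 2)  # visualise the estimated spatial surface

This is not loess itself, but it gives a comparable local-smoothing effect for the spatial term without the 4-predictor limit on the parametric side.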

Easily performing the same regression on different datasets

点点圈 submitted on 2019-12-24 04:44:30
Question: I'm performing the same regression on several different datasets (same dependent and independent variables). However, there are many independent variables, and I often want to test adding or removing different variables. I'd like to avoid making all these changes on different lines of code just because they use different datasets. Can I instead just copy the formula that was used to create some object, and then create a new object using a different dataset? For example, something like:

fit1 <- lm…
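
A minimal sketch of the usual pattern (with hypothetical data frames df1 and df2 and made-up variable names): keep the formula in one object, or reuse an existing fit via formula()/update(), so the variable list only has to be edited in one place.

f <- y ~ x1 + x2 + x3          # edit the set of predictors here only

fit1 <- lm(f, data = df1)
fit2 <- lm(f, data = df2)

# Equivalent alternatives that reuse an existing fit:
fit2b <- update(fit1, data = df2)
fit2c <- lm(formula(fit1), data = df2)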

ggplot2: How to plot an orthogonal regression line?

拟墨画扇 submitted on 2019-12-24 04:29:34
Question: I have tested a large sample of participants on two different tests of visual perception, and now I'd like to see to what extent performance on the two tests correlates. To visualise the correlation, I plot a scatterplot in R using ggplot() and fit a regression line (using stat_smooth()). However, since both my x and y variables are performance measures, I need to take both of them into account when fitting the regression line; thus, I cannot use a simple linear regression (using stat_smooth…
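
A sketch of one way to get such a line (assuming a data frame d with columns x and y holding the two test scores): the orthogonal, total-least-squares line runs along the first principal component of the two variables, so its slope and intercept can be computed with prcomp() and drawn with geom_abline().

library(ggplot2)

pc        <- prcomp(cbind(d$x, d$y))                # first PC = orthogonal regression direction
slope     <- pc$rotation[2, 1] / pc$rotation[1, 1]
intercept <- mean(d$y) - slope * mean(d$x)           # line passes through the centroid

ggplot(d, aes(x, y)) +
  geom_point() +
  geom_abline(slope = slope, intercept = intercept)

This treats the error variances of the two measures as equal; if they differ, a Deming-type regression with an explicit variance ratio would be more appropriate.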

Python Statsmodels: OLS regressor not predicting

99封情书 submitted on 2019-12-24 02:19:14
Question: I wrote the following piece of code, but I just cannot get the predict method to work:

import statsmodels.api as sm
from statsmodels.formula.api import ols

ols_model = ols('Consumption ~ Disposable_Income', df).fit()

My df is a pandas DataFrame with column headings 'Consumption' and 'Disposable_Income'. When I run, for example, ols_model.predict([1000.0]) I get: "TypeError: list indices must be integers, not str". When I run, for example, ols_model.predict(df['Disposable_Income'].values) I…

Unable to get R-squared for test dataset

旧巷老猫 submitted on 2019-12-24 01:27:49
Question: I am trying to learn a bit about different types of regression and I am hacking my way through the code sample below.

library(magrittr)
library(dplyr)

# Polynomial degree 1
df <- read.csv("C:\\path_here\\auto_mpg.csv", stringsAsFactors = FALSE)  # Data from UCI
df1 <- as.data.frame(sapply(df, as.numeric))

# Select key columns
df2 <- df1 %>% select(cylinder, displacement, horsepower, weight, acceleration, year, mpg)
df3 <- df2[complete.cases(df2), ]

smp_size <- floor(0.75 * nrow(df3))
# Split as train and…
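
For the held-out data there is no summary()$r.squared to read off, but the test-set R-squared can be computed by hand. A short sketch, assuming a fitted model fit, a test data frame test, and mpg as the outcome (names following the code above):

pred   <- predict(fit, newdata = test)

ss_res <- sum((test$mpg - pred)^2)             # residual sum of squares
ss_tot <- sum((test$mpg - mean(test$mpg))^2)   # total sum of squares
1 - ss_res / ss_tot                            # R-squared on the test set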

How to predict a new value using simple linear regression log(y)=b0+b1*log(x)

随声附和 submitted on 2019-12-24 00:24:22
Question: How do I predict a new given value of body using the ml2 model below, and how do I interpret its output (the new predicted output only, not the model)? I am using the Animals dataset from the MASS package to build a simple linear regression model:

ml2 <- lm(log(brain) ~ log(body), data = Animals)

Predict for a new given body of 468:

pred_body <- data.frame(body = c(468))
predict(ml2, pred_body, interval = "confidence")
        fit      lwr      upr
1  5.604506 4.897498 6.311513

But I am not so sure: is the predicted y (brain) = 5.6, or is log(brain) = 5.6? How could we get the…
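
Because the response in ml2 is log(brain), the fitted value 5.604506 is on the log scale; exponentiating the fit (and the interval limits) converts back to the original brain units. A short sketch:

library(MASS)

ml2 <- lm(log(brain) ~ log(body), data = Animals)

pred_body <- data.frame(body = 468)   # body is supplied on the original scale;
                                      # log() is applied inside the formula
pred_log  <- predict(ml2, pred_body, interval = "confidence")
exp(pred_log)                          # fit, lwr, upr back-transformed to brain units

Note that exp() of the fitted mean of log(brain) is a geometric-mean (median-type) prediction rather than the arithmetic mean of brain.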

How to plot confidence bands for my weighted log-log linear regression?

橙三吉。 submitted on 2019-12-23 23:18:11
Question: I need to plot an exponential species-area relationship using the exponential form of a weighted log-log linear model, where the mean species number per location/Bank (sb$NoSpec.mean) is weighted by the variance in species number per year (sb$NoSpec.var). I am able to plot the fit, but I have trouble figuring out how to plot the confidence intervals around this fit. The following is the best I have come up with so far. Any advice for me?

# Data
df <- read.csv("YearlySpeciesCount_SizeGroups.csv")…
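
A sketch of the usual recipe (with an assumed predictor column, here called Area, and inverse-variance weights as one common convention): ask predict() for a confidence interval on a grid of new values, then exponentiate the fit and its limits before plotting so the band appears on the original species scale.

# Hypothetical weighted log-log fit; Area stands in for the area column in sb
fit <- lm(log(NoSpec.mean) ~ log(Area), data = sb, weights = 1 / NoSpec.var)

newdat <- data.frame(Area = seq(min(sb$Area), max(sb$Area), length.out = 200))
ci     <- predict(fit, newdata = newdat, interval = "confidence")

# Back-transform to the species scale and draw fit plus confidence band
plot(NoSpec.mean ~ Area, data = sb, log = "xy")
lines(newdat$Area, exp(ci[, "fit"]))
lines(newdat$Area, exp(ci[, "lwr"]), lty = 2)
lines(newdat$Area, exp(ci[, "upr"]), lty = 2)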