linear-regression

python statsmodels linear regression

早过忘川 submitted on 2019-12-24 06:23:14

Question: I am attempting to build a linear regression model based on pre-project data, and ultimately to calculate some modeled data so that I can compare pre/post-project data... Can anyone tell me what the best practice is, or whether I may be off in the weeds somewhere... For starters:

    import statsmodels.api as sm
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    ng = pd.read_csv('C:/Users/ngDataBaseline.csv', thousands=',',
                     index_col='Date', parse_dates=True)
    ng.head()

This

Least square optimization in R

房东的猫 submitted on 2019-12-24 04:08:24

Question: I am wondering how one could solve the following problem in R. We have a vector v (of n elements) and a matrix B (of dimension m x n). E.g.:

    > v
    [1] 2 4 3 1 5 7
    > B
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    2    1    5    5    3    4
    [2,]    4    5    6    3    2    5
    [3,]    3    7    5    1    7    6

I am looking for the m-long vector u such that sum( ( v - ( u %*% B) )^2 ) is minimized (i.e. the u that minimizes the sum of squares).

Answer 1: You are describing linear regression, which can be done with the lm function:

    coefficients(lm(v ~ t(B) + 0))
    # t(B)1 t(B)2 t
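For comparison, the same minimization can be sketched outside R with NumPy's least-squares solver (a minimal sketch using the question's numbers; the sum of squares sum((v - u %*% B)^2) is an ordinary least-squares problem with design matrix t(B)):

```python
import numpy as np

# Same data as in the question
v = np.array([2, 4, 3, 1, 5, 7], dtype=float)
B = np.array([[2, 1, 5, 5, 3, 4],
              [4, 5, 6, 3, 2, 5],
              [3, 7, 5, 1, 7, 6]], dtype=float)

# Minimize sum((v - u @ B)**2): least squares with design matrix B.T
u, *_ = np.linalg.lstsq(B.T, v, rcond=None)
print(u)
```

At the optimum, the residual v - u @ B is orthogonal to the rows of B, which is a quick way to verify the solution.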

Unable to get R-squared for test dataset

旧巷老猫 submitted on 2019-12-24 01:27:49

Question: I am trying to learn a bit about different types of regression, and I am hacking my way through the code sample below.

    library(magrittr)
    library(dplyr)

    # Polynomial degree 1
    df = read.csv("C:\\path_here\\auto_mpg.csv", stringsAsFactors = FALSE)  # Data from UCI
    df1 <- as.data.frame(sapply(df, as.numeric))

    # Select key columns
    df2 <- df1 %>% select(cylinder, displacement, horsepower, weight, acceleration, year, mpg)
    df3 <- df2[complete.cases(df2), ]

    smp_size <- floor(0.75 * nrow(df3))  # Split as train and
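Since the excerpt cuts off before the test-set R-squared is computed, here is a minimal, self-contained Python sketch of that step on synthetic data (the slope, noise level, and 75/25 split are arbitrary stand-ins, not the mpg data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the mpg data (hypothetical values)
x = rng.uniform(0.0, 10.0, 200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, 200)

# 75/25 train/test split, as in the question
n_train = int(0.75 * len(x))
x_train, x_test = x[:n_train], x[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Fit a degree-1 polynomial on the training data only
coef = np.polyfit(x_train, y_train, deg=1)
pred = np.polyval(coef, x_test)

# Test-set R^2: 1 - SS_res / SS_tot, with SS_tot around the TEST mean
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(r2)
```

The common pitfall this illustrates: the model must be fit on the training split only, and the predictions scored against the held-out responses.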

Rolling regression on irregular time series

混江龙づ霸主 submitted on 2019-12-24 01:23:57

Question: Summary (tl;dr): I need to perform a rolling regression on an irregular time series (i.e. the sampling interval may not even be periodic, going from 0, 1, 2, 3... to ...7, 20, 24, 28...). The values are plain numerics and do not necessarily require a date/time type, but the rolling window needs to be by time. So if I have a time series that is irregularly sampled over 600 seconds and the window is 30, the regression is performed every 30 seconds, not every 30 samples. I've read examples, and while I could replicate
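A minimal Python sketch of the idea — window membership decided by elapsed time rather than by sample count — on synthetic irregular data (the 0.5 slope, noise level, and sample size are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Irregularly spaced sample times over 600 "seconds" (hypothetical data)
t = np.sort(rng.uniform(0.0, 600.0, 600))
y = 0.5 * t + rng.normal(0.0, 5.0, t.size)

window = 30.0  # window width in TIME units, not in number of samples

# Regress y on t over each trailing 30-second window, one fit per 30 seconds
slopes = {}
for end in np.arange(window, 600.0 + window, window):
    mask = (t > end - window) & (t <= end)
    if mask.sum() >= 5:            # skip windows with too few observations
        slope, _ = np.polyfit(t[mask], y[mask], deg=1)
        slopes[end] = slope

result = pd.Series(slopes)
print(result)
```

Because the mask is built from the time values themselves, the number of points entering each fit varies with the sampling density, which is exactly what a sample-count-based rolling window gets wrong.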

rcs generates bad prediction in lm() models

亡梦爱人 submitted on 2019-12-24 00:58:47

Question: I'm trying to reproduce this blog post on overfitting. I want to explore how a spline compares to the tested polynomials. My problem: using rcs() - restricted cubic splines - from the rms package, I get very strange predictions when applying it in a regular lm(). The ols() function works fine, but I'm a little surprised by this strange behavior. Can someone explain to me what's happening?

    library(rms)
    p4 <- poly(1:100, degree=4)
    true4 <- p4 %*% c(1, 2, -6, 9)
    days <- 1:70
    noise4 <- true4 + rnorm(100, sd=.5)
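The spline-versus-polynomial comparison the question is after can be sketched in Python with SciPy's least-squares spline fit. This is only an analogue of a restricted cubic spline — it does not reproduce the rms::rcs basis or the lm()/ols() discrepancy — and the quartic "truth", knot positions, and noise level below are hypothetical stand-ins:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(2)

# Hypothetical stand-in for the post's setup: a smooth quartic truth plus noise
x = np.arange(1, 101, dtype=float)
true = 1e-6 * (x - 20) * (x - 45) * (x - 70) * (x - 95)
y = true + rng.normal(0.0, 0.5, x.size)

# Least-squares cubic spline with three interior knots: a rough analogue
# of a restricted cubic spline fit (NOT the rms::rcs basis itself)
spline = LSQUnivariateSpline(x, y, [25.0, 50.0, 75.0], k=3)

# Degree-4 polynomial fit for comparison, as in the blog post
poly = np.polynomial.Polynomial.fit(x, y, deg=4)

mse_spline = float(np.mean((spline(x) - true) ** 2))
mse_poly = float(np.mean((poly(x) - true) ** 2))
print(mse_spline, mse_poly)
```

Both fits should track the smooth truth closely here; the interesting differences appear when extrapolating or when the truth is rougher than the basis.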

How to predict a new value using simple linear regression log(y)=b0+b1*log(x)

随声附和 submitted on 2019-12-24 00:24:22

Question: How can I predict a new value of body using the ml2 model below, and how should its output be interpreted (the new predicted output only, not the model)? Using the Animals dataset from the MASS package to build a simple linear regression model:

    ml2 <- lm(log(brain) ~ log(body), data=Animals)

Predicting for a new given body of 468:

    pred_body <- data.frame(body=c(468))
    predict(ml2, pred_body, interval="confidence")
    #        fit      lwr      upr
    # 1 5.604506 4.897498 6.311513

But I am not so sure whether the predicted y is brain = 5.6 or log(brain) = 5.6. How could we get the
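The interpretation question has a short answer: predict() on a model whose response is log(brain) returns fitted values on the log scale, so 5.604506 is log(brain), and the original-scale prediction is exp(5.604506) ≈ 271.6. A minimal Python sketch of the same back-transform, on synthetic stand-in data (the 0.75 exponent and noise level are arbitrary, not the Animals data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for the Animals data: brain ~ body^0.75 on a log-log scale
body = rng.uniform(1.0, 5000.0, 60)
brain = np.exp(0.75 * np.log(body) + 1.0 + rng.normal(0.0, 0.3, body.size))

# Same model form as ml2: log(brain) = b0 + b1 * log(body)
b1, b0 = np.polyfit(np.log(body), np.log(brain), deg=1)

# The fitted value for a new body is on the LOG scale ...
new_body = 468.0
log_pred = b0 + b1 * np.log(new_body)

# ... so exponentiate to recover brain on the original scale
pred = np.exp(log_pred)
print(log_pred, pred)
```

The same applies to the interval endpoints: exponentiating lwr and upr gives a (no longer symmetric) confidence interval on the original scale.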

How to plot confidence bands for my weighted log-log linear regression?

橙三吉。 submitted on 2019-12-23 23:18:11

Question: I need to plot an exponential species-area relationship using the exponential form of a weighted log-log linear model, where the mean species number per location/Bank (sb$NoSpec.mean) is weighted by the variance in species number per year (sb$NoSpec.var). I am able to plot the fit, but I have trouble figuring out how to plot the confidence intervals around this fit. The following is the best I have come up with so far. Any advice for me?

    # Data
    df <- read.csv("YearlySpeciesCount_SizeGroups.csv")
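One way to get the bands is to compute them by hand on the log scale and back-transform. Below is a hedged Python sketch (synthetic stand-in data, since the CSV is not available) of an inverse-variance-weighted least-squares fit with a pointwise 95% confidence band:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical stand-in for the species-area data
area = rng.uniform(1.0, 1000.0, 40)
point_var = rng.uniform(0.5, 2.0, 40)          # per-point variance -> weights
species = np.exp(0.3 * np.log(area) + 1.5
                 + rng.normal(0.0, 0.1 * np.sqrt(point_var)))

x, y = np.log(area), np.log(species)
w = 1.0 / point_var                            # inverse-variance weights

# Weighted least squares by hand: b = (X'WX)^-1 X'Wy
X = np.column_stack([np.ones_like(x), x])
XtWX = X.T @ (w[:, None] * X)
b = np.linalg.solve(XtWX, X.T @ (w * y))

# Weighted residual variance and coefficient covariance
resid = y - X @ b
dof = len(y) - 2
s2 = float((w * resid**2).sum() / dof)
cov_b = s2 * np.linalg.inv(XtWX)

# Pointwise 95% confidence band on the log scale, then back-transformed
grid = np.linspace(x.min(), x.max(), 50)
Xg = np.column_stack([np.ones_like(grid), grid])
fit = Xg @ b
se = np.sqrt(np.einsum("ij,jk,ik->i", Xg, cov_b, Xg))
tval = stats.t.ppf(0.975, dof)
lwr, upr = np.exp(fit - tval * se), np.exp(fit + tval * se)
print(b)
```

Exponentiating the band endpoints, as in the last line, gives the curved (asymmetric) confidence region around the exponential fit on the original axes.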

How does Spark's StreamingLinearRegressionWithSGD work?

会有一股神秘感。 submitted on 2019-12-23 18:49:38

Question: I am working with StreamingLinearRegressionWithSGD, which has two methods, trainOn and predictOn. This class has a model object that is updated as training data arrives on the stream passed to trainOn. Simultaneously, it gives predictions using the same model. I want to know how the model weights are updated and synchronized across workers/executors. Any link or reference would be helpful. Thanks.

Answer 1: There is no magic here. StreamingLinearAlgorithm keeps a mutable reference to the

Multi Collinearity for Categorical Variables

ε祈祈猫儿з submitted on 2019-12-23 16:10:54

Question: For numerical/continuous data, to detect collinearity between predictor variables we use Pearson's correlation coefficient, and we make sure that the predictors are not correlated among themselves but are correlated with the response variable. But how can we detect multicollinearity if we have a dataset where the predictors are all categorical? I am sharing one dataset where I am trying to find out whether the predictor variables are correlated:

    A (response variable)   B     C     D
    Yes                     Yes   Yes   Yes
    No                      Yes
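For pairs of categorical variables, a common approach is a chi-square test of independence on the contingency table, often summarized as Cramér's V (a 0-to-1 association strength, playing the role Pearson's r plays for numeric pairs). A minimal Python sketch on synthetic Yes/No data, where B and C are constructed to be associated and D independent:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(5)

# Synthetic Yes/No predictors: C mostly copies B, D is independent of both
b = rng.choice(["Yes", "No"], 500)
c = np.where(rng.random(500) < 0.9, b, rng.choice(["Yes", "No"], 500))
d = rng.choice(["Yes", "No"], 500)

def cramers_v(x, y):
    """Cramer's V: chi-square-based association strength in [0, 1]."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))

v_bc = cramers_v(b, c)   # strongly associated pair
v_bd = cramers_v(b, d)   # independent pair
print(v_bc, v_bd)
```

Computing Cramér's V for every pair of categorical predictors gives a correlation-matrix-style screen for multicollinearity among them.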