linear-regression

python statsmodels linear regression

早过忘川 submitted on 2019-12-24 06:23:14

Question: I am attempting to build a linear regression model based on pre-project data, and ultimately to calculate some modeled data so that I can compare pre/post-project data... Can anyone tell me what the best practice is, or whether I may be off in the weeds somewhere... For starters:

    import statsmodels.api as sm
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    ng = pd.read_csv('C:/Users/ngDataBaseline.csv', thousands=',',
                     index_col='Date', parse_dates=True)
    ng.head()

This

Least square optimization in R

房东的猫 submitted on 2019-12-24 04:08:24

Question: I am wondering how one could solve the following problem in R. We have a vector v (of n elements) and a matrix B (of dimension m x n). E.g.:

    > v
    [1] 2 4 3 1 5 7
    > B
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    2    1    5    5    3    4
    [2,]    4    5    6    3    2    5
    [3,]    3    7    5    1    7    6

I am looking for the m-long vector u such that sum( ( v - ( u %*% B) )^2 ) is minimized (i.e. the u that minimizes the sum of squares).

Answer 1: You are describing linear regression, which can be done with the lm function:

    coefficients(lm(v ~ t(B) + 0))
    # t(B)1 t(B)2 t
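For comparison, the same minimization can be sketched outside R with NumPy's least-squares solver (a minimal sketch using the question's numbers; the sum of squares sum((v - u %*% B)^2) is an ordinary least-squares problem with design matrix t(B)):

```python
import numpy as np

# Same data as in the question
v = np.array([2, 4, 3, 1, 5, 7], dtype=float)
B = np.array([[2, 1, 5, 5, 3, 4],
              [4, 5, 6, 3, 2, 5],
              [3, 7, 5, 1, 7, 6]], dtype=float)

# Minimize sum((v - u @ B)**2): least squares with design matrix B.T
u, *_ = np.linalg.lstsq(B.T, v, rcond=None)
print(u)
```

At the optimum, the residual v - u @ B is orthogonal to the rows of B, which is a quick way to verify the solution.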

Unable to get R-squared for test dataset

旧巷老猫 submitted on 2019-12-24 01:27:49

Question: I am trying to learn a bit about different types of regression, and I am hacking my way through the code sample below.

    library(magrittr)
    library(dplyr)

    # Polynomial degree 1
    df = read.csv("C:\\path_here\\auto_mpg.csv", stringsAsFactors = FALSE)  # Data from UCI
    df1 <- as.data.frame(sapply(df, as.numeric))

    # Select key columns
    df2 <- df1 %>% select(cylinder, displacement, horsepower, weight, acceleration, year, mpg)
    df3 <- df2[complete.cases(df2), ]

    smp_size <- floor(0.75 * nrow(df3))  # Split as train and
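Since the excerpt cuts off before the test-set R-squared is computed, here is a minimal, self-contained Python sketch of that step on synthetic data (the slope, noise level, and 75/25 split are arbitrary stand-ins, not the mpg data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the mpg data (hypothetical values)
x = rng.uniform(0.0, 10.0, 200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, 200)

# 75/25 train/test split, as in the question
n_train = int(0.75 * len(x))
x_train, x_test = x[:n_train], x[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Fit a degree-1 polynomial on the training data only
coef = np.polyfit(x_train, y_train, deg=1)
pred = np.polyval(coef, x_test)

# Test-set R^2: 1 - SS_res / SS_tot, with SS_tot around the TEST mean
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(r2)
```

The common pitfall this illustrates: the model must be fit on the training split only, and the predictions scored against the held-out responses.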

Rolling regression on irregular time series

混江龙づ霸主 submitted on 2019-12-24 01:23:57

Question: Summary (tl;dr): I need to perform a rolling regression on an irregular time series (i.e. the sampling interval may not even be periodic, going from 0, 1, 2, 3... to ...7, 20, 24, 28...). The values are plain numerics and do not necessarily require a date/time type, but the rolling window needs to be by time. So if I have a time series that is irregularly sampled over 600 seconds and the window is 30, the regression is performed every 30 seconds, not every 30 samples. I've read examples, and while I could replicate
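A minimal Python sketch of the idea — window membership decided by elapsed time rather than by sample count — on synthetic irregular data (the 0.5 slope, noise level, and sample size are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Irregularly spaced sample times over 600 "seconds" (hypothetical data)
t = np.sort(rng.uniform(0.0, 600.0, 600))
y = 0.5 * t + rng.normal(0.0, 5.0, t.size)

window = 30.0  # window width in TIME units, not in number of samples

# Regress y on t over each trailing 30-second window, one fit per 30 seconds
slopes = {}
for end in np.arange(window, 600.0 + window, window):
    mask = (t > end - window) & (t <= end)
    if mask.sum() >= 5:            # skip windows with too few observations
        slope, _ = np.polyfit(t[mask], y[mask], deg=1)
        slopes[end] = slope

result = pd.Series(slopes)
print(result)
```

Because the mask is built from the time values themselves, the number of points entering each fit varies with the sampling density, which is exactly what a sample-count-based rolling window gets wrong.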

rcs generates bad prediction in lm() models

亡梦爱人 submitted on 2019-12-24 00:58:47

Question: I'm trying to reproduce this blog post on overfitting. I want to explore how a spline compares to the tested polynomials. My problem: using rcs() - restricted cubic splines - from the rms package, I get very strange predictions when applying it in a regular lm(). The ols() function works fine, but I'm a little surprised by this strange behavior. Can someone explain to me what's happening?

    library(rms)
    p4 <- poly(1:100, degree=4)
    true4 <- p4 %*% c(1, 2, -6, 9)
    days <- 1:70
    noise4 <- true4 + rnorm(100, sd=.5)
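The spline-versus-polynomial comparison the question is after can be sketched in Python with SciPy's least-squares spline fit. This is only an analogue of a restricted cubic spline — it does not reproduce the rms::rcs basis or the lm()/ols() discrepancy — and the quartic "truth", knot positions, and noise level below are hypothetical stand-ins:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(2)

# Hypothetical stand-in for the post's setup: a smooth quartic truth plus noise
x = np.arange(1, 101, dtype=float)
true = 1e-6 * (x - 20) * (x - 45) * (x - 70) * (x - 95)
y = true + rng.normal(0.0, 0.5, x.size)

# Least-squares cubic spline with three interior knots: a rough analogue
# of a restricted cubic spline fit (NOT the rms::rcs basis itself)
spline = LSQUnivariateSpline(x, y, [25.0, 50.0, 75.0], k=3)

# Degree-4 polynomial fit for comparison, as in the blog post
poly = np.polynomial.Polynomial.fit(x, y, deg=4)

mse_spline = float(np.mean((spline(x) - true) ** 2))
mse_poly = float(np.mean((poly(x) - true) ** 2))
print(mse_spline, mse_poly)
```

Both fits should track the smooth truth closely here; the interesting differences appear when extrapolating or when the truth is rougher than the basis.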

How to predict a new value using simple linear regression log(y)=b0+b1*log(x)

随声附和 submitted on 2019-12-24 00:24:22

Question: How can I predict a new value of body using the ml2 model below, and how should its output be interpreted (the new predicted output only, not the model)? Using the Animals dataset from the MASS package to build a simple linear regression model:

    ml2 <- lm(log(brain) ~ log(body), data=Animals)

Predicting for a new given body of 468:

    pred_body <- data.frame(body=c(468))
    predict(ml2, pred_body, interval="confidence")
    #        fit      lwr      upr
    # 1 5.604506 4.897498 6.311513

But I am not so sure whether the predicted y is brain = 5.6 or log(brain) = 5.6. How could we get the
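The interpretation question has a short answer: predict() on a model whose response is log(brain) returns fitted values on the log scale, so 5.604506 is log(brain), and the original-scale prediction is exp(5.604506) ≈ 271.6. A minimal Python sketch of the same back-transform, on synthetic stand-in data (the 0.75 exponent and noise level are arbitrary, not the Animals data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for the Animals data: brain ~ body^0.75 on a log-log scale
body = rng.uniform(1.0, 5000.0, 60)
brain = np.exp(0.75 * np.log(body) + 1.0 + rng.normal(0.0, 0.3, body.size))

# Same model form as ml2: log(brain) = b0 + b1 * log(body)
b1, b0 = np.polyfit(np.log(body), np.log(brain), deg=1)

# The fitted value for a new body is on the LOG scale ...
new_body = 468.0
log_pred = b0 + b1 * np.log(new_body)

# ... so exponentiate to recover brain on the original scale
pred = np.exp(log_pred)
print(log_pred, pred)
```

The same applies to the interval endpoints: exponentiating lwr and upr gives a (no longer symmetric) confidence interval on the original scale.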

How to plot confidence bands for my weighted log-log linear regression?

橙三吉。 submitted on 2019-12-23 23:18:11

Question: I need to plot an exponential species-area relationship using the exponential form of a weighted log-log linear model, where the mean species number per location/Bank (sb$NoSpec.mean) is weighted by the variance in species number per year (sb$NoSpec.var). I am able to plot the fit, but I have trouble figuring out how to plot the confidence intervals around this fit. The following is the best I have come up with so far. Any advice for me?

    # Data
    df <- read.csv("YearlySpeciesCount_SizeGroups.csv")
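One way to get the bands is to compute them by hand on the log scale and back-transform. Below is a hedged Python sketch (synthetic stand-in data, since the CSV is not available) of an inverse-variance-weighted least-squares fit with a pointwise 95% confidence band:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical stand-in for the species-area data
area = rng.uniform(1.0, 1000.0, 40)
point_var = rng.uniform(0.5, 2.0, 40)          # per-point variance -> weights
species = np.exp(0.3 * np.log(area) + 1.5
                 + rng.normal(0.0, 0.1 * np.sqrt(point_var)))

x, y = np.log(area), np.log(species)
w = 1.0 / point_var                            # inverse-variance weights

# Weighted least squares by hand: b = (X'WX)^-1 X'Wy
X = np.column_stack([np.ones_like(x), x])
XtWX = X.T @ (w[:, None] * X)
b = np.linalg.solve(XtWX, X.T @ (w * y))

# Weighted residual variance and coefficient covariance
resid = y - X @ b
dof = len(y) - 2
s2 = float((w * resid**2).sum() / dof)
cov_b = s2 * np.linalg.inv(XtWX)

# Pointwise 95% confidence band on the log scale, then back-transformed
grid = np.linspace(x.min(), x.max(), 50)
Xg = np.column_stack([np.ones_like(grid), grid])
fit = Xg @ b
se = np.sqrt(np.einsum("ij,jk,ik->i", Xg, cov_b, Xg))
tval = stats.t.ppf(0.975, dof)
lwr, upr = np.exp(fit - tval * se), np.exp(fit + tval * se)
print(b)
```

Exponentiating the band endpoints, as in the last line, gives the curved (asymmetric) confidence region around the exponential fit on the original axes.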

How does Spark's StreamingLinearRegressionWithSGD work?

会有一股神秘感。 submitted on 2019-12-23 18:49:38

Question: I am working with StreamingLinearRegressionWithSGD, which has two methods, trainOn and predictOn. This class has a model object that is updated as training data arrives on the stream passed to trainOn. Simultaneously, it gives predictions using the same model. I want to know how the model weights are updated and synchronized across workers/executors. Any link or reference would be helpful. Thanks.

Answer 1: There is no magic here. StreamingLinearAlgorithm keeps a mutable reference to the

Multi Collinearity for Categorical Variables

ε祈祈猫儿з submitted on 2019-12-23 16:10:54

Question: For numerical/continuous data, to detect collinearity between predictor variables we use Pearson's correlation coefficient, and we make sure that the predictors are not correlated among themselves but are correlated with the response variable. But how can we detect multicollinearity if we have a dataset where the predictors are all categorical? I am sharing one dataset where I am trying to find out whether the predictor variables are correlated:

    A (response variable)   B     C     D
    Yes                     Yes   Yes   Yes
    No                      Yes
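For pairs of categorical variables, a common approach is a chi-square test of independence on the contingency table, often summarized as Cramér's V (a 0-to-1 association strength, playing the role Pearson's r plays for numeric pairs). A minimal Python sketch on synthetic Yes/No data, where B and C are constructed to be associated and D independent:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(5)

# Synthetic Yes/No predictors: C mostly copies B, D is independent of both
b = rng.choice(["Yes", "No"], 500)
c = np.where(rng.random(500) < 0.9, b, rng.choice(["Yes", "No"], 500))
d = rng.choice(["Yes", "No"], 500)

def cramers_v(x, y):
    """Cramer's V: chi-square-based association strength in [0, 1]."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))

v_bc = cramers_v(b, c)   # strongly associated pair
v_bd = cramers_v(b, d)   # independent pair
print(v_bc, v_bd)
```

Computing Cramér's V for every pair of categorical predictors gives a correlation-matrix-style screen for multicollinearity among them.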