linear-regression

Graphing perpendicular offsets in a least squares regression plot in R

倾然丶 夕夏残阳落幕 submitted on 2019-11-29 21:05:57
I'm interested in making a plot with a least squares regression line and line segments connecting the data points to the regression line, as illustrated in the "perpendicular offsets" graphic here: http://mathworld.wolfram.com/LeastSquaresFitting.html (from MathWorld - A Wolfram Web Resource: wolfram.com). I have the plot and regression line done here:
## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html
## Disease severity as a function of temperature
# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4 …
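The geometry behind those perpendicular offsets is independent of R: the foot of the perpendicular from each point onto the fitted line minimizes the squared distance to the line. A minimal Python/NumPy sketch of that computation, with hypothetical data standing in for the truncated disease-severity vector (in R, the final step would be `segments(x, y, xf, yf)` after `plot()` and `abline(fit)`):

```python
import numpy as np

# Hypothetical (x, y) data standing in for the truncated example
x = np.array([2.0, 1.0, 5.0, 3.4, 2.7, 5.1, 8.0, 6.8])
y = np.array([1.9, 3.1, 3.3, 4.8, 5.3, 6.1, 6.4, 7.0])

# Ordinary least squares fit: y = intercept + slope * x
slope, intercept = np.polyfit(x, y, 1)

# Foot of the perpendicular from each point (x0, y0) onto the line:
# minimize (xf - x0)^2 + (intercept + slope*xf - y0)^2 over xf
xf = (x + slope * (y - intercept)) / (1 + slope**2)
yf = intercept + slope * xf
# Each segment (x, y) -> (xf, yf) is the perpendicular offset to draw
```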

How to calculate the 95% confidence interval for the slope in a linear regression model in R

牧云@^-^@ submitted on 2019-11-29 20:15:35
Here is an exercise from Introductory Statistics with R: With the rmr data set, plot metabolic rate versus body weight. Fit a linear regression model to the relation. According to the fitted model, what is the predicted metabolic rate for a body weight of 70 kg? Give a 95% confidence interval for the slope of the line. The rmr data set is in the 'ISwR' package. It looks like this:
> rmr
   body.weight metabolic.rate
1         49.9           1079
2         50.8           1146
3         51.8           1115
4         52.6           1161
5         57.6           1325
6         61.4           1351
7         62.3           1402
8         64.9           1365
9         43.1            870
10        48.1           1372
11        52.2           1132
12        53.5           1172
13        55.0           1034
14        55.0           1155
15        56.0           1392
16 …
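In R this is `fit <- lm(metabolic.rate ~ body.weight)` followed by `predict(fit, newdata = data.frame(body.weight = 70))` and `confint(fit)`. As a language-neutral check of the arithmetic, here is a Python/NumPy sketch using only the 15 rows visible in the excerpt; the hard-coded 2.160 is the t quantile `qt(0.975, 13)` that `confint` would look up:

```python
import numpy as np

# First 15 rows of the rmr data quoted above
x = np.array([49.9, 50.8, 51.8, 52.6, 57.6, 61.4, 62.3, 64.9,
              43.1, 48.1, 52.2, 53.5, 55.0, 55.0, 56.0])        # body.weight
y = np.array([1079, 1146, 1115, 1161, 1325, 1351, 1402, 1365,
              870, 1372, 1132, 1172, 1034, 1155, 1392], dtype=float)  # metabolic.rate

n = len(x)
slope, intercept = np.polyfit(x, y, 1)

# Standard error of the slope: sqrt( RSS/(n-2) / Sxx )
resid = y - (intercept + slope * x)
s2 = resid @ resid / (n - 2)
se_slope = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())

tcrit = 2.160  # qt(0.975, df = 13); in R, confint(fit) does this for you
ci = (slope - tcrit * se_slope, slope + tcrit * se_slope)
pred_70kg = intercept + slope * 70
```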

OLS Regression: Scikit vs. Statsmodels?

别来无恙 submitted on 2019-11-29 19:39:16
Short version: I was using scikit-learn's LinearRegression on some data, but I'm used to p-values, so I put the same data into statsmodels' OLS. Although the R^2 is about the same, the variable coefficients are all different by large amounts. This concerns me, since the most likely explanation is that I've made an error somewhere, and now I don't feel confident in either output (I have probably specified one model incorrectly, but I don't know which one). Longer version: Because I don't know where the issue is, I don't know exactly which details to include, and including everything is probably too much. I …
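One common cause of exactly this symptom (not necessarily the asker's, which can't be confirmed from the excerpt) is the intercept: scikit-learn's LinearRegression fits one by default, while statsmodels' OLS does not unless you pass `sm.add_constant(X)`. A plain NumPy sketch, on synthetic data, of how far the coefficients move when the constant column is dropped:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(5.0, 1.0, size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.1, 50)

# With an intercept column -- what sklearn's fit_intercept=True effectively does
Xc = np.column_stack([np.ones(len(X)), X])
coef_with, *_ = np.linalg.lstsq(Xc, y, rcond=None)

# Without one -- what statsmodels OLS fits if you forget sm.add_constant
coef_without, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With the constant included, both libraries solve the same least-squares problem and the coefficients agree; without it, the slopes must absorb the intercept and shift noticeably even though R^2 barely changes.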

Running several linear regressions from a single dataframe in R

一世执手 submitted on 2019-11-29 17:58:55
I have a dataset of export trade data for a single country with 21 columns. The first column indicates the years (1962-2014) while the other 20 are trading partners. I am trying to run a linear regression of each partner column against the years column. I have tried the method recommended here: Running multiple, simple linear regressions from dataframe in R, which entails using combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE) However, this only yields the intercept for each pair, which is less important to me than the slope of each regression. Additionally, I have tried to use my dataset …
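The usual R answer is to loop over the partner columns with `lapply`, fitting `lm(DF[[col]] ~ DF$years)` and extracting `coef(fit)[2]` for the slope. The same one-regression-per-column loop, sketched in Python/NumPy with a hypothetical three-partner frame:

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1962, 2015)  # 53 years, as in the question

# Hypothetical trade values for three partners, each with a known trend
true_slopes = {"partnerA": 2.0, "partnerB": -0.5, "partnerC": 1.0}
columns = {name: 100 + s * (years - 1962) + rng.normal(0, 0.01, len(years))
           for name, s in true_slopes.items()}

# One simple regression per partner column against years;
# np.polyfit(..., 1) returns (slope, intercept)
fits = {name: np.polyfit(years, vals, 1) for name, vals in columns.items()}
slopes = {name: fit[0] for name, fit in fits.items()}
```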

Can't get aggregate() to work for regression by group

北慕城南 submitted on 2019-11-29 17:41:54
I want to use aggregate with this custom function:
# linear regression function
CalculateLinRegrDiff = function(sample) {
  fit <- lm(value ~ date, data = sample)
  diff(range(fit$fitted))
}
dataset2 = aggregate(value ~ id + col, dataset, CalculateLinRegrDiff(dataset))
I receive the error: Error in get(as.character(FUN), mode = "function", envir = envir) : object 'FUN' of mode 'function' was not found What is wrong? Your syntax for aggregate is wrong in the first place: pass the function CalculateLinRegrDiff itself, not the evaluated call CalculateLinRegrDiff(dataset), as the FUN argument. Secondly, you've chosen the …
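Beyond passing the function unevaluated, this function needs whole sub-data-frames (it fits `lm(value ~ date)` per group), so in R `by()` or `lapply(split(dataset, ...), ...)` fits better than `aggregate()`, which hands the function single columns. The split-then-fit pattern, sketched in Python/NumPy with hypothetical rows:

```python
import numpy as np

# Hypothetical long-format data: (id, date_as_number, value)
rows = [("a", 1, 1.0), ("a", 2, 2.1), ("a", 3, 2.9),
        ("b", 1, 5.0), ("b", 2, 4.0), ("b", 3, 3.1)]

# Split: group rows by id
groups = {}
for gid, date, value in rows:
    groups.setdefault(gid, []).append((date, value))

# Apply: fit a line per group and take the range of the fitted values,
# mirroring diff(range(fit$fitted))
diff_of_fitted = {}
for gid, pts in groups.items():
    d = np.array([p[0] for p in pts], dtype=float)
    v = np.array([p[1] for p in pts], dtype=float)
    slope, intercept = np.polyfit(d, v, 1)
    fitted = intercept + slope * d
    diff_of_fitted[gid] = fitted.max() - fitted.min()
```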

pyspark Linear Regression Example from official documentation - Bad results?

只谈情不闲聊 submitted on 2019-11-29 16:49:47
I am planning to use Linear Regression in Spark. To get started, I checked out the example from the official documentation (which you can find here). I also found this question on Stack Overflow, which is essentially the same question as mine. The answer suggests tweaking the step size, which I also tried; however, the results are still as random as without tweaking the step size. The code I'm using looks like this:
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel
# Load and parse the data
def parsePoint(line):
    values = [float(x) for x in …
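Nothing Spark-specific can be verified from the excerpt, but the step-size sensitivity the linked answer describes is easy to reproduce outside Spark: with raw, unscaled features, gradient descent on squared loss diverges unless the step is tiny, which is one reason the MLlib SGD example can look "random" at the default step. A plain NumPy illustration (no Spark; data and step values are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 100, 200)          # unscaled feature, like raw LIBSVM data
y = 0.5 * x + rng.normal(0, 1, 200)   # true slope 0.5

def gradient_descent(x, y, step, iters=500):
    """Full-batch gradient descent for y ~ w*x (no intercept)."""
    w, n = 0.0, len(x)
    for _ in range(iters):
        grad = -2.0 / n * x @ (y - w * x)
        w -= step * grad
        if not np.isfinite(w):        # weights blew up -- stop early
            break
    return w

w_diverged = gradient_descent(x, y, step=0.01)   # far too large for x ~ 100
w_converged = gradient_descent(x, y, step=1e-5)  # small enough to converge
```

Rescaling the features (or shrinking the step accordingly) is what makes the estimate stop looking random.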

Linear regression with interaction fails in the rms-package

时间秒杀一切 submitted on 2019-11-29 16:30:46
I'm playing around with interaction in the formula. I wondered if it's possible to do a regression with an interaction for just one level of a dummy variable. This seems to work in regular linear regression using the lm() function, but with the ols() function in the rms package the same formula fails. Does anyone know why? Here's my example:
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
regular_lm <- lm(mpg ~ wt + cyl + gear + cyl:gear, data=mtcars)
summary(regular_lm)
regular_lm <- lm(mpg ~ wt + cyl + gear + cyl:I(gear == "4"), data=mtcars)
summary(regular_lm)
And now the rms example:
library(rms)
dd <- …
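Why ols() rejects the formula is an rms-internals question the excerpt doesn't settle, but the design matrix lm() builds for cyl:I(gear == "4") is easy to state: the main-effect columns plus one extra column equal to cyl times the indicator of gear == 4. A NumPy sketch of fitting exactly that matrix on synthetic data (not mtcars):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
wt = rng.uniform(1.5, 5.5, n)
cyl = rng.choice([4.0, 6.0, 8.0], n)
gear = rng.choice([3, 4, 5], n)
g4 = (gear == 4).astype(float)           # the single dummy I(gear == "4")

# Data generated with a cyl:g4 interaction of exactly +2.0
y = 35 - 3 * wt - 1.5 * cyl + 2.0 * cyl * g4 + rng.normal(0, 0.1, n)

# Columns: intercept, wt, cyl, g4, cyl*g4 -- what the lm() formula expands to
X = np.column_stack([np.ones(n), wt, cyl, g4, cyl * g4])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```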

Prediction of 'mlm' linear model object from `lm()`

[亡魂溺海] submitted on 2019-11-29 16:15:11
I have three datasets:
response - matrix of 5 (samples) x 10 (dependent variables)
predictors - matrix of 5 (samples) x 2 (independent variables)
test_set - matrix of 10 (samples) x 2 (independent variables as defined in predictors)
response <- matrix(sample.int(15, size = 5*10, replace = TRUE), nrow = 5, ncol = 10)
colnames(response) <- c("1_DV","2_DV","3_DV","4_DV","5_DV","6_DV","7_DV","8_DV","9_DV","10_DV")
predictors <- matrix(sample.int(15, size = 5*2, replace = TRUE), nrow = 5, ncol = 2)
colnames(predictors) <- c("1_IV","2_IV")
test_set <- matrix(sample.int(15, size = 10*2, replace = TRUE), nrow = …
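An 'mlm' fit like lm(response ~ predictors) is ordinary least squares applied column-by-column to the response matrix, and prediction is just the new design matrix times the coefficient matrix; the usual R pitfall is that predict() needs newdata whose column names match the formula. The underlying matrix arithmetic, sketched in NumPy with noiseless synthetic data of the same shapes:

```python
import numpy as np

rng = np.random.default_rng(4)
n_train, n_test, n_iv, n_dv = 5, 10, 2, 10

X = rng.integers(1, 16, size=(n_train, n_iv)).astype(float)
B_true = rng.normal(size=(n_iv + 1, n_dv))   # intercept row + 2 IV rows

Xc = np.column_stack([np.ones(n_train), X])
Y = Xc @ B_true                              # 5 x 10 response matrix, noiseless

# One lstsq call fits all 10 response columns at once, like lm() on a matrix LHS
B_hat, *_ = np.linalg.lstsq(Xc, Y, rcond=None)

# Predict: new 10 x 3 design matrix times the 3 x 10 coefficient matrix
X_new = rng.integers(1, 16, size=(n_test, n_iv)).astype(float)
pred = np.column_stack([np.ones(n_test), X_new]) @ B_hat
```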

Analysis using linear regression based on subgroups

耗尽温柔 submitted on 2019-11-29 15:35:20
Assume I have data (t, y), where I expect a linear dependency y(t). Furthermore, each observation has attributes par1, par2, par3. Is there an algorithm or technique to decide whether one, two, or all three of these parameters are relevant for the fit? I tried leaps::regsubsets(y ~ t + par1 + par2 + par3, data = mydata, nbest = 10) but was not able to get the formula for the best fit. The final result should look like this if plotted (for data, see below). Thus, I want the information: "Adding par1 and par2 gives the best fit." The models are y_i = a_i * t_i + b_i with given a_i and b_i …
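regsubsets does report this: the summary's `which` matrix says which variables each best model includes, and the formula can be rebuilt from that. As an illustration of what "exhaustive subset selection" actually computes, here is a brute-force NumPy version scoring every subset of par1..par3 by adjusted R^2, on synthetic data where par3 is irrelevant by construction:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n = 80
t = rng.uniform(0, 10, n)
pars = {"par1": rng.choice([0.0, 1.0], n),
        "par2": rng.choice([0.0, 1.0], n),
        "par3": rng.choice([0.0, 1.0], n)}
# par1 and par2 shift the line; par3 is pure noise by construction
y = 1.0 * t + 3.0 * pars["par1"] - 2.0 * pars["par2"] + rng.normal(0, 0.5, n)

def adjusted_r2(extra_cols):
    X = np.column_stack([np.ones(n), t] + extra_cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    return 1 - (rss / (n - X.shape[1])) / (tss / (n - 1))

# Score all 8 subsets of {par1, par2, par3} and keep the best
subsets = [s for r in range(4) for s in combinations(sorted(pars), r)]
best = max(subsets, key=lambda s: adjusted_r2([pars[p] for p in s]))
```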

Plot linear model in 3d with Matplotlib

霸气de小男生 submitted on 2019-11-29 15:22:02
Question: I'm trying to create a 3D plot of a linear model fit for a data set. I was able to do this relatively easily in R, but I'm really struggling to do the same in Python. Here is what I've done in R: Here's what I've done in Python:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

csv = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0)
model = sm.ols(formula='Sales ~ TV …
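The standard Python recipe is to fit the model, evaluate the fitted plane on a meshgrid, and hand the grid to Axes3D.plot_surface alongside a scatter of the raw points. Since the excerpt cuts off before any plotting code, here is the grid arithmetic in plain NumPy on synthetic Advertising-like data (names hypothetical; the matplotlib calls are shown as comments):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
tv = rng.uniform(0, 300, n)        # advertising spend, like the TV column
radio = rng.uniform(0, 50, n)
sales = 3 + 0.05 * tv + 0.2 * radio + rng.normal(0, 0.5, n)

# Fit the plane sales = b0 + b1*TV + b2*Radio by least squares
X = np.column_stack([np.ones(n), tv, radio])
b0, b1, b2 = np.linalg.lstsq(X, sales, rcond=None)[0]

# Evaluate the fitted plane on a grid -- this is what plot_surface needs
tv_grid, radio_grid = np.meshgrid(np.linspace(0, 300, 20), np.linspace(0, 50, 20))
sales_grid = b0 + b1 * tv_grid + b2 * radio_grid

# ax = plt.figure().add_subplot(projection="3d")
# ax.scatter(tv, radio, sales)
# ax.plot_surface(tv_grid, radio_grid, sales_grid, alpha=0.3)
```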