linear-regression

Linear regression with matplotlib / numpy

别等时光非礼了梦想 · Submitted on 2019-11-27 06:07:28
I'm trying to generate a linear regression on a scatter plot I have generated. However, my data is in list format, and all of the examples I can find of using polyfit require using arange, which doesn't accept lists. I have searched high and low about how to convert a list to an array and nothing seems clear. Am I missing something? Following on, how best can I use my list of integers as input to polyfit? Here is the polyfit example I am following:

    from pylab import *
    x = arange(data)
    y = arange(data)
    m, b = polyfit(x, y, 1)
    plot(x, y, 'yo', x, m*x+b, '--k')
    show()
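In fact, numpy's polyfit accepts plain Python lists directly (they are converted internally with np.asarray), so no arange call is needed at all. A minimal sketch, using made-up x/y lists since the question's data is not shown:

```python
import numpy as np

# polyfit takes plain lists; only the plotting step benefits from an array,
# so that m * xs + b is vectorised.
x = [1, 2, 3, 4, 5]               # list of integers, no arange needed
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical measurements

m, b = np.polyfit(x, y, 1)        # slope and intercept of the least-squares line

xs = np.asarray(x)                # explicit conversion, useful for plotting
fit = m * xs + b                  # fitted line values at each x
```

With matplotlib the last two lines would feed straight into `plot(xs, y, 'yo', xs, fit, '--k')`.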

lme4::lmer reports “fixed-effect model matrix is rank deficient”, do I need a fix and how to?

∥☆過路亽.° · Submitted on 2019-11-27 04:06:18
I am trying to run a mixed-effects model that predicts F2_difference with the rest of the columns as predictors, but I get an error message that says:

    fixed-effect model matrix is rank deficient so dropping 7 columns / coefficients

From this link, "Fixed-effects model is rank deficient", I think I should use findLinearCombos in the R package caret. However, when I try findLinearCombos(data.df), it gives me the error message:

    Error in qr.default(object) : NA/NaN/Inf in foreign function call (arg 1)
    In addition: Warning message:
    In qr.default(object) : NAs introduced by coercion

My data does not …
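The findLinearCombos error usually means the data frame still contains non-numeric or missing values (coerced to NA), so it must be restricted to complete numeric columns first. The rank-deficiency diagnosis itself can be sketched in numpy (Python rather than R, since this digest mixes both); the toy design matrix and threshold below are assumptions for illustration:

```python
import numpy as np

# Toy design matrix with one redundant column: the last one is x1 + x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = rng.normal(size=20)
X = np.column_stack([np.ones(20), x1, x2, x1 + x2])

# Rank below the column count is exactly "rank deficient".
deficient = np.linalg.matrix_rank(X) < X.shape[1]

# Near-zero diagonal entries of R in a QR factorisation flag the columns a
# fitter would drop; here the redundancy lands in the last column.
R = np.linalg.qr(X, mode='r')
dependent = np.abs(np.diag(R)) < 1e-8
```

caret's findLinearCombos does the same kind of QR-based bookkeeping, which is why it needs a fully numeric, NA-free matrix.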

Why does lm run out of memory while matrix multiplication works fine for coefficients?

让人想犯罪 __ · Submitted on 2019-11-27 03:20:28
Question: I am trying to do fixed-effects linear regression with R. My data looks like:

    dte  yr  id  v1  v2
    .    .   .   .   .
    .    .   .   .   .

I then decided to simply do this by making yr a factor and using lm:

    lm(v1 ~ factor(yr) + v2 - 1, data = df)

However, this seems to run out of memory. I have 20 levels in my factor and df is 14 million rows, which takes about 2 GB to store; I am running this on a machine with 22 GB dedicated to this process. I then decided to try things the old-fashioned way: create dummy …
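One standard way to avoid materialising the dummy matrix entirely is the within transformation: demean the response and regressors within each factor level, then regress without any dummies. By the Frisch–Waugh–Lovell theorem this recovers the same v2 coefficient as the dummy-variable fit. A small numpy sketch on synthetic data (variable names and sizes are made up to mirror the question):

```python
import numpy as np

# Synthetic panel: 20 year levels, a v2 regressor with true slope 0.5.
rng = np.random.default_rng(1)
n = 1000
yr = rng.integers(0, 20, size=n)
v2 = rng.normal(size=n)
year_effect = np.linspace(0.0, 2.0, 20)[yr]
v1 = year_effect + 0.5 * v2 + rng.normal(scale=0.1, size=n)

def demean_by_group(x, g):
    # Subtract each group's mean from its members; O(n) memory, no dummies.
    means = np.bincount(g, weights=x) / np.bincount(g)
    return x - means[g]

y_t = demean_by_group(v1, yr)
x_t = demean_by_group(v2, yr)
beta_v2 = (x_t @ y_t) / (x_t @ x_t)   # within estimator of the v2 slope
```

The 14-million-row case never needs the 14M x 20 dummy block, only two demeaned vectors.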

Linear Regression and storing results in data frame [duplicate]

不打扰是莪最后的温柔 · Submitted on 2019-11-27 02:29:44
Question: This question already has an answer here: "Linear Regression and group by in R" (10 answers). I am running a linear regression on some variables in a data frame. I'd like to be able to subset the linear regressions by a categorical variable, run the linear regression for each level, and then store the t-stats in a data frame. I'd like to do this without a loop if possible. Here's a sample of what I'm trying to do:

    a <- c("a","a","a","a","a", "b","b","b","b","b", "c","c","c","c","c")
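The split-fit-collect pattern can be sketched in numpy (a Python analogue, since the question's R code is only partially shown): group by the categorical variable, fit y ~ x per group, and keep the slope t-statistic. The data below is synthetic:

```python
import numpy as np

# Three groups of 30 observations; true slope 2, noise sd 0.5.
rng = np.random.default_rng(2)
groups = np.repeat(["a", "b", "c"], 30)
x = rng.normal(size=90)
y = 2.0 * x + rng.normal(scale=0.5, size=90)

def slope_t(xg, yg):
    # t-statistic of the slope in a simple regression yg ~ xg.
    n = len(xg)
    xc, yc = xg - xg.mean(), yg - yg.mean()
    slope = (xc @ yc) / (xc @ xc)
    resid = yc - slope * xc
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    return slope / se

# One comprehension instead of an explicit loop, as the question asks.
t_stats = {g: slope_t(x[groups == g], y[groups == g]) for g in np.unique(groups)}
```

In R the idiomatic equivalents are `by()`, `lapply(split(...))`, or dplyr's `group_by() %>% do()`.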

How to return predicted values, residuals, and R-squared from lm.fit in R?

不想你离开。 · Submitted on 2019-11-27 02:23:34
Question: This piece of code returns the coefficients (intercept, slope1, slope2):

    set.seed(1)
    n = 10
    y = rnorm(n)
    x1 = rnorm(n)
    x2 = rnorm(n)
    lm.ft = function(y, x1, x2) return(lm(y ~ x1 + x2)$coef)
    res = list()
    for (i in 1:n) {
      x1.bar = x1 - x1[i]
      x2.bar = x2 - x2[i]
      res[[i]] = lm.ft(y, x1.bar, x2.bar)
    }

If I type res[[1]], I get:

    (Intercept)          x1          x2
    -0.44803887  0.06398476 -0.62798646

How can I return predicted values, residuals, R-squared, etc.? I need something general to extract whatever I need from the summary.

Answer 1: There are a …
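Everything the question asks for can be recomputed directly from the design matrix and the coefficients, which is useful when a fast fitter (like lm.fit) returns only coefficients. A numpy sketch on synthetic data mirroring the y ~ x1 + x2 setup:

```python
import numpy as np

# Synthetic data in the shape of the question: y regressed on x1 and x2.
rng = np.random.default_rng(3)
n = 10
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(scale=0.2, size=n)

X = np.column_stack([np.ones(n), x1, x2])          # intercept column first
coef, *_ = np.linalg.lstsq(X, y, rcond=None)       # like lm.fit's $coefficients

fitted = X @ coef                                  # predicted values
resid = y - fitted                                 # residuals
r_squared = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
```

In R the same identities apply: `fitted <- X %*% coef`, `resid <- y - fitted`, and R-squared from the residual and total sums of squares; `summary(lm(...))` just packages them.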

Accuracy Score ValueError: Can't Handle mix of binary and continuous target

浪子不回头ぞ · Submitted on 2019-11-27 00:56:04
I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works, and it's perfect. I have a problem evaluating the predicted results using the accuracy_score metric. This is my true data:

    array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

My predicted data:

    array([ 0.07094605,  0.1994941 ,  0.19270157,  0.13379635,  0.04654469,
            0.09212494,  0.19952108,  0.12884365,  0.15685076, -0.01274453,
            0.32167554,  0.32167554, -0.10023553,  0.09819648, -0.06755516,
            0.25390082,  0.17248324])

My code:

    accuracy_score(y_true, y_pred, normalize=False)

Error message:

    ValueError: Can't …
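The error arises because accuracy_score expects class labels on both sides, while LinearRegression returns continuous scores. Thresholding the scores into labels first resolves it; a sketch with the question's data (scores rounded to two decimals here, and the 0.5 cutoff is an assumption to be tuned per problem):

```python
import numpy as np

y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
y_score = np.array([0.07, 0.20, 0.19, 0.13, 0.05, 0.09, 0.20, 0.13,
                    0.16, -0.01, 0.32, 0.32, -0.10, 0.10, -0.07, 0.25, 0.17])

y_pred = (y_score >= 0.5).astype(int)     # now binary, like y_true
accuracy = (y_pred == y_true).mean()      # what accuracy_score(normalize=True) computes
```

With labels on both sides, `sklearn.metrics.accuracy_score(y_true, y_pred)` no longer raises; alternatively, use a classifier (e.g. LogisticRegression) so predictions are labels from the start.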

Loop linear regression and saving coefficients

扶醉桌前 · Submitted on 2019-11-26 23:25:54
Question: This is part of the dataset (named "ME1") I'm using (all variables are numeric):

       Year AgeR       rateM
    1  1751 -1.0 0.241104596
    2  1751 -0.9 0.036093609
    3  1751 -0.8 0.011623734
    4  1751 -0.7 0.006670552
    5  1751 -0.6 0.006610552
    6  1751 -0.5 0.008510828
    7  1751 -0.4 0.009344041
    8  1751 -0.3 0.011729740
    9  1751 -0.2 0.010988005
    10 1751 -0.1 0.015896107
    11 1751  0.0 0.018190140
    12 1751  0.1 0.024588340
    13 1751  0.2 0.029801362
    14 1751  0.3 0.044515912
    15 1751  0.4 0.055240354
    16 1751  0.5 0.088476758
    17 1751  0.6 0 …
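The loop-and-save pattern the title describes — fit rateM ~ AgeR separately per Year and keep the coefficients — can be sketched in numpy (synthetic data standing in for ME1; the true intercept/slope below are assumptions):

```python
import numpy as np

# Synthetic ME1-like data: three years, AgeR from -1.0 to 1.0 in steps of 0.1.
rng = np.random.default_rng(4)
years = np.repeat([1751, 1752, 1753], 21)
ageR = np.tile(np.arange(-1.0, 1.05, 0.1), 3)
rateM = 0.05 + 0.05 * ageR + rng.normal(scale=0.005, size=len(ageR))

# Fit per year and store (intercept, slope) keyed by year.
coefs = {}
for yr in np.unique(years):
    m = years == yr
    slope, intercept = np.polyfit(ageR[m], rateM[m], 1)
    coefs[yr] = (intercept, slope)
```

In R the equivalent is a loop (or lapply over split(ME1, ME1$Year)) calling `coef(lm(rateM ~ AgeR, data = subset))` and rbind-ing the results.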

lm(): What is qraux returned by QR decomposition in LINPACK / LAPACK

两盒软妹~` · Submitted on 2019-11-26 22:09:31
Question: rich.main3 is a linear model in R. I understand the rest of the elements of the list, but I don't get what qraux is. The documentation states that it is "a vector of length ncol(x) which contains additional information on \bold{Q}". What additional information does it mean?

    str(rich.main3$qr)
    qr : num [1:164, 1:147] -12.8062 0.0781 0.0781 0.0781 0.0781 ...
     ..- attr(*, "dimnames")=List of 2
     .. ..$ : chr [1:164] "1" "2" "3" "4" ...
     .. ..$ : chr [1:147] "(Intercept)" "S2" "S3" "x1" ...
     ..- attr(*, …
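Concretely, LINPACK/LAPACK store QR compactly: R sits in the upper triangle of $qr, the Householder reflector vectors sit below the diagonal, and qraux holds one scalar per reflection (LAPACK calls it tau, with H = I - tau * v v'). Those scalars are the "additional information" needed to rebuild Q. A self-contained numpy sketch of that storage scheme (written out explicitly rather than calling a library, so the role of tau is visible):

```python
import numpy as np

def householder_qr(A):
    """Compact QR via Householder reflections H = I - tau * v v'.
    Returns R, the reflector vectors v, and the scalars tau; tau plays
    exactly the role qraux plays in LINPACK/LAPACK output."""
    A = A.astype(float).copy()
    m, n = A.shape
    vs, taus = [], []
    for k in range(n):
        x = A[k:, k].copy()
        alpha = -np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
        v = x
        v[0] -= alpha                  # v = x - alpha * e1 (no cancellation)
        tau = 2.0 / (v @ v)
        A[k:, k:] -= tau * np.outer(v, v @ A[k:, k:])   # apply H in place
        vs.append(v)
        taus.append(tau)
    return np.triu(A[:n, :]), vs, np.array(taus)

rng = np.random.default_rng(5)
M = rng.normal(size=(6, 4))
R, vs, taus = householder_qr(M)

# Q = H1 H2 ... Hn, rebuilt from the v vectors and the tau scalars alone.
Q = np.eye(6)
for k in range(len(vs) - 1, -1, -1):
    v = np.zeros(6)
    v[k:] = vs[k]
    Q -= taus[k] * np.outer(v, v @ Q)
Q = Q[:, :4]                           # thin Q, matching R's 4 columns
```

R never forms Q explicitly for lm(); functions like qr.qy() and qr.qty() apply the reflections straight from $qr and $qraux, which is both faster and lighter on memory.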

predict.lm() with an unknown factor level in test data

99封情书 · Submitted on 2019-11-26 22:03:30
I am fitting a model to factor data and predicting. If the newdata in predict.lm() contains a single factor level that is unknown to the model, all of predict.lm() fails and returns an error. Is there a good way to have predict.lm() return a prediction for those factor levels the model knows and NA for unknown factor levels, instead of only an error? Example code:

    foo <- data.frame(response=rnorm(3), predictor=as.factor(c("A","B","C")))
    model <- lm(response ~ predictor, foo)
    foo.new <- data.frame(predictor=as.factor(c("A","B","C","D")))
    predict(model, newdata=foo.new)

I would like the very last …
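For this particular model the desired behaviour is easy to see: with a single factor predictor, the fitted values are just per-level means, so a "safe" predict can look each new level up and fall back to NaN when it is unseen. A Python sketch of that idea (the numeric responses stand in for the question's rnorm(3); a real multi-predictor model would need the full dummy encoding instead of a plain lookup):

```python
import numpy as np

# Training data: one response per factor level, as in the question.
train_level = np.array(["A", "B", "C"])
train_resp = np.array([0.1, -0.4, 0.7])   # stand-ins for rnorm(3)

# "Fit": per-level means (what lm fits for a single-factor model).
level_mean = {lvl: train_resp[train_level == lvl].mean()
              for lvl in np.unique(train_level)}

# "Safe predict": known levels get their fitted value, unknown levels get NaN.
new_levels = ["A", "B", "C", "D"]          # "D" was never seen in training
pred = np.array([level_mean.get(lvl, np.nan) for lvl in new_levels])
```

In R, the usual route is to subset newdata to rows whose levels appear in `model$xlevels`, predict on those, and merge NA back in for the rest.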

R: lm() result differs when using `weights` argument and when using manually reweighted data

家住魔仙堡 · Submitted on 2019-11-26 20:22:33
Question: In order to correct heteroskedasticity in the error terms, I am running the following weighted least squares regression in R:

    #Call:
    #lm(formula = a ~ q + q2 + b + c, data = mydata, weights = weighting)
    #
    #Weighted Residuals:
    #     Min       1Q   Median       3Q      Max
    #-1.83779 -0.33226  0.02011  0.25135  1.48516
    #
    #Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)
    #(Intercept) -3.939440   0.609991  -6.458 1.62e-09 ***
    #q            0.175019   0.070101   2.497 0.013696 *
    #q2           0.048790   0.005613   8.693 8.49e-15 ***
    #b            0.473891   0.134918   3 …
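lm(..., weights = w) minimises the weighted residual sum of squares, and its coefficients coincide with ordinary least squares on data whose rows are scaled by sqrt(w), not by w itself; mismatched results usually come from scaling by w, or from comparing the (legitimately different) reported summaries. A numpy sketch of the equivalence, on synthetic heteroskedastic data:

```python
import numpy as np

# Heteroskedastic data: noise sd grows from 0.5 to 2.0 across observations.
rng = np.random.default_rng(6)
n = 50
x = rng.normal(size=n)
sd = np.linspace(0.5, 2.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=sd)
w = 1.0 / sd ** 2                      # inverse-variance weights

X = np.column_stack([np.ones(n), x])

# Route 1: weighted normal equations (X'WX) beta = X'Wy, like lm(weights=w).
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Route 2: plain OLS after scaling every row of X and y by sqrt(w).
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
```

Both routes give the same coefficient vector; the same check in R is `coef(lm(y ~ x, weights = w))` against `coef(lm(I(sw*y) ~ 0 + I(sw) + I(sw*x)))`.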