regression

Multivariate (polynomial) best fit curve in python?

帅比萌擦擦* submitted on 2019-11-28 18:22:11
How do you calculate a best-fit line in Python, and then plot it on a scatterplot in matplotlib? This is how I calculate the linear best-fit line using ordinary least squares regression:

from sklearn import linear_model
clf = linear_model.LinearRegression()
x = [[t.x1, t.x2, t.x3, t.x4, t.x5] for t in self.trainingTexts]
y = [t.human_rating for t in self.trainingTexts]
clf.fit(x, y)
regress_coefs = clf.coef_
regress_intercept = clf.intercept_

This is multivariate (there are many x-values for each case), so x is a list of lists and y is a single list. For example: x = [[1,2,3,4,5], [2,2,4,4,5
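A minimal sketch of the usual way to extend this setup to a polynomial fit (the pipeline, toy data, and degree below are illustrative, not from the thread): expand the features with PolynomialFeatures, then run the same ordinary least-squares fit on the expanded matrix.

# Sketch: quadratic multivariate fit by expanding features before OLS.
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy stand-ins for the asker's trainingTexts features and ratings.
x = [[1, 2, 3, 4, 5], [2, 2, 4, 4, 5], [2, 2, 4, 4, 1]]
y = [1.0, 2.0, 3.0]

# degree=2 adds squared and interaction terms (x1*x2, ...) to each row,
# so an ordinary least-squares fit on the expanded matrix is quadratic.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[1, 2, 3, 4, 5]]))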

Non-linear regression in C#

自古美人都是妖i submitted on 2019-11-28 17:53:57
I'm looking for a way to produce a non-linear (preferably quadratic) curve, based on a 2D data set, for predictive purposes. Right now I'm using my own implementation of ordinary least squares (OLS) to produce a linear trend, but my data is much better suited to a curve model. The data I'm analysing is system load over time. Here's the equation I'm using to produce my linear coefficients: [equation image omitted in this excerpt]. I've had a look at Math.NET Numerics and a few other libraries, but they either provide interpolation instead of regression (which is of no use to me), or the code just doesn't work in some way. Anyone know
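For illustration only, here is the quadratic least-squares computation sketched in Python (the language the document's other snippets use); the same normal-equations math ports directly to C# with any linear-algebra routine. The data values are invented.

# Sketch: build a design matrix [1, t, t^2] and solve least squares.
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # time
load = np.array([1.0, 1.2, 2.1, 3.8, 6.5])   # system load (toy values)

A = np.vstack([np.ones_like(t), t, t * t]).T  # design matrix
coef, *_ = np.linalg.lstsq(A, load, rcond=None)
c0, c1, c2 = coef                             # load ~ c0 + c1*t + c2*t^2
print(c0, c1, c2)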

Why is it inadvisable to get statistical summary information for regression coefficients from glmnet model?

删除回忆录丶 submitted on 2019-11-28 17:09:58
Question: I have a regression model with a binary outcome. I fitted the model with glmnet and got the selected variables and their coefficients. Since glmnet doesn't calculate variable importance, I would like to feed the exact output (selected variables and their coefficients) to glm to get that information (standard errors, etc.). I searched the R documentation; it seems I can use the "method" option in glm to specify a user-defined function. But I failed to do so. Could someone help me with this? Answer 1: "It is a very
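A hedged sketch of the two-step procedure the question describes, translated to Python (data and names invented): select variables with an L1 penalty, then refit unpenalized for standard errors. The comment marks the statistical catch the question's title hints at: the refit standard errors treat the selected set as fixed in advance.

# Sketch of the (inadvisable) two-step: L1 selection, then naive refit.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])    # variables kept by the penalty

refit = sm.Logit(y, sm.add_constant(X[:, selected])).fit(disp=0)
# Caveat: these standard errors ignore that the data chose 'selected',
# so they are too optimistic -- the core of the warning above.
print(refit.summary())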

Distinguishing overfitting vs good prediction

廉价感情. submitted on 2019-11-28 16:06:13
These are questions on how to measure and reduce overfitting in machine learning. I think many people new to machine learning will have the same questions, so I tried to be clear with my examples and questions in the hope that answers here can help others. I have a very small sample of texts and I'm trying to predict values associated with them. I've used sklearn to calculate tf-idf, and insert those features into a regression model for prediction. This gives me 26 samples with 6323 features, not a lot, I know:

>> count_vectorizer = CountVectorizer(min_n=1, max_n=1)
>> term_freq = count_vectorizer.fit_transform
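One standard way to quantify the overfitting risk in a setup like this (26 samples, 6323 features) is to compare the training score against a cross-validated score; a sketch with stand-in data:

# Sketch: a large train-vs-CV gap is the signature of overfitting.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(26, 6323))   # stand-in for the tf-idf matrix
y = rng.normal(size=26)           # stand-in for the target values

model = Ridge(alpha=1.0).fit(X, y)
train_r2 = model.score(X, y)                       # near-perfect here
cv_r2 = cross_val_score(model, X, y, cv=5).mean()  # honest estimate
print(train_r2, cv_r2)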

How to calculate the regularization parameter in linear regression

我怕爱的太早我们不能终老 submitted on 2019-11-28 15:32:26
Question: When we use a high-degree polynomial to fit a set of points in a linear regression setup, we add regularization to prevent overfitting, including a lambda parameter in the cost function. This lambda is then used when updating the theta parameters in the gradient descent algorithm. My question is: how do we calculate this lambda regularization parameter? Answer 1: The regularization parameter (lambda) is an input to your model, so what you probably want to know is how do you
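The short version of the usual answer, as a sketch: lambda has no closed form; you try candidate values and keep the one with the best cross-validated score. Illustrative Python with invented data (sklearn calls the parameter alpha rather than lambda):

# Sketch: choose the regularization strength by cross-validated search.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + rng.normal(size=100)

search = GridSearchCV(Ridge(), {"alpha": np.logspace(-4, 4, 20)}, cv=5)
search.fit(X, y)
print(search.best_params_)   # the cross-validated choice of lambda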

Screening (multi)collinearity in a regression model

你。 submitted on 2019-11-28 15:07:40
I hope that this one is not going to be an "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in a regression model. How to cure it... well, sometimes you don't need to "cure" collinearity, since it doesn't affect the regression model itself, only the interpretation of the effects of individual predictors. One way to spot collinearity is to take each predictor in turn as the dependent variable, with the other predictors as independent variables, and determine R²; if it's larger than .9 (or .95), we can consider that predictor redundant. This is one "method"
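The method just described can be sketched directly (illustrative Python, invented data): regress each predictor on the rest, compute R², and flag anything above the .9 threshold. VIF = 1/(1-R²) is the same check in its usual packaging.

# Sketch: per-predictor R^2 against the other predictors, i.e. VIF.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)  # make column 3 collinear

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    vif = 1.0 / (1.0 - r2)
    print(f"predictor {j}: R^2={r2:.3f}, VIF={vif:.1f}",
          "<- redundant" if r2 > 0.9 else "")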

How to do a GLM when “contrasts can be applied only to factors with 2 or more levels”?

柔情痞子 submitted on 2019-11-28 14:13:45
I want to do a regression in R using glm, but I get the contrasts error below. Is there a way around it?

mydf <- data.frame(Group=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
                   WL=rep(c(1,0),12),
                   New.Runner=c("N","N","N","N","N","N","Y","N","N","N","N","N","N","Y","N","N","N","Y","N","N","N","N","N","Y"),
                   Last.Run=c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA))
mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = mydf)
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
#   contrasts can be applied only to factors with 2 or more levels

Using
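The data as posted suggests the cause: every "Y" row of New.Runner also has NA in Last.Run, so once incomplete rows are dropped (as glm does by default), New.Runner is left with a single level. A quick check of that reading, sketched in pandas (Group column omitted):

# Sketch: dropping incomplete rows collapses New.Runner to one level.
import pandas as pd

mydf = pd.DataFrame({
    "WL": [1, 0] * 12,
    "New.Runner": list("NNNNNNYNNNNNNYNNNYNNNNNY"),
    "Last.Run": [1, 5, 2, 6, 5, 4, None, 3, 7, 2, 4, 9, 8, None,
                 3, 5, 1, None, 6, 10, 7, 9, 2, None],
})
complete = mydf.dropna()
print(complete["New.Runner"].unique())   # ['N'] -- only one level survives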

Fitting linear model / ANOVA by group [duplicate]

守給你的承諾、 submitted on 2019-11-28 14:10:58
This question already has an answer here: Linear Regression and group by in R (10 answers). I'm trying to run anova() in R and running into some difficulty. This is what I've done up to now, to help shed some light on my question. Here is the str() of my data at this point:

str(mhw)
'data.frame': 500 obs. of 5 variables:
 $ r    : int 1 2 3 4 5 6 7 8 9 10 ...
 $ c    : int 1 1 1 1 1 1 1 1 1 1 ...
 $ grain: num 3.63 4.07 4.51 3.9 3.63 3.16 3.18 3.42 3.97 3.4 ...
 $ straw: num 6.37 6.24 7.05 6.91 5.93 5.59 5.32 5.52 6.03 5.66 ...
 $ Quad : Factor w/ 4 levels "NE","NW","SE",..: 2 2 2 2 2 2 2 2 2 2 ...

Column r
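For illustration, the "one fit per group" idea sketched in Python with invented data shaped like the str() output above (statsmodels' formula interface plays the role of R's lm):

# Sketch: one OLS fit of straw ~ grain per quadrant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
mhw = pd.DataFrame({
    "grain": rng.normal(4, 0.5, 500),
    "Quad": rng.choice(["NE", "NW", "SE", "SW"], 500),
})
mhw["straw"] = 1.5 * mhw["grain"] + rng.normal(0, 0.3, 500)

for quad, sub in mhw.groupby("Quad"):
    fit = smf.ols("straw ~ grain", data=sub).fit()
    print(quad, fit.params.to_dict())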

Highcharts - Get crossing point of crossing series

本小妞迷上赌 submitted on 2019-11-28 12:59:58
I am currently trying to extract the points where multiple series (a, b, c, d) cross a specific series (x). I can't seem to find any function that can help me with this task. My best bet is to measure the distance of every single point in x to every single point in a, b, c, d... and assume that when the distance falls under some threshold, the point must be a crossing point. I think this approach is far too computationally heavy and seems "dirty". I believe there must be easier or better ways, perhaps even functions within Highcharts' own API. I have searched various sources and sites, but I can't
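A cheaper alternative than pairwise distances, sketched in Python with invented data: when two series are sampled at the same x-positions, a crossing is a sign change of their difference, and linear interpolation recovers the exact point in O(n):

# Sketch: crossings as sign changes of the difference of two series.
import numpy as np

xs = np.arange(6, dtype=float)
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])   # the reference series

d = a - x
idx = np.flatnonzero(np.sign(d[:-1]) != np.sign(d[1:]))  # sign changes
for i in idx:
    t = d[i] / (d[i] - d[i + 1])           # fraction of the way to i+1
    cx = xs[i] + t * (xs[i + 1] - xs[i])
    cy = a[i] + t * (a[i + 1] - a[i])
    print(cx, cy)                           # -> 2.5 2.5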

ggplot2: Problem with x axis when adding regression line equation on each facet

旧时模样 submitted on 2019-11-28 12:53:38
Based on the example in Adding Regression Line Equation and R2 on graph, I am struggling to include the regression line equation for my model in each facet. However, I can't figure out why it changes the limits of my x axis.

library(ggplot2)
library(reshape2)
df <- data.frame(year = seq(1979, 2010),
                 M02 = runif(32, -4, 6),
                 M06 = runif(32, -2.4, 5.1),
                 M07 = runif(32, -2, 7.1))
df <- melt(df, id = c("year"))
ggplot(data = df, mapping = aes(x = year, y = value)) +
  geom_point() +
  scale_x_continuous() +
  stat_smooth_func(geom = 'text', method = 'lm', hjust = 0, parse = T) +
  geom_smooth(method = 'lm',
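For comparison, a matplotlib analogue with invented data: fit per facet and place the equation text in axes-fraction coordinates, which by construction cannot alter the data limits (a plausible culprit when a text layer placed in data coordinates widens the x range):

# Sketch: per-facet fit with the equation anchored in axes coordinates.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
year = np.arange(1979, 2011)
groups = {"M02": rng.uniform(-4, 6, 32),
          "M06": rng.uniform(-2.4, 5.1, 32),
          "M07": rng.uniform(-2, 7.1, 32)}

fig, axes = plt.subplots(1, 3, sharex=True)
for ax, (name, value) in zip(axes, groups.items()):
    slope, intercept = np.polyfit(year, value, 1)
    ax.scatter(year, value, s=10)
    ax.plot(year, slope * year + intercept)
    ax.text(0.05, 0.95, f"y = {slope:.3f}x + {intercept:.1f}",
            transform=ax.transAxes, va="top")   # axes-fraction placement
    ax.set_title(name)
plt.show()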