regression

Multivariate (polynomial) best fit curve in python?

帅比萌擦擦* submitted on 2019-11-28 18:22:11
How do you calculate a best-fit line in Python, and then plot it on a scatterplot in matplotlib? This is how I calculate the linear best-fit line using ordinary least squares regression:

from sklearn import linear_model
clf = linear_model.LinearRegression()
x = [[t.x1, t.x2, t.x3, t.x4, t.x5] for t in self.trainingTexts]
y = [t.human_rating for t in self.trainingTexts]
clf.fit(x, y)
regress_coefs = clf.coef_
regress_intercept = clf.intercept_

This is multivariate (there are many x-values for each case), so x is a list of lists and y is a single list. For example: x = [[1,2,3,4,5], [2,2,4,4,5
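A minimal sketch of the usual way to extend this setup to a polynomial fit (the pipeline, toy data, and degree below are illustrative, not from the thread): expand the features with PolynomialFeatures, then run the same ordinary least-squares fit on the expanded matrix.

# Sketch: quadratic multivariate fit by expanding features before OLS.
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy stand-ins for the asker's trainingTexts features and ratings.
x = [[1, 2, 3, 4, 5], [2, 2, 4, 4, 5], [2, 2, 4, 4, 1]]
y = [1.0, 2.0, 3.0]

# degree=2 adds squared and interaction terms (x1*x2, ...) to each row,
# so an ordinary least-squares fit on the expanded matrix is quadratic.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[1, 2, 3, 4, 5]]))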

Non-linear regression in C#

自古美人都是妖i submitted on 2019-11-28 17:53:57
I'm looking for a way to produce a non-linear (preferably quadratic) curve, based on a 2D data set, for predictive purposes. Right now I'm using my own implementation of ordinary least squares (OLS) to produce a linear trend, but my data is much better suited to a curve model. The data I'm analysing is system load over time. Here's the equation I'm using to produce my linear coefficients: [equation image omitted in this excerpt]. I've had a look at Math.NET Numerics and a few other libraries, but they either provide interpolation instead of regression (which is of no use to me), or the code just doesn't work in some way. Anyone know
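For illustration only, here is the quadratic least-squares computation sketched in Python (the language the document's other snippets use); the same normal-equations math ports directly to C# with any linear-algebra routine. The data values are invented.

# Sketch: build a design matrix [1, t, t^2] and solve least squares.
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # time
load = np.array([1.0, 1.2, 2.1, 3.8, 6.5])   # system load (toy values)

A = np.vstack([np.ones_like(t), t, t * t]).T  # design matrix
coef, *_ = np.linalg.lstsq(A, load, rcond=None)
c0, c1, c2 = coef                             # load ~ c0 + c1*t + c2*t^2
print(c0, c1, c2)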

Why is it inadvisable to get statistical summary information for regression coefficients from glmnet model?

删除回忆录丶 submitted on 2019-11-28 17:09:58
Question: I have a regression model with a binary outcome. I fitted the model with glmnet and got the selected variables and their coefficients. Since glmnet doesn't calculate variable importance, I would like to feed the exact output (selected variables and their coefficients) to glm to get that information (standard errors, etc.). I searched the R documentation; it seems I can use the "method" option in glm to specify a user-defined function. But I failed to do so. Could someone help me with this? Answer 1: "It is a very
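A hedged sketch of the two-step procedure the question describes, translated to Python (data and names invented): select variables with an L1 penalty, then refit unpenalized for standard errors. The comment marks the statistical catch the question's title hints at: the refit standard errors treat the selected set as fixed in advance.

# Sketch of the (inadvisable) two-step: L1 selection, then naive refit.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])    # variables kept by the penalty

refit = sm.Logit(y, sm.add_constant(X[:, selected])).fit(disp=0)
# Caveat: these standard errors ignore that the data chose 'selected',
# so they are too optimistic -- the core of the warning above.
print(refit.summary())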

Distinguishing overfitting vs good prediction

廉价感情. submitted on 2019-11-28 16:06:13
These are questions on how to measure and reduce overfitting in machine learning. I think many people new to machine learning will have the same questions, so I tried to be clear with my examples and questions in the hope that answers here can help others. I have a very small sample of texts and I'm trying to predict values associated with them. I've used sklearn to calculate tf-idf, and insert those features into a regression model for prediction. This gives me 26 samples with 6323 features, not a lot, I know:

>> count_vectorizer = CountVectorizer(min_n=1, max_n=1)
>> term_freq = count_vectorizer.fit_transform
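One standard way to quantify the overfitting risk in a setup like this (26 samples, 6323 features) is to compare the training score against a cross-validated score; a sketch with stand-in data:

# Sketch: a large train-vs-CV gap is the signature of overfitting.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(26, 6323))   # stand-in for the tf-idf matrix
y = rng.normal(size=26)           # stand-in for the target values

model = Ridge(alpha=1.0).fit(X, y)
train_r2 = model.score(X, y)                       # near-perfect here
cv_r2 = cross_val_score(model, X, y, cv=5).mean()  # honest estimate
print(train_r2, cv_r2)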

How to calculate the regularization parameter in linear regression

我怕爱的太早我们不能终老 submitted on 2019-11-28 15:32:26
Question: When we use a high-degree polynomial to fit a set of points in a linear regression setup, we add regularization to prevent overfitting, including a lambda parameter in the cost function. This lambda is then used when updating the theta parameters in the gradient descent algorithm. My question is: how do we calculate this lambda regularization parameter? Answer 1: The regularization parameter (lambda) is an input to your model, so what you probably want to know is how do you
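The short version of the usual answer, as a sketch: lambda has no closed form; you try candidate values and keep the one with the best cross-validated score. Illustrative Python with invented data (sklearn calls the parameter alpha rather than lambda):

# Sketch: choose the regularization strength by cross-validated search.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + rng.normal(size=100)

search = GridSearchCV(Ridge(), {"alpha": np.logspace(-4, 4, 20)}, cv=5)
search.fit(X, y)
print(search.best_params_)   # the cross-validated choice of lambda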

Screening (multi)collinearity in a regression model

你。 submitted on 2019-11-28 15:07:40
I hope that this one is not going to be an "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in a regression model. How to cure it... well, sometimes you don't need to "cure" collinearity, since it doesn't affect the regression model itself, only the interpretation of the effects of individual predictors. One way to spot collinearity is to take each predictor in turn as the dependent variable, with the other predictors as independent variables, and determine R²; if it's larger than .9 (or .95), we can consider that predictor redundant. This is one "method"
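The method just described can be sketched directly (illustrative Python, invented data): regress each predictor on the rest, compute R², and flag anything above the .9 threshold. VIF = 1/(1-R²) is the same check in its usual packaging.

# Sketch: per-predictor R^2 against the other predictors, i.e. VIF.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)  # make column 3 collinear

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    vif = 1.0 / (1.0 - r2)
    print(f"predictor {j}: R^2={r2:.3f}, VIF={vif:.1f}",
          "<- redundant" if r2 > 0.9 else "")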

How to do a GLM when “contrasts can be applied only to factors with 2 or more levels”?

柔情痞子 submitted on 2019-11-28 14:13:45
I want to do a regression in R using glm, but I get the contrasts error below. Is there a way around it?

mydf <- data.frame(Group=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
                   WL=rep(c(1,0),12),
                   New.Runner=c("N","N","N","N","N","N","Y","N","N","N","N","N","N","Y","N","N","N","Y","N","N","N","N","N","Y"),
                   Last.Run=c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA))
mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = mydf)
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
#   contrasts can be applied only to factors with 2 or more levels

Using
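The data as posted suggests the cause: every "Y" row of New.Runner also has NA in Last.Run, so once incomplete rows are dropped (as glm does by default), New.Runner is left with a single level. A quick check of that reading, sketched in pandas (Group column omitted):

# Sketch: dropping incomplete rows collapses New.Runner to one level.
import pandas as pd

mydf = pd.DataFrame({
    "WL": [1, 0] * 12,
    "New.Runner": list("NNNNNNYNNNNNNYNNNYNNNNNY"),
    "Last.Run": [1, 5, 2, 6, 5, 4, None, 3, 7, 2, 4, 9, 8, None,
                 3, 5, 1, None, 6, 10, 7, 9, 2, None],
})
complete = mydf.dropna()
print(complete["New.Runner"].unique())   # ['N'] -- only one level survives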

Fitting linear model / ANOVA by group [duplicate]

守給你的承諾、 submitted on 2019-11-28 14:10:58
This question already has an answer here: Linear Regression and group by in R (10 answers). I'm trying to run anova() in R and running into some difficulty. This is what I've done up to now, to help shed some light on my question. Here is the str() of my data at this point:

str(mhw)
'data.frame': 500 obs. of 5 variables:
 $ r    : int 1 2 3 4 5 6 7 8 9 10 ...
 $ c    : int 1 1 1 1 1 1 1 1 1 1 ...
 $ grain: num 3.63 4.07 4.51 3.9 3.63 3.16 3.18 3.42 3.97 3.4 ...
 $ straw: num 6.37 6.24 7.05 6.91 5.93 5.59 5.32 5.52 6.03 5.66 ...
 $ Quad : Factor w/ 4 levels "NE","NW","SE",..: 2 2 2 2 2 2 2 2 2 2 ...

Column r
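For illustration, the "one fit per group" idea sketched in Python with invented data shaped like the str() output above (statsmodels' formula interface plays the role of R's lm):

# Sketch: one OLS fit of straw ~ grain per quadrant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
mhw = pd.DataFrame({
    "grain": rng.normal(4, 0.5, 500),
    "Quad": rng.choice(["NE", "NW", "SE", "SW"], 500),
})
mhw["straw"] = 1.5 * mhw["grain"] + rng.normal(0, 0.3, 500)

for quad, sub in mhw.groupby("Quad"):
    fit = smf.ols("straw ~ grain", data=sub).fit()
    print(quad, fit.params.to_dict())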

Highcharts - Get crossing point of crossing series

本小妞迷上赌 submitted on 2019-11-28 12:59:58
I am currently trying to extract the points where multiple series (a, b, c, d) cross a specific series (x). I can't seem to find any function that can help me with this task. My best bet is to measure the distance of every single point in x to every single point in a, b, c, d... and assume that when the distance falls under some threshold, the point must be a crossing point. I think this approach is far too computationally heavy and seems "dirty". I believe there must be easier or better ways, perhaps even functions within Highcharts' own API. I have searched various sources and sites, but I can't
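A cheaper alternative than pairwise distances, sketched in Python with invented data: when two series are sampled at the same x-positions, a crossing is a sign change of their difference, and linear interpolation recovers the exact point in O(n):

# Sketch: crossings as sign changes of the difference of two series.
import numpy as np

xs = np.arange(6, dtype=float)
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])   # the reference series

d = a - x
idx = np.flatnonzero(np.sign(d[:-1]) != np.sign(d[1:]))  # sign changes
for i in idx:
    t = d[i] / (d[i] - d[i + 1])           # fraction of the way to i+1
    cx = xs[i] + t * (xs[i + 1] - xs[i])
    cy = a[i] + t * (a[i + 1] - a[i])
    print(cx, cy)                           # -> 2.5 2.5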

ggplot2: Problem with x axis when adding regression line equation on each facet

旧时模样 submitted on 2019-11-28 12:53:38
Based on the example in Adding Regression Line Equation and R2 on graph, I am struggling to include the regression line equation for my model in each facet. However, I can't figure out why it changes the limits of my x axis.

library(ggplot2)
library(reshape2)
df <- data.frame(year = seq(1979, 2010),
                 M02 = runif(32, -4, 6),
                 M06 = runif(32, -2.4, 5.1),
                 M07 = runif(32, -2, 7.1))
df <- melt(df, id = c("year"))
ggplot(data = df, mapping = aes(x = year, y = value)) +
  geom_point() +
  scale_x_continuous() +
  stat_smooth_func(geom = 'text', method = 'lm', hjust = 0, parse = T) +
  geom_smooth(method = 'lm',
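For comparison, a matplotlib analogue with invented data: fit per facet and place the equation text in axes-fraction coordinates, which by construction cannot alter the data limits (a plausible culprit when a text layer placed in data coordinates widens the x range):

# Sketch: per-facet fit with the equation anchored in axes coordinates.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
year = np.arange(1979, 2011)
groups = {"M02": rng.uniform(-4, 6, 32),
          "M06": rng.uniform(-2.4, 5.1, 32),
          "M07": rng.uniform(-2, 7.1, 32)}

fig, axes = plt.subplots(1, 3, sharex=True)
for ax, (name, value) in zip(axes, groups.items()):
    slope, intercept = np.polyfit(year, value, 1)
    ax.scatter(year, value, s=10)
    ax.plot(year, slope * year + intercept)
    ax.text(0.05, 0.95, f"y = {slope:.3f}x + {intercept:.1f}",
            transform=ax.transAxes, va="top")   # axes-fraction placement
    ax.set_title(name)
plt.show()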