linear-regression

Are there any linear regression functions in SQL Server?

Submitted by 浪尽此生 on 2019-11-28 16:06:55
Are there any linear regression functions in SQL Server 2005/2008, similar to the linear regression functions in Oracle?

To the best of my knowledge, there is none. Writing one is pretty straightforward, though. The following gives you the constant alpha and the slope beta for y = alpha + beta * x + epsilon:

```sql
-- test data (GroupIDs 1, 2 = normal regressions; 3, 4 = no variance)
WITH some_table(GroupID, x, y) AS
(
    SELECT 1,  1, 1    UNION
    SELECT 1,  2, 2    UNION
    SELECT 1,  3, 1.3  UNION
    SELECT 1,  4, 3.75 UNION
    SELECT 1,  5, 2.25 UNION
    SELECT 2, 95, 85   UNION
    SELECT 2, 85, 95   UNION
    SELECT 2, 80, 70   UNION
```
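
For reference, the same closed-form estimates are easy to check outside SQL. A minimal R sketch of the formulas the query implements, using GroupID 1's test data from above (beta = cov(x, y) / var(x), alpha = mean(y) - beta * mean(x)):

```r
# Closed-form simple linear regression on GroupID 1's rows above
x <- c(1, 2, 3, 4, 5)
y <- c(1, 2, 1.3, 3.75, 2.25)

beta  <- cov(x, y) / var(x)        # slope
alpha <- mean(y) - beta * mean(x)  # intercept

c(alpha = alpha, beta = beta)
coef(lm(y ~ x))                    # cross-check against R's built-in fit
```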

What is the difference between linear regression and logistic regression?

Submitted by 我的梦境 on 2019-11-28 14:55:36
We use logistic regression when we have to predict the value of a categorical (or discrete) outcome. I believe we use linear regression to also predict the value of an outcome given the input values. Then, what is the difference between the two methodologies?

Linear regression output as probabilities: it's tempting to use the linear regression output as probabilities, but it's a mistake, because the output can be negative or greater than 1, whereas a probability cannot. Because regression might actually produce probabilities that are less than 0 or even bigger than 1, logistic regression was
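
A small R sketch of the contrast described above, on simulated binary data: fitting the same 0/1 outcome with lm() and glm() shows why raw linear-regression output cannot be read as a probability:

```r
set.seed(1)
x <- seq(-3, 3, length.out = 50)
y <- rbinom(50, 1, plogis(2 * x))         # simulated 0/1 outcome

lin_fit <- lm(y ~ x)                      # linear probability model
log_fit <- glm(y ~ x, family = binomial)  # logistic regression

range(predict(lin_fit))                    # can fall below 0 and above 1
range(predict(log_fit, type = "response")) # always inside (0, 1)
```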

Fitting linear model / ANOVA by group [duplicate]

Submitted by 守給你的承諾、 on 2019-11-28 14:10:58
This question already has an answer here: Linear Regression and group by in R (10 answers).

I'm trying to run anova() in R and am running into some difficulty. Here is what I've done up to now to help shed some light on my question. This is the str() of my data at this point:

```r
str(mhw)
'data.frame':   500 obs. of  5 variables:
 $ r    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ c    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ grain: num  3.63 4.07 4.51 3.9 3.63 3.16 3.18 3.42 3.97 3.4 ...
 $ straw: num  6.37 6.24 7.05 6.91 5.93 5.59 5.32 5.52 6.03 5.66 ...
 $ Quad : Factor w/ 4 levels "NE","NW","SE",..: 2 2 2 2 2 2 2 2 2 2 ...
```

Column r
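
Assuming the goal matches the linked duplicate, one fit per group, here is a minimal sketch using the columns from the str() output above; the formula straw ~ grain is an assumption, not taken from the question:

```r
# One linear model (and ANOVA table) per level of Quad
fits <- lapply(split(mhw, mhw$Quad), function(d) lm(straw ~ grain, data = d))
lapply(fits, anova)   # per-group ANOVA tables
sapply(fits, coef)    # per-group intercept and slope
```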

Adding statsmodels 'predict' results to a Pandas dataframe

Submitted by 十年热恋 on 2019-11-28 12:57:32
It is common to want to append the results of predictions to the dataset used to make the predictions, but the statsmodels predict function returns (non-indexed) results of a potentially different length than the dataset on which the predictions are based. For example, if the test dataset, test, contains any null entries, then

```python
mod_fit = sm.Logit.from_formula('Y ~ A + B + C', train).fit()
preds = mod_fit.predict(test)
```

will produce an array that is shorter than the length of test and cannot usefully be appended with

```python
test['preds'] = preds
```

And since the result of predict is not indexed, there is no way

Running several linear regressions from a single dataframe in R

Submitted by て烟熏妆下的殇ゞ on 2019-11-28 12:17:39
Question: I have a dataset of export trade data for a single country with 21 columns. The first column indicates the years (1962-2014) while the other 20 are trading partners. I am trying to run linear regressions of the years column against each of the other columns. I have tried the method recommended here: Running multiple, simple linear regressions from dataframe in R, which entails using

```r
combn(names(DF), 2, function(x) { lm(DF[, x]) }, simplify = FALSE)
```

However, this only yields the intercept for each pair, which
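
Assuming each regression is partner ~ year, one alternative to combn() is to iterate over the partner columns with lapply() and keep the full fits, not just the intercepts. Column references here are placeholders for the actual names in DF:

```r
# DF: column 1 is the year, the remaining 20 columns are partners
partner_cols <- names(DF)[-1]
fits <- lapply(partner_cols, function(p) lm(DF[[p]] ~ DF[[1]]))
names(fits) <- partner_cols
t(sapply(fits, coef))   # intercept and slope for every partner
```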

pyspark Linear Regression Example from official documentation - Bad results?

Submitted by 守給你的承諾、 on 2019-11-28 11:17:13
Question: I am planning to use linear regression in Spark. To get started, I checked out the example from the official documentation (which you can find here). I also found this question on Stack Overflow, which is essentially the same question as mine. The answer suggests tweaking the step size, which I also tried to do; however, the results are still as random as without tweaking the step size. The code I'm using looks like this:

```python
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD,
```

Linear regression with interaction fails in the rms package

Submitted by 懵懂的女人 on 2019-11-28 11:08:21
Question: I'm playing around with interactions in the formula. I wondered if it's possible to do a regression with an interaction for one of the two dummy variables. This seems to work in regular linear regression using the lm() function, but with the ols() function in the rms package the same formula fails. Does anyone know why? Here's my example:

```r
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
regular_lm <- lm(mpg ~ wt + cyl + gear + cyl:gear, data = mtcars)
summary(regular_lm)
regular_lm <- lm(mpg ~ wt + cyl +
```
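
One workaround often suggested for rms, offered here as an assumption rather than a verified fix for this exact error, is to declare the interaction as a full crossing with * after the usual datadist setup:

```r
# Sketch: rms::ols with the interaction written as a full crossing
library(rms)
data(mtcars)
mtcars$gear <- factor(mtcars$gear)
dd <- datadist(mtcars)
options(datadist = "dd")           # rms wants datadist set before fitting
ols_fit <- ols(mpg ~ wt + cyl * gear, data = mtcars)  # * = main effects + interaction
ols_fit
```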

Why does lm run out of memory while matrix multiplication works fine for coefficients?

Submitted by 做~自己de王妃 on 2019-11-28 10:00:58
I am trying to do fixed-effects linear regression with R. My data looks like

```
dte   yr   id   v1   v2
.     .    .    .    .
.     .    .    .    .
.     .    .    .    .
```

I then decided to simply do this by making yr a factor and using lm:

```r
lm(v1 ~ factor(yr) + v2 - 1, data = df)
```

However, this seems to run out of memory. I have 20 levels in my factor, and df is 14 million rows, which takes about 2 GB to store; I am running this on a machine with 22 GB dedicated to this process. I then decided to try things the old-fashioned way: create dummy variables for each of my years t1 to t20 by doing:

```r
df$t1 <- 1*(df$yr==1)
df$t2 <- 1*(df$yr==2)
df$t3 <- 1
```
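
The dummy-variable route can be taken one step further: build the design matrix once and solve the normal equations directly, which is the "matrix multiplication" approach the title refers to. A sketch under the column names shown above:

```r
# Solve the normal equations (X'X) b = X'y; the 21x21 system is
# tiny even when X itself is huge (20 year dummies + v2)
X <- model.matrix(~ factor(yr) + v2 - 1, data = df)
beta <- solve(crossprod(X), crossprod(X, df$v1))
beta
```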

Analysis using linear regression based on subgroups

Submitted by ε祈祈猫儿з on 2019-11-28 09:52:11
Question: Assume I have data (t, y), where I expect a linear dependency y(t). Furthermore, each observation carries attributes par1, par2, par3. Is there an algorithm or technique to decide whether one, several, or all of the parameters are relevant for the fit? I tried

```r
leaps::regsubsets(y ~ t + par1 + par2 + par3, data = mydata, nbest = 10)
```

but was not able to get the formula for the best fit. The final result should look like this if plotted. For data see below. Thus, I want the
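
Assuming the regsubsets() call above runs, the chosen subset can be read back from its summary; a sketch, ranking models by adjusted R^2 as one common choice:

```r
# Extract the best model found by leaps::regsubsets()
library(leaps)
fit <- regsubsets(y ~ t + par1 + par2 + par3, data = mydata, nbest = 10)
s <- summary(fit)
best <- which.max(s$adjr2)   # best subset by adjusted R^2
coef(fit, best)              # its coefficients
s$which[best, ]              # which predictors it includes
```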

Linear Regression and storing results in data frame [duplicate]

Submitted by 有些话、适合烂在心里 on 2019-11-28 08:44:01
This question already has an answer here: Linear Regression and group by in R (10 answers).

I am running a linear regression on some variables in a data frame. I'd like to be able to subset the linear regressions by a categorical variable, run the linear regression for each level, and then store the t-stats in a data frame. I'd like to do this without a loop if possible. Here's a sample of what I'm trying to do:

```r
a <- c("a","a","a","a","a", "b","b","b","b","b", "c","c","c","c","c")
b <- c(0.1,0.2,0.3,0.2,0.3, 0.1,0.2,0.3,0.2,0.3, 0.1,0.2,0.3,0.2,0.3)
c <- c(0.2,0.1,0.3,0.2,0.4, 0.2,0.5
```
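
Assuming the per-group model is c ~ b and the vectors above are complete, a loop-free sketch via split()/lapply() that collects the t-statistics:

```r
# Fit c ~ b within each level of a, then harvest the t statistics
df <- data.frame(a, b, c)
fits <- lapply(split(df, df$a), function(d) summary(lm(c ~ b, data = d)))
t(sapply(fits, function(s) s$coefficients[, "t value"]))  # one row per group
```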