linear-regression

Is there an equivalent function for anova.lm() in Java?

China☆狼群 submitted on 2019-12-18 09:15:08
Question: I am comparing two linear models in R with anova(), and I would like to do the same thing in Java. To simplify things, I took the example code from https://stats.stackexchange.com/questions/48854/why-am-i-getting-different-intercept-values-in-r-and-java-for-simple-linear-regr and modified it a bit below. The models are test_trait ~ geno_A + geno_B and test_trait ~ geno_A + geno_B + geno_A:geno_B. The coefficients of the models implemented in R and Java are the same. In R I use anova(fit, fit2)
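For reference, a minimal R sketch of the comparison described above (the data frame name dat is an assumption; the formulas are the ones quoted in the question):
fit  <- lm(test_trait ~ geno_A + geno_B, data = dat)
fit2 <- lm(test_trait ~ geno_A + geno_B + geno_A:geno_B, data = dat)
anova(fit, fit2)   # F-test on the added interaction term
# The same F statistic can be recomputed from the two residual sums of squares:
#   F = ((RSS1 - RSS2) / (df1 - df2)) / (RSS2 / df2)
# where df1 and df2 are the residual degrees of freedom; this is the quantity a Java port would need to reproduce.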

Getting the y-axis intercept and slope from linear regressions of multiple data columns and passing the intercept and slope values to a data frame

非 Y 不嫁゛ submitted on 2019-12-18 08:49:34
Question: I have a data frame x1, which was generated with the following piece of code:
x <- c(1:10)
y <- x^3
z <- y - 20
s <- z/3
t <- s*6
q <- s*y
x1 <- cbind(x, y, z, s, t, q)
x1 <- data.frame(x1)
I would like to extract the y-axis intercept and the slope of the linear regression fit for the data:
  x   y    z          s    t           q
1 1   1  -19  -6.333333  -38   -6.333333
2 2   8  -12  -4.000000  -24  -32.000000
3 3  27    7   2.333333   14   63.000000
4 4  64   44  14.666667   88  938.666667
5 5 125  105  35.000000  210 4375.000000
6 6 216  196  65.333333  392
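A minimal sketch of one way to do this, recreating the question's x1 and fitting lm() for each column against x; the collected object coefs is a hypothetical name:
x <- 1:10; y <- x^3; z <- y - 20; s <- z/3; t <- s*6; q <- s*y
x1 <- data.frame(x, y, z, s, t, q)
# One simple regression per column (except x itself), keeping intercept and slope.
fits  <- lapply(names(x1)[-1], function(v) lm(x1[[v]] ~ x1$x))
coefs <- data.frame(variable  = names(x1)[-1],
                    intercept = sapply(fits, function(f) coef(f)[1]),
                    slope     = sapply(fits, function(f) coef(f)[2]))
coefs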

How to Loop/Repeat a Linear Regression in R

纵饮孤独 submitted on 2019-12-17 22:35:50
Question: I have figured out how to make a table in R with 4 variables, which I am using for multiple linear regressions. The dependent variable (Lung) for each regression is taken from one column of a CSV table with 22,000 columns. One of the independent variables (Blood) is taken from the corresponding column of a similar table. Each column represents the levels of a particular gene, which is why there are so many of them. There are also two additional variables (the Age and Gender of each patient). When I
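A sketch of the usual looping pattern, with hypothetical object names (lung_expr and blood_expr for the two gene tables, clinical for the Age/Gender data), since the question's own data is not shown:
# One regression per gene column: Lung level ~ Blood level + Age + Gender.
results <- lapply(colnames(lung_expr), function(g) {
  d <- data.frame(Lung   = lung_expr[[g]],
                  Blood  = blood_expr[[g]],
                  Age    = clinical$Age,
                  Gender = clinical$Gender)
  fit <- lm(Lung ~ Blood + Age + Gender, data = d)
  summary(fit)$coefficients   # keep each model's coefficient table
})
names(results) <- colnames(lung_expr)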

predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading

我是研究僧i submitted on 2019-12-17 22:18:45
Question: This R code throws a warning:
# Fit regression model to each cluster
y <- list()
length(y) <- k
vars <- list()
length(vars) <- k
f <- list()
length(f) <- k
for (i in 1:k) {
  vars[[i]] <- names(corc[[i]][corc[[i]] != "1"])
  f[[i]] <- as.formula(paste("Death ~", paste(vars[[i]], collapse = "+")))
  y[[i]] <- lm(f[[i]], data = C1[[i]])  # training set
  C1[[i]] <- cbind(C1[[i]], fitted(y[[i]]))
  C2[[i]] <- cbind(C2[[i]], predict(y[[i]], C2[[i]]))  # test set
}
I have a training data set (C1) and a test data set
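The warning generally means some predictors were dropped (aliased) during fitting, so their coefficients are NA; a minimal check, assuming the same object names as in the loop above:
for (i in 1:k) {
  dropped <- names(coef(y[[i]]))[is.na(coef(y[[i]]))]   # terms lm could not estimate
  if (length(dropped) > 0) {
    cat("cluster", i, "is rank-deficient; dropped terms:", dropped, "\n")
  }
}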

Can scipy.stats identify and mask obvious outliers?

你说的曾经没有我的故事 submitted on 2019-12-17 21:52:38
Question: With scipy.stats.linregress I am performing a simple linear regression on some sets of highly correlated x, y experimental data, initially inspecting each x, y scatter plot visually for outliers. More generally (i.e. programmatically), is there a way to identify and mask outliers?
Answer 1: The statsmodels package has what you need. Look at this little code snippet and its output:
# Imports #
import statsmodels.api as smapi
import statsmodels.graphics as smgraphics
# Make data #
x = range(30)
y =

Adding statsmodels 'predict' results to a Pandas dataframe

大憨熊 submitted on 2019-12-17 20:28:13
Question: It is common to want to append the results of predictions to the dataset used to make the predictions, but the statsmodels predict function returns (non-indexed) results of a potentially different length than the dataset on which the predictions are based. For example, if the test dataset, test, contains any null entries, then
mod_fit = sm.Logit.from_formula('Y ~ A + B + C', train).fit()
press = mod_fit.predict(test)
will produce an array that is shorter than the length of test, and cannot be

Linear regression analysis with string/categorical features (variables)?

爱⌒轻易说出口 submitted on 2019-12-17 17:26:37
Question: Regression algorithms seem to work on features represented as numbers. For example, this dataset doesn't contain categorical features/variables, and it's quite clear how to do regression on this data and predict the price. But now I want to do regression analysis on data that contain categorical features. There are 5 features: District, Condition, Material, Security, Type. How can I do regression on this data? Do I have to transform all this string/categorical data to numbers manually? I
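One common approach, sketched here in R since the excerpt names no library: convert the string columns to factors and let the model formula expand them into dummy (one-hot) variables. The data frame name housing and the response Price are assumptions:
# Turn character columns into factors; lm() then builds 0/1 dummy columns automatically.
housing[] <- lapply(housing, function(col) if (is.character(col)) factor(col) else col)
fit <- lm(Price ~ District + Condition + Material + Security + Type, data = housing)
head(model.matrix(fit))   # inspect the dummy-coded design matrix actually used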

How do I determine the coefficients for a linear regression line in MATLAB? [closed]

早过忘川 submitted on 2019-12-17 13:50:53
Question: (Closed: this question needs to be more focused and is not currently accepting answers. Closed 5 months ago.) I'm going to write a program where the input is a data set of 2D points and the output is the regression coefficients of the line of best fit obtained by minimizing the mean squared error (MSE). I have some sample points that I would like to process:
X      Y
1.00   1.00
2.00   2.00
3.00   1.30
4.00   3.75
5
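For reference (a general formula, not quoted from the question), minimizing the mean squared error for a single predictor gives the standard closed-form estimates:
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}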

How to debug a “factor has new levels” error for a linear model and prediction

若如初见. submitted on 2019-12-17 13:36:19
Question: I am trying to make and test a linear model as follows:
lm_model <- lm(Purchase ~ ., data = train)
lm_prediction <- predict(lm_model, test)
This results in the following error, stating that the Product_Category_1 column has values that exist in the test data frame but not in the train data frame:
factor Product_Category_1 has new levels 7, 9, 14, 16, 17, 18
However, if I check these, they definitely appear in both data frames:
> nrow(subset(train, Product_Category_1 == "7"))
[1] 2923
>
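A diagnostic sketch under the question's own object names (lm_model, train, test): compare the factor levels the fitted model actually stored with those present in the test set; levels appearing only in the latter are the "new levels" the error refers to. A frequent cause is that rows containing NAs in other columns are dropped at fit time, so some levels never reach the model even though they exist in train.
used    <- lm_model$xlevels[["Product_Category_1"]]        # levels seen when fitting
in_test <- unique(as.character(test$Product_Category_1))   # levels present in the test set
setdiff(in_test, used)                                      # the offending "new levels"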