linear-regression

How to debug the “factor has new levels” error for a linear model and prediction

和自甴很熟 submitted on 2019-12-17 13:34:09
Question: I am trying to build and test a linear model as follows: lm_model <- lm(Purchase ~., data = train) lm_prediction <- predict(lm_model, test) This results in the following error, stating that the Product_Category_1 column has values that exist in the test data frame but not in the train data frame: factor Product_Category_1 has new levels 7, 9, 14, 16, 17, 18 However, if I check these, they definitely appear in both data frames: > nrow(subset(train, Product_Category_1 == "7")) [1] 2923 >

Messy plot when plotting predictions of a polynomial regression using lm() in R

北城余情 submitted on 2019-12-17 10:06:32
Question: I am building a quadratic model with lm in R: y <- data[[1]] x <- data[[2]] x2 <- x^2 quadratic.model = lm(y ~ x + x2) Now I want to display both the predicted values and the actual values on a plot. I tried this: par(las=1,bty="l") plot(y~x) P <- predict(quadratic.model) lines(x, P) but the line comes out all squiggly. Maybe it has to do with the fact that it's quadratic? Thanks for any help. Answer 1: You need order(): P <- predict(quadratic.model) plot(y~x) reorder <- order(x) lines(x[reorder]

R Loop for Variable Names to run linear regression model

ぐ巨炮叔叔 submitted on 2019-12-17 09:58:43
Question: First off, I am pretty new to this, so my method/thinking may be wrong. I have imported an xlsx data set into a data frame using R and RStudio. I want to loop through the column names to get all of the variables with exactly "_10_" in them, in order to run a simple linear regression. Here's my code: indx <- grepl('_10_', colnames(data)) #list returns all of the true values in the data set col10 <- names(data[indx]) #this gives me the names of the columns I want Here is the for

Multiple linear regression in Python

走远了吗. submitted on 2019-12-17 02:39:07
Question: I can't seem to find any Python libraries that do multiple regression; the only ones I find do only simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.). For example, with this data: print 'y x1 x2 x3 x4 x5 x6 x7' for t in texts: print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" / .format(t.y,t.x1,t.x2,t.x3,t.x4,t.x5,t.x6,t.x7) (Output of the above:) y x1 x2 x3 x4 x5 x6 x7 -6.0 -4.95 -5.87 -0.76 14.73 4

How does Caret generate an OLS model with K-fold cross validation?

本秂侑毒 submitted on 2019-12-14 03:59:05
Question: Let's say I have some generic dataset for which an OLS regression is the best choice. So I build a model with some first-order terms and decide to use caret in R for my regression coefficient estimates and error estimates. In caret, this ends up being: k10_cv = trainControl(method="cv", number=10) ols_model = train(Y ~ X1 + X2 + X3, data = my_data, trControl = k10_cv, method = "lm") From there, I can pull out regression information using summary(ols_model) and can also pull some more

Working with date types in Python Linear regression

喜你入骨 submitted on 2019-12-13 20:28:27
Question: Data set: I have collected the tablespace growth of my database and am trying to use it to predict future growth. The dataset covers the years 2009 to 2017. I have tried many approaches but cannot get the date column into a form that can be processed; every error I get is related to datetime types. Can you please suggest how I can use this dataset to predict the growth? One of the errors: TypeError: Cannot cast array data from dtype('M8[ns]') to dtype('float64') according to the rule 'safe' TS_SIZE FETCH_DATE 34911.99

Dynamic formula creation in R?

末鹿安然 submitted on 2019-12-13 19:38:44
Question: Is it at all possible to use the lm() function with a matrix? Or maybe the correct question is: "Is it possible to dynamically create formulas in R?" I am creating a function whose output is a matrix, and the number of columns in the matrix is not fixed: it depends on the user's inputs. I want to fit an OLS model using the data in the matrix. - The first column represents the dependent variable. - The other columns are the independent variables. Using the lm function requires a formula,

Predict vector values instead of a single output

你。 submitted on 2019-12-13 19:23:10
Question: In linear regression I've always seen the situation where I have many features and use them to predict a single output, for example f1 f2 f3 f4 --> y1 f1 f2 f3 f4 --> y2 and so on... I want to know whether there is a method where the predicted value, i.e. y1, is actually a vector rather than a single value. Answer 1: Yes, pretty much every regression method (neural networks, support vector regressors, random forest regressors, ...) works just fine for multidimensional output, including linear regression. In

PyTorch will not fit a straight line to two data points

无人久伴 submitted on 2019-12-13 17:26:48
Question: I'm having trouble fitting a simple y = 4x1 line to 2 data points using PyTorch. When I run the inference code, the model seems to output the same value for any input, which is strange. Please find the code attached, along with the data files I used. Any help is appreciated. import torch import numpy as np import pandas as pd df = pd.read_csv('data.csv') test_data = pd.read_csv('test_data.csv') inputs = df[['x1']] target = df['y'] inputs = torch.tensor(inputs.values).float() target = torch