regression

ValueError: feature_names mismatch: in xgboost in the predict() function

杀马特。学长 韩版系。学妹 submitted on 2019-12-18 13:03:35
Question: I have trained an XGBoostRegressor model. When I use this trained model to predict for a new input, the predict() function throws a feature_names mismatch error, although the input feature vector has the same structure as the training data. Also, in order to build the feature vector in the same structure as the training data, I am doing a lot of inefficient processing, such as adding new empty columns (if data does not exist) and then rearranging the data columns so that it matches
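
The column bookkeeping described in this excerpt can often be done in one step. The following is a minimal sketch, assuming the data lives in pandas DataFrames; `train_columns`, `new_input`, and all column names are hypothetical, and filling missing columns with 0 is an assumption that only suits some encodings (such as one-hot indicators).

```python
import pandas as pd

# Hypothetical column order that was used when the model was trained.
train_columns = ["age", "income", "city_london", "city_paris"]

# Hypothetical new input: one column missing, one extra, different order.
new_input = pd.DataFrame(
    {"income": [52000.0], "age": [31], "city_paris": [1], "unseen": [7]}
)

# reindex adds the missing columns (filled with 0), drops the extras,
# and puts everything in the training order.
aligned = new_input.reindex(columns=train_columns, fill_value=0)

print(list(aligned.columns))
# With a fitted model, the call would then be: model.predict(aligned)
```

Whether 0 or NaN is the right fill value depends on how the training data encoded absent values, so that choice should be checked against the training pipeline.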

Scaling of target causes Scikit-learn SVM regression to break down

无人久伴 submitted on 2019-12-18 12:30:04
Question: When training an SVM regression, it is usually advisable to scale the input features before training. But what about scaling the targets? Usually this is not considered necessary, and I do not see a good reason why it should be. However, in the scikit-learn example for SVM regression at http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html, just introducing the line y=y/1000 before training makes the prediction break down to a constant value. Scaling the
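
One likely explanation is that SVR's default `epsilon=0.1` is much wider than targets of order 0.001, so every point falls inside the insensitive tube and nothing is penalized. The sketch below is an illustration under that assumption, not the linked example itself: the data is a synthetic stand-in, and wrapping the regressor in `TransformedTargetRegressor` is one possible remedy among several.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y_small = np.sin(X).ravel() / 1000  # targets shrunk as in the question

# Default epsilon=0.1 is far wider than the whole target range.
naive = SVR(kernel="rbf", C=1e3, gamma=0.1).fit(X, y_small)

# Standardize y for fitting; predictions are mapped back to original units.
wrapped = TransformedTargetRegressor(
    regressor=SVR(kernel="rbf", C=1e3, gamma=0.1),
    transformer=StandardScaler(),
).fit(X, y_small)

naive_spread = np.ptp(naive.predict(X))      # range of the naive predictions
wrapped_spread = np.ptp(wrapped.predict(X))  # range after target scaling
print(naive_spread, wrapped_spread)
```

Shrinking `epsilon` in proportion to the targets is another option, although the solver tolerance and `C` may then need adjusting as well.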

Multiple regression analysis in R using QR decomposition

我们两清 submitted on 2019-12-18 12:08:48
Question: I am trying to write a function for solving multiple regression using QR decomposition. Input: y vector and X matrix; output: b, e, R^2. So far I've got this and am terribly stuck; I think I have made everything way too complicated:

    QR.regression <- function(y, X) {
      X <- as.matrix(X)
      y <- as.vector(y)
      p <- as.integer(ncol(X))
      if (is.na(p)) stop("ncol(X) is invalid")
      n <- as.integer(nrow(X))
      if (is.na(n)) stop("nrow(X) is invalid")
      nr <- length(y)
      nc <- NCOL(X)
      # Householder
      for (j in seq_len
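
For comparison, the same inputs and outputs (y and X in; b, e, R^2 out) can be sketched compactly in Python with NumPy. This is not the asker's R function: it relies on the library's QR routine instead of a hand-written Householder loop, `qr_regression` is a hypothetical name, and the R^2 formula assumes X contains an intercept column.

```python
import numpy as np

def qr_regression(y, X):
    """Least-squares fit via QR: returns coefficients b, residuals e, and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Q, R = np.linalg.qr(X)           # reduced QR: X = Q R, R upper triangular
    b = np.linalg.solve(R, Q.T @ y)  # solve R b = Q^T y
    e = y - X @ b                    # residuals
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return b, e, r2

# Synthetic data with known coefficients (1, 2, -3) and small noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.01 * rng.normal(size=50)

b, e, r2 = qr_regression(y, X)
print(b, r2)
```

Because R is upper triangular, a dedicated back-substitution routine would be cheaper than the general solver used here; the general solver keeps the sketch dependent on NumPy alone.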

Applying a rolling window regression to an XTS series in R

限于喜欢 submitted on 2019-12-18 11:56:12
Question: I have an xts of 1033 daily return points for 5 currency pairs on which I want to run a rolling window regression, but rollapply is not working for my defined function, which uses lm(). Here is my data:

    > head(fxr)
                     USDZAR        USDEUR       USDGBP        USDCHF        USDCAD
    2007-10-18 -0.005028709 -0.0064079963 -0.003878743 -0.0099537170 -0.0006153215
    2007-10-19 -0.001544470  0.0014275520 -0.001842564  0.0023058211 -0.0111410271
    2007-10-22  0.010878027  0.0086642116  0.010599365  0.0051899551  0.0173792230
    2007-10-23 -0
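
The idea of a rolling window regression can be sketched in Python with NumPy as follows. This is not the rollapply/lm() solution the question asks about: `rolling_ols` is a hypothetical helper, the data is synthetic with the same 1033-row shape, and regressing one series on the other four with a 250-row window is an assumption about the intended model.

```python
import numpy as np

def rolling_ols(y, X, window):
    """Fit y ~ intercept + X on each trailing window; one coefficient row per window."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n_windows = len(y) - window + 1
    coefs = np.empty((n_windows, X.shape[1]))
    for start in range(n_windows):
        rows = slice(start, start + window)
        coefs[start] = np.linalg.lstsq(X[rows], y[rows], rcond=None)[0]
    return coefs

# Synthetic stand-in: 1033 daily returns, one response and four regressors.
rng = np.random.default_rng(1)
X = rng.normal(scale=0.01, size=(1033, 4))
true_beta = np.array([0.5, -0.2, 0.0, 0.3])
y = X @ true_beta + 0.0001 * rng.normal(size=1033)

coefs = rolling_ols(y, X, window=250)
print(coefs.shape)
```

Each row of `coefs` holds the intercept followed by the four slopes for one window, so the result can be aligned with the date of the last observation in that window.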

Understanding Tensorflow LSTM Input shape

做~自己de王妃 submitted on 2019-12-18 10:42:08
Question: I have a dataset X which consists of N = 4000 samples, each sample consisting of d = 2 features (continuous values) spanning back t = 10 time steps. I also have the corresponding 'labels' of each sample, which are also continuous values, at time step 11. At the moment my dataset is in the shape X: [4000,20], Y: [4000]. I want to train an LSTM using TensorFlow to predict the value of Y (regression), given the 10 previous inputs of d features, but I am having a tough time implementing this in
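
Recurrent layers usually expect input shaped [samples, time steps, features], so the [4000, 20] array has to be reshaped to [4000, 10, 2]. The sketch below shows the reshape for both plausible column layouts; which one applies is an assumption, because the excerpt does not say how the 20 columns are ordered, and `X_flat` is a placeholder for the real data.

```python
import numpy as np

N, t, d = 4000, 10, 2

# Placeholder for the real [4000, 20] data; values encode their flat position.
X_flat = np.arange(N * t * d, dtype=np.float32).reshape(N, t * d)

# Assumption A: columns are time-major (step1_f1, step1_f2, step2_f1, ...).
X_seq = X_flat.reshape(N, t, d)

# Assumption B: columns are feature-major (f1_step1..f1_step10, f2_step1..f2_step10).
X_seq_alt = X_flat.reshape(N, d, t).transpose(0, 2, 1)

print(X_seq.shape, X_seq_alt.shape)
```

Both results have the same shape, so a wrong choice raises no error; checking one sample by hand against the original columns is the safest way to confirm the layout.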

GridSearchCV - XGBoost - Early Stopping

我的梦境 submitted on 2019-12-18 10:37:14
Question: I am trying to do a hyperparameter search using scikit-learn's GridSearchCV on XGBoost. During the grid search I'd like it to stop early, since that reduces search time drastically and (I expect) gives better results on my prediction/regression task. I am using XGBoost via its Scikit-Learn API.

    model = xgb.XGBRegressor()
    GridSearchCV(model, paramGrid, verbose=verbose,
                 fit_params={'early_stopping_rounds': 42},
                 cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]),
                 n_jobs=n_jobs, iid=iid)
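
One detail in the call above is worth checking independently of early stopping: `get_n_splits` returns only the number of folds, not the folds themselves. The sketch below, using a small placeholder array, shows the difference; it does not address how early stopping should be passed, since that depends on the installed scikit-learn and XGBoost versions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(10, 2)  # placeholder data: 10 ordered samples
splitter = TimeSeriesSplit(n_splits=3)

n = splitter.get_n_splits(X)      # just an integer
folds = list(splitter.split(X))   # the actual (train_idx, test_idx) pairs

print(n)
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

When an integer is given as `cv` for a regressor, scikit-learn falls back to ordinary K-fold splitting, so passing the `TimeSeriesSplit` object itself (for example `cv=splitter`) is what keeps each training fold earlier in time than its test fold.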

How to update `lm` or `glm` model on same subset of data?

筅森魡賤 submitted on 2019-12-18 09:17:27
Question: I am trying to fit two nested models and then test them against each other using the anova function. The commands used are:

    probit <- glm(grad ~ afqt1 + fhgc + mhgc + hisp + black + male, data=dt, family=binomial(link = "probit"))
    nprobit <- update(probit, . ~ . - afqt1)
    anova(nprobit, probit, test="Rao")

However, the variable afqt1 apparently contains NAs, and because the update call does not take the same subset of data, anova() returns the error

    Error in anova.glmlist(c(list(object), dotargs),

Repeat regression with varying dependent variable

感情迁移 submitted on 2019-12-18 09:15:07
Question: I've searched both Stack and Google for a solution, but found none that solves my problem. I have about 40 dependent variables, for which I aim to obtain adjusted means (lsmeans). I need adjusted means for group A and group B, after accounting for some covariates. My final object should be a data frame with predicted means for all 40 dependent variables for group A and group B. This is what I tried, without any success:

    # Exemplified here with 2 outcome variables
    outcome1 <- c(2, 4, 6, 8, 10, 12, 14

Getting the y-axis intercept and slope from a linear regression of multiple data and passing the intercept and slope values to a data frame

非 Y 不嫁゛ submitted on 2019-12-18 08:49:34
Question: I have a data frame x1, which was generated with the following piece of code:

    x <- c(1:10)
    y <- x^3
    z <- y-20
    s <- z/3
    t <- s*6
    q <- s*y
    x1 <- cbind(x,y,z,s,t,q)
    x1 <- data.frame(x1)

I would like to extract the y-axis intercept and the slope of the linear regression fit for the data,

      x   y   z         s   t           q
    1 1   1 -19 -6.333333 -38   -6.333333
    2 2   8 -12 -4.000000 -24  -32.000000
    3 3  27   7  2.333333  14   63.000000
    4 4  64  44 14.666667  88  938.666667
    5 5 125 105 35.000000 210 4375.000000
    6 6 216 196 65.333333 392
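
As an illustration of the goal, the sketch below rebuilds the same columns in Python with NumPy and collects one intercept and one slope per column. It assumes each column is regressed on x, which the excerpt does not state explicitly, and `columns` and `fits` are hypothetical names.

```python
import numpy as np

# Same construction as the R code in the question.
x = np.arange(1, 11, dtype=float)
y = x ** 3
z = y - 20
s = z / 3
t = s * 6
q = s * y
columns = {"y": y, "z": z, "s": s, "t": t, "q": q}

# np.polyfit with degree 1 returns (slope, intercept), highest power first.
fits = {name: np.polyfit(x, values, 1) for name, values in columns.items()}

for name, (slope, intercept) in fits.items():
    print(f"{name}: intercept={intercept:.3f} slope={slope:.3f}")
```

The dictionary can be turned into a table with one row per column and two value fields, which mirrors the data frame of intercepts and slopes the question is after.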

Error in dataframe *tmp* replacement has x data has y

放肆的年华 submitted on 2019-12-18 07:44:12
Question: I'm a beginner in R. Here is some very simple code in which I'm trying to save the residual term:

    # Create variables for child's EA:
    dat$cldeacdi <- rowMeans(dat[,c('cdcresp', 'cdcinv')],na.rm=T)
    dat$cldeacu <- rowMeans(dat[,c('cucresp', 'cucinv')],na.rm=T)
    # Create a residual score for child EA:
    dat$cldearesid <- resid(lm(cldeacu ~ cldeacdi, data = dat))

I'm getting the following message:

    Error in `$<-.data.frame`(`*tmp*`, cldearesid, value = c(-0.18608488908881, : replacement has 366 rows, data
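
A row-count mismatch like this typically arises when the fit silently drops rows with missing values, so the residual vector is shorter than the data. In R this is commonly handled by fitting with `na.action = na.exclude`, which pads the residuals. The Python sketch below illustrates the same padding idea on small made-up arrays; it is not the asker's data or code.

```python
import numpy as np

# Made-up data with missing values in both variables.
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, np.nan, 5.1])

# Rows that can actually be used for the fit.
complete = ~(np.isnan(x) | np.isnan(y))
slope, intercept = np.polyfit(x[complete], y[complete], 1)

# Residual vector with the same length as the data; unused rows stay NaN.
resid = np.full(len(y), np.nan)
resid[complete] = y[complete] - (intercept + slope * x[complete])

print(resid)
```

Because `resid` has one entry per original row, it can be stored next to the source columns without a length error.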