regression

Predicted values of each fold in K-Fold Cross Validation in sklearn

喜你入骨 submitted on 2019-12-10 18:26:34
Question: I have performed 10-fold cross validation on a dataset using Python's sklearn:

    result = cross_val_score(best_svr, X, y, cv=10, scoring='r2')
    print(result.mean())

I have been able to get the mean r2 score as the final result. Is there a way to print out the predicted values for each fold (in this case, 10 sets of values)?

Answer 1: I believe you are looking for the cross_val_predict function.

Answer 2: To print the predictions for each fold: for k in range(2,10): …
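Answer 1 points at cross_val_predict; here is a minimal runnable sketch of that route, with toy data standing in for the asker's best_svr, X, and y:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVR

    # Stand-ins for the question's X, y, and best_svr
    X, y = make_regression(n_samples=100, random_state=0)
    best_svr = SVR()

    # One out-of-fold prediction per sample, produced by the fold
    # in which that sample was held out
    preds = cross_val_predict(best_svr, X, y, cv=10)
    print(preds[:10])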

Calculate residual deviance from scikit-learn logistic regression model

会有一股神秘感。 submitted on 2019-12-10 18:03:52
Question: Is there any way to calculate the residual deviance of a scikit-learn logistic regression model? This is a standard output from R model summaries, but I couldn't find it anywhere in sklearn's documentation.

Answer 1: Actually, you can. Deviance is closely related to cross entropy, which is in sklearn.metrics.log_loss. Deviance is just 2*(loglikelihood_of_saturated_model - loglikelihood_of_fitted_model). Scikit-learn can (without larger tweaks) only handle classification of individual instances, so that …
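A sketch of that calculation under the answer's reasoning: for 0/1 outcomes the saturated model's log-likelihood is 0, so the residual deviance collapses to twice the fitted model's total negative log-likelihood. Toy data stands in for the asker's:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    X, y = make_classification(n_samples=200, random_state=0)
    model = LogisticRegression().fit(X, y)
    probs = model.predict_proba(X)

    # normalize=False returns the summed (not averaged) negative
    # log-likelihood; doubling it gives the residual deviance
    residual_deviance = 2 * log_loss(y, probs, normalize=False)
    print(residual_deviance)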

Warning message 'newdata' had 1 row but variables found have 16 rows in R

一笑奈何 submitted on 2019-12-10 17:49:46
Question: I am supposed to use the predict function to predict when fjbjor is 5.5, but I always get this warning message. I have tried many approaches and it keeps appearing, so can anyone see what I am doing wrong here? This is my code:

    fit.lm <- lm(fjbjor~amagn, data=bjor)
    summary(fit.lm)
    new.bjor <- data.frame(fjbjor=5.5)
    predict(fit.lm,new.bjor)

and this comes out:

           1        2        3        4        5        6        7        8        9       10       11
    5.981287 2.864521 9.988559 5.758661 4.645530 2.419269 4.645530 5.313409 6.871792 3.309773 4.200278
          12       13       14       15 …
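No answer survives in this preview, but the usual diagnosis for this warning is that newdata names the response (fjbjor) rather than the model's predictor, so predict() silently ignores it and returns the fitted values for all 16 rows. A hedged sketch of one fix, with made-up stand-in data for bjor, refitting so that fjbjor is the predictor:

    # Stand-in for the asker's bjor data frame (16 rows assumed)
    set.seed(1)
    bjor <- data.frame(fjbjor = runif(16, 2, 10))
    bjor$amagn <- 3 * bjor$fjbjor + rnorm(16)

    # With fjbjor on the right-hand side, newdata is matched by name
    # and predict() returns one value instead of 16 fitted ones
    fit.lm <- lm(amagn ~ fjbjor, data = bjor)
    predict(fit.lm, newdata = data.frame(fjbjor = 5.5))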

Conditional formatting of panel background in ggplot2

青春壹個敷衍的年華 submitted on 2019-12-10 17:37:19
Question: I was wondering whether there is a "direct" way to link the slope of a regression line in a ggplot facet panel to the background colour of that panel (i.e. to visually separate positive slopes from negative slopes in a large grid). I understand how to add a regression line in ggplot, as was well explained in "Adding a regression line to a facet_grid with qplot in R". I also understand how to change the background if you have previously added this information to the original dataframe, as …
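A sketch of the usual two-step workaround (not a single "direct" option, as far as I know): precompute the slope sign per panel, then paint each background with an infinite geom_rect before drawing the data. Toy data with assumed column names:

    library(ggplot2)
    library(dplyr)

    # Toy data: four facets with alternating slope signs
    set.seed(42)
    df <- data.frame(g = rep(letters[1:4], each = 20), x = rep(1:20, 4))
    df$y <- df$x * rep(c(1, -1, 2, -0.5), each = 20) + rnorm(80, sd = 3)

    # One slope (and its sign) per facet
    slopes <- df %>%
      group_by(g) %>%
      summarise(slope = coef(lm(y ~ x))[2]) %>%
      mutate(sign = ifelse(slope >= 0, "positive", "negative"))

    # Infinite rectangles tint each panel by slope sign
    ggplot(df, aes(x, y)) +
      geom_rect(data = slopes, inherit.aes = FALSE, aes(fill = sign),
                xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf,
                alpha = 0.2) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE) +
      facet_wrap(~ g)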

ggplot2: how to get robust confidence interval for predictions in geom_smooth?

人走茶凉 submitted on 2019-12-10 16:41:55
Question: Consider this simple example:

    dataframe <- data_frame(x = c(1,2,3,4,5,6), y = c(12,24,24,34,12,15))
    > dataframe
    # A tibble: 6 x 2
          x     y
      <dbl> <dbl>
    1     1    12
    2     2    24
    3     3    24
    4     4    34
    5     5    12
    6     6    15

    dataframe %>%
      ggplot(., aes(x = x, y = y)) +
      geom_point() +
      geom_smooth(method = 'lm', formula = y~x)

Here the standard errors are computed with the default option. However, I would like to use the robust variance-covariance matrix available in the packages sandwich and lmtest, that is, using vcovHC(mymodel, …
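geom_smooth has no hook for a custom covariance matrix, so one common workaround is to build the ribbon by hand from sandwich's vcovHC. A sketch (the HC1 type and the 1.96 normal quantile are my choices, not the asker's):

    library(ggplot2)
    library(sandwich)

    dataframe <- data.frame(x = c(1,2,3,4,5,6), y = c(12,24,24,34,12,15))
    fit <- lm(y ~ x, data = dataframe)

    # Robust standard error of the fitted line at each x:
    # diag(X V X') with V the HC covariance of the coefficients
    X <- model.matrix(~ x, data = dataframe)
    se <- sqrt(diag(X %*% vcovHC(fit, type = "HC1") %*% t(X)))
    dataframe$fit <- fitted(fit)
    dataframe$lwr <- dataframe$fit - 1.96 * se
    dataframe$upr <- dataframe$fit + 1.96 * se

    ggplot(dataframe, aes(x, y)) +
      geom_point() +
      geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2) +
      geom_line(aes(y = fit))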

How to use a list of model names and variables for computing a table with its predictions?

99封情书 submitted on 2019-12-10 16:24:04
Question: I am currently doing regression analysis on a dataset of mine, and thought that in order to compare different regression models I could use a table. I would like the table to have the names of the models in the first column and the predicted values for one test point in the second column. What I have done so far is name these models systematically:

    library(caret)
    model.lm <- train(formula, data=train, method='lm',...)
    model.glmnet <- train(formula, data=train, method='glmnet',...)
    ..
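A sketch of the list-based alternative: keep the fitted models in a named list instead of separate model.* objects, then predict over the list with sapply. mtcars stands in for the asker's train/test data, and the glmnet package is assumed to be installed:

    library(caret)

    # A named list replaces the systematically named model.* objects
    models <- list(
      lm     = train(mpg ~ ., data = mtcars, method = "lm"),
      glmnet = train(mpg ~ ., data = mtcars, method = "glmnet")
    )

    # One row per model: its name and its prediction on one test point
    test_point <- mtcars[1, ]
    data.frame(
      model      = names(models),
      prediction = sapply(models, function(m) predict(m, newdata = test_point))
    )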

C++ ARMA method and regression analysis

风格不统一 submitted on 2019-12-10 15:16:21
Question: Is there any C++ library that implements the ARMA method and possibly its variants? It would be good to have a mature distribution for this kind of analysis.

Answer 1: I am not aware of any native C++ library to compute ARMA models. However, if convenience is more important to you than raw performance, you can do it indirectly: use R to compute ARMA models, and use Rcpp to link C++ to R (or vice versa).

Source: https://stackoverflow.com/questions/11272856/c-arma-method-and-regression-analysis
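A sketch of that indirect route via RInside, Rcpp's companion library for embedding R in C++ (this assumes R, Rcpp, and RInside are installed, and delegates the actual ARMA fit to R's arima()):

    #include <cstdio>
    #include <RInside.h>

    int main(int argc, char *argv[]) {
        RInside R(argc, argv);  // embedded R session

        // Simulate an AR(1) series and fit an ARMA(1,1) model in R
        R.parseEvalQ("set.seed(1); x <- arima.sim(list(ar = 0.5), n = 200)");
        Rcpp::NumericVector coefs =
            R.parseEval("coef(arima(x, order = c(1, 0, 1)))");

        // Fitted coefficients come back as an Rcpp vector
        for (double c : coefs) std::printf("%f\n", c);
        return 0;
    }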

XGBoost Best Iteration

穿精又带淫゛_ submitted on 2019-12-10 14:17:55
Question: I am running a regression using the XGBoost algorithm:

    clf = XGBRegressor(eval_set = [(X_train, y_train), (X_val, y_val)],
                       early_stopping_rounds = 10, n_estimators = 10,
                       verbose = 50)
    clf.fit(X_train, y_train, verbose=False)
    print("Best Iteration: {}".format(clf.booster().best_iteration))

It trains correctly, but the print statement raises the following error:

    TypeError: 'str' object is not callable

How can I get the number of the best iteration of the model? Furthermore, how can I …
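No answer survives in this preview; a hedged sketch of the usual fix: in xgboost's sklearn wrapper, clf.booster is a plain string (hence "'str' object is not callable") and the underlying booster is fetched with get_booster(), while eval_set and early_stopping_rounds belong in fit() rather than the constructor in the xgboost versions of this era (newer releases moved early stopping to the constructor). Toy data stands in for the asker's:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    # Stand-ins for the question's train/validation split
    X, y = make_regression(n_samples=500, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    clf = XGBRegressor(n_estimators=10)
    clf.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_val, y_val)],
            early_stopping_rounds=10, verbose=False)

    # get_booster() (not clf.booster()) exposes best_iteration
    print("Best Iteration:", clf.get_booster().best_iteration)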

Does R always return NA as a coefficient as a result of linear regression with unnecessary variables?

微笑、不失礼 submitted on 2019-12-10 13:49:39
Question: My question is about unnecessary predictors, namely variables that do not provide any new linear information, or variables that are linear combinations of the other predictors. As you can see, the swiss dataset has six variables:

    data(swiss)
    names(swiss)
    # "Fertility"   "Agriculture" "Examination" "Education"
    # "Catholic"    "Infant.Mortality"

Now I introduce a new variable ec. It is a linear combination of Examination and Education:

    ec <- swiss$Examination + swiss$Education

When …
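The preview cuts off before the answer, but the behaviour the question is driving at is easy to demonstrate: lm() detects the aliased column and returns NA for its coefficient rather than failing. A sketch:

    data(swiss)

    # ec is an exact linear combination of two existing predictors
    ec <- swiss$Examination + swiss$Education

    fit <- lm(Fertility ~ . + ec, data = swiss)
    coef(fit)
    # ec's coefficient is NA: lm() drops perfectly collinear
    # (aliased) terms instead of estimating them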

Solve best fit polynomial and plot drop-down lines

亡梦爱人 submitted on 2019-12-10 12:44:21
Question: I'm using R 3.3.1 (64-bit) on Windows 10. I have an x-y dataset that I've fit with a 2nd-order polynomial. I'd like to solve that best-fit polynomial for x at y=4, and plot drop-down lines from y=4 to the x-axis. This will generate the data in a dataframe v1:

    v1 <- structure(list(x = c(-5.2549, -3.4893, -3.5909, -2.5546, -3.7247,
    -5.1733, -3.3451, -2.8993, -2.6835, -3.9495, -4.9649, -2.8438,
    -4.6926, -3.4768, -3.1221, -4.8175, -4.5641, -3.549, -3.08,
    -2.4153, -2.9882, -3.4045, -4.6394, -3 …
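The v1 data are truncated here, so this sketch substitutes toy data of the same shape: fit the quadratic, solve a*x^2 + b*x + (c - 4) = 0 with the quadratic formula, and draw the drop-downs with segments():

    # Toy quadratic data in place of the truncated v1
    set.seed(1)
    x <- seq(-6, -2, length.out = 40)
    y <- 0.5 * x^2 + 4 * x + 11 + rnorm(40, sd = 0.3)

    fit <- lm(y ~ x + I(x^2))
    cc <- coef(fit)                    # intercept, x, x^2 terms

    # Solve cc[3]*x^2 + cc[2]*x + (cc[1] - 4) = 0
    a <- cc[3]; b <- cc[2]; c0 <- cc[1] - 4
    roots <- (-b + c(-1, 1) * sqrt(b^2 - 4 * a * c0)) / (2 * a)

    plot(x, y)
    curve(cc[1] + cc[2] * x + cc[3] * x^2, add = TRUE)
    abline(h = 4, lty = 2)
    segments(roots, 4, roots, par("usr")[3])  # drop-downs to the x-axis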