regression

Percentiles from VGAM

白昼怎懂夜的黑 · submitted on 2019-12-08 04:13:15
Question: I am using the following example from the help pages of the VGAM package:

library(VGAM)
fit4 <- vgam(BMI ~ s(age, df = c(4, 2)), lms.bcn(zero = 1), data = bmi.nz, trace = TRUE)
qtplot(fit4, percentiles = c(5, 50, 90, 99), main = "Quantiles", las = 1,
       xlim = c(15, 90), ylab = "BMI", lwd = 2, lcol = 4)

I am getting a proper graph with it. How can I avoid plotting the data points on the graph? I also need to print out the values of these percentiles at each of the ages 20, 30, 40, ..., 80 (separately, as a table). How can this be

Estimating linear regression with Gradient Descent (Steepest Descent)

人走茶凉 · submitted on 2019-12-08 04:02:19
Question: Example data:

X <- matrix(c(rep(1, 97), runif(97)), nrow = 97, ncol = 2)
y <- matrix(runif(97), nrow = 97, ncol = 1)

I have succeeded in creating the cost function:

COST <- function(theta, X, y){
  ## Calculate half MSE
  sum((X %*% theta - y)^2) / (2 * length(y))
}

However, when I run this function, it seems to fail to converge over 100 iterations.

theta <- matrix(0, nrow = 2, ncol = 1)
num.iters <- 1500
delta = 0
GD <- function(X, y, theta, alpha, num.iters){
  for (i in num.iters){
    while (max(abs(delta)) < tolerance){
      error <
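The truncated loop above nests a while on delta inside a for over num.iters, and delta starts at 0, so the body logic cannot converge as written. As a hedged sketch of the intended algorithm (in Python rather than the question's R, with variable names chosen to mirror the question), batch gradient descent for simple linear regression minimising the half-MSE cost looks like:

```python
def gd_linreg(xs, ys, alpha=0.1, iters=1500, tol=1e-9):
    """Batch gradient descent for y = t0 + t1 * x, minimising half-MSE.

    Sketch only: alpha, iters and tol mirror the question's parameters
    but their values here are assumptions.
    """
    t0 = t1 = 0.0
    n = len(xs)
    for _ in range(iters):
        errs = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        g0 = sum(errs) / n                             # d(cost)/d(t0)
        g1 = sum(e * x for e, x in zip(errs, xs)) / n  # d(cost)/d(t1)
        t0 -= alpha * g0
        t1 -= alpha * g1
        if max(abs(g0), abs(g1)) < tol:                # gradients ~ 0: converged
            break
    return t0, t1
```

The key fix is that convergence is tested on the current gradient each iteration, not on an uninitialised delta, and the step size alpha must be small enough for the iteration to be stable.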

Is there a way to use recursive feature selection with non linear models with scikit-learn?

允我心安 · submitted on 2019-12-08 03:53:00
Question: I am trying to use SVR with an RBF kernel (obviously) on a regression problem. My dataset has something like 300 features. I would like to select the most relevant features, using something like MATLAB's sequentialfs function, which tries every combination (or at least starts with a few variables and adds variables along the way, or goes the opposite direction, backward, like scikit-learn's RFE or RFECV). Now, as said, for Python there is RFE, but I can't use it with a non-linear estimator. Is
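One model-agnostic workaround (a sketch, not scikit-learn's RFE API, which needs coef_ or feature_importances_) is greedy backward elimination driven by an arbitrary scoring callback. It works with a non-linear estimator such as RBF-kernel SVR because it never inspects coefficients; the score function is an assumption and would typically be cross-validated R² of the model refit on each candidate subset:

```python
def backward_eliminate(features, score, min_features=1):
    """Greedy backward elimination: repeatedly drop the feature whose
    removal hurts the score least.  `score(subset)` is any callable
    where higher is better (e.g. cross-validated R^2 of an SVR fit
    on that feature subset) -- that choice is an assumption here."""
    current = list(features)
    while len(current) > min_features:
        best_subset, best_score = None, None
        for f in current:
            trial = [g for g in current if g != f]
            s = score(trial)
            if best_score is None or s > best_score:
                best_subset, best_score = trial, s
        current = best_subset  # keep the best subset of size len-1
    return current
```

Note this refits the model O(n²) times for n features, which is the price of not assuming a linear model.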

How to train network with 2D output? (python,Keras)

无人久伴 · submitted on 2019-12-08 03:33:26
Question: I want to train a regression network whose outputs are two coordinates, (x1, y1) and (x2, y2). My question is: should the outputs be kept separate? That is, should the output look like [x1, y1, x2, y2], or is there a way to stack them like [(x1, y1), (x2, y2)]? Thanks in advance.

Answer 1: The RepeatVector layer is there for this purpose (see the Keras documentation). You want your output shape to be (2, 2), i.e. an array of two coordinates with two entries each.

num_outputs = 2
num_elements = 2
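A minimal way to see the two layouts side by side (sketched with NumPy; the flat vector is what a plain Dense head would emit, and the reshape produces the stacked (2, 2) form the answer describes — in Keras itself this would be a Reshape/RepeatVector layer rather than a NumPy call):

```python
import numpy as np

# Flat head: one vector of 4 numbers, [x1, y1, x2, y2]
flat = np.array([10.0, 20.0, 30.0, 40.0])

# Stacked form: two coordinates with two entries each, shape (2, 2)
pairs = flat.reshape(2, 2)
print(pairs.shape)  # (2, 2)
```

Either layout carries the same information; what matters is that the target array you train against has the same shape as the network's output.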

How to choose Gaussian basis functions hyperparameters for linear regression?

↘锁芯ラ · submitted on 2019-12-08 01:26:28
Question: I'm quite new to the machine-learning environment, and I'm trying to understand some basic concepts properly. My problem is the following: I have a set of data observations and the corresponding target values, {x, t}. I'm trying to train a function on this data in order to predict the value of unobserved data, and I'm trying to achieve this with the maximum a posteriori (MAP) technique (and so a Bayesian approach) with Gaussian basis functions of the form: phi_j(x) = exp(-(x - mu_j)^2 / 2
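The question's formula is cut off after the "/ 2"; assuming the standard Gaussian basis with a shared width parameter s (an assumption, since the truncation hides the denominator), evaluating the basis feature vector is a one-liner:

```python
import math

def gaussian_basis(x, centres, s):
    """Feature vector of Gaussian basis functions,
    phi_j(x) = exp(-(x - mu_j)^2 / (2 * s^2)).
    The shared width s is an assumption: the question's formula is
    truncated, and the hyperparameters mu_j and s are exactly what
    the asker needs to choose."""
    return [math.exp(-(x - mu) ** 2 / (2 * s ** 2)) for mu in centres]
```

A common default is to spread the centres mu_j evenly over the range of the training inputs and set s to roughly the spacing between neighbouring centres, so adjacent basis functions overlap.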

What is the red solid line in the “residuals vs leverage” plot produced by `plot.lm()`?

╄→尐↘猪︶ㄣ · submitted on 2019-12-07 21:18:24
Question:

fit <- lm(dist ~ speed, cars)
plot(fit, which = 5)

What does the solid red line in the middle of the plot mean? I think it is not Cook's distance.

Answer 1: It is the LOESS regression line (with span = 2/3 and degree = 2), obtained by smoothing standardised residuals against leverage. Internally in plot.lm(), the variable xx is the leverage, while rsp holds the Pearson (i.e. standardised) residuals. The scatter plot, as well as the red solid line, is then drawn via graphics::panel.smooth(xx, rsp). Here is

AIC with weighted nonlinear regression (nls)

◇◆丶佛笑我妖孽 · submitted on 2019-12-07 21:14:15
Question: I encounter some discrepancies when comparing the deviance of a weighted and an unweighted model with the AIC values. A general example (from nls):

DNase1 <- subset(DNase, Run == 1)
fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1)

This is the unweighted fit; in the code of nls one can see that nls generates a vector wts <- rep(1, n). Now for a weighted fit:

fm2DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1,
                 weights = rep(1:8, each = 2))

in

Display regression slopes for multiple subsets in ggplot2 (facet_grid)

半世苍凉 · submitted on 2019-12-07 19:46:41
Question: To my fellow programmers: I have been searching all over the web for an answer to this question and I am completely stumped. Quite simply, I am trying to display a slope (y = mx + b) and an R-squared value on my ggplot2 figure (using RStudio). In my experiment, I measure the response of a bacterium to different media compositions (food sources). Therefore, in one figure I have many panels (or subsets), each with a different R² and slope. In this example, I have 6 different subsets, all of
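Whatever plotting mechanism is used, the per-panel annotations are just an OLS fit computed on each subset separately. As a language-neutral sketch of that arithmetic (Python here, though the question is about R/ggplot2):

```python
def slope_intercept_r2(xs, ys):
    """OLS slope, intercept and R^2 for one subset -- the three values
    you would annotate on each facet.  Pure-Python sketch, not ggplot2
    code; in R this is coef(lm(y ~ x)) and summary(lm(y ~ x))$r.squared."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    m = sxy / sxx            # slope
    b = my - m * mx          # intercept
    r2 = (sxy * sxy) / (sxx * syy)
    return m, b, r2
```

Computing these per group and feeding them to the annotation layer is the general pattern, regardless of the plotting library.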

Reading csv to array, performing linear regression on array and writing to csv in Python depending on gradient

微笑、不失礼 · submitted on 2019-12-07 18:07:29
I am having to tackle a problem that far exceeds my current programming skill in Python, and I am having difficulty combining different modules (csv reader, NumPy, etc.) into a single script. My data contains a long list of weather variables over time (at minute resolution) for many days. My objective is to determine the trend of the wind speed between 9 am and 12 pm of every day in the list. If the gradient of the wind speed is positive, I wish to write the date on which this occurred to a new csv file, along with the wind direction at that time. The data extends over thousands of rows and looks
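A self-contained sketch of the pipeline described above, using only the standard library (the column names and the inline sample data are hypothetical; a real script would open the actual file and filter rows to the 09:00–12:00 window first):

```python
import csv
import io

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x (the wind-speed gradient)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# Hypothetical one-day sample: minutes past 09:00, wind speed, wind direction
raw = "minute,speed,direction\n0,3.0,N\n60,3.5,NE\n120,4.1,NE\n180,4.8,E\n"
rows = list(csv.DictReader(io.StringIO(raw)))
minutes = [float(r["minute"]) for r in rows]
speeds = [float(r["speed"]) for r in rows]

if ols_slope(minutes, speeds) > 0:
    # In the real script: csv.writer(out).writerow([date, direction])
    print("positive trend; direction:", rows[-1]["direction"])
```

Grouping the full dataset by date and repeating this per day gives the list of dates to write out.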

Specify regression line intercept (R & ggplot2)

早过忘川 · submitted on 2019-12-07 17:40:38
Question: BACKGROUND: My current plot looks like this. PROBLEM: I want to force the regression line to start at 1 for station_1. CODE:

library(ggplot2)
# READ IN DATA
var_x = c(2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,
          2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011)
var_y = c(1.000000,1.041355,1.053106,1.085738,1.126375,1.149899,1.210831,
          1.249480,1.286305,1.367923,1.486978,1.000000,0.9849343,0.9826141,
          0.9676000,0.9382975,0.9037476,0.8757748,0.8607960,0.8573634,
          0.8536138,0.8258877)
var
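One standard way to force a regression line through a fixed point such as (2001, 1): shift the data so that point sits at the origin, then fit with the intercept removed (in R, something like lm(I(y - 1) ~ I(x - 2001) + 0)). The same arithmetic as a small Python sketch:

```python
def slope_through_point(xs, ys, x0, y0):
    """OLS slope of a line forced through (x0, y0): shift the data so
    the fixed point becomes the origin, then fit y' = m * x' with no
    intercept term, giving m = sum(x'y') / sum(x'^2)."""
    num = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    den = sum((x - x0) ** 2 for x in xs)
    return num / den
```

The fitted line is then y = y0 + m * (x - x0), which by construction passes through (x0, y0); in ggplot2 this line would be drawn with geom_abline rather than geom_smooth.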