regression

plot.lm(): extracting numbers labelled in the diagnostic Q-Q plot

这一生的挚爱 submitted on 2019-12-04 23:32:38
For the simple example below, you can see that certain points are identified in the ensuing plots. How can I extract the row numbers identified in these plots, especially the Normal Q-Q plot?

set.seed(2016)
maya <- data.frame(rnorm(100))
names(maya)[1] <- "a"
maya$b <- rnorm(100)
mara <- lm(b~a, data=maya)
plot(mara)

I tried using str(mara) to see if I could find a list there, but I can't see any of the numbers from the Normal Q-Q plot. Thoughts?

I have edited your question to use set.seed(2016) for reproducibility. To answer your question, I need to explain how to produce
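For reference, plot.lm() labels the id.n (default 3) observations with the largest absolute standardized residuals in the Normal Q-Q panel, so in R sort(abs(rstandard(mara)), decreasing = TRUE)[1:3] recovers them. A minimal numpy sketch of the same computation, with data generated to mirror the question's maya frame:

```python
import numpy as np

rng = np.random.default_rng(2016)
a = rng.normal(size=100)
b = rng.normal(size=100)

# Fit b ~ a by ordinary least squares.
X = np.column_stack([np.ones_like(a), a])
beta, *_ = np.linalg.lstsq(X, b, rcond=None)
resid = b - X @ beta

# Leverage values and internally standardized residuals --
# the quantity the Normal Q-Q panel of plot.lm() actually plots.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = resid @ resid / (len(b) - X.shape[1])
r_std = resid / np.sqrt(s2 * (1 - h))

# plot.lm() labels the id.n = 3 most extreme points by default.
labelled = np.argsort(-np.abs(r_std))[:3]
print(labelled)  # 0-based indices; R's labels are 1-based row numbers
```

Note the off-by-one when comparing against R: these indices are 0-based, while the plot labels are row names.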

Gradient in continuous regression using a neural network

独自空忆成欢 submitted on 2019-12-04 22:11:22
Question: I'm trying to implement a regression NN that has 3 layers (1 input, 1 hidden, and 1 output layer with a continuous output). As a basis I took a classification NN from the coursera.org class, but changed the cost function and gradient calculation to fit a regression problem rather than a classification one. My nnCostFunction is now:

function [J grad] = nnCostFunctionLinear(nn_params, ...
                                         input_layer_size, ...
                                         hidden_layer_size, ...
                                         num_labels, ...
                                         X, y, lambda)
Theta1 = reshape(nn_params(1
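The change the poster describes — keep the sigmoid hidden layer, make the output unit linear, and use squared-error cost — can be sketched in numpy as follows. This is an illustrative stand-in with hypothetical shapes, not the poster's Octave code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost_linear(Theta1, Theta2, X, y, lam=0.0):
    """Squared-error cost and gradients for a 1-hidden-layer net with a
    linear (not sigmoid) output unit, as needed for regression."""
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])           # input + bias
    Z2 = A1 @ Theta1.T
    A2 = np.hstack([np.ones((m, 1)), sigmoid(Z2)])
    h = A2 @ Theta2.T                               # linear output: no sigmoid here
    J = np.sum((h - y) ** 2) / (2 * m) \
        + lam / (2 * m) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    d3 = (h - y) / m                                # output delta for squared error
    d2 = (d3 @ Theta2[:, 1:]) * sigmoid(Z2) * (1 - sigmoid(Z2))
    grad2 = d3.T @ A2
    grad1 = d2.T @ A1
    grad1[:, 1:] += lam / m * Theta1[:, 1:]         # regularize non-bias weights
    grad2[:, 1:] += lam / m * Theta2[:, 1:]
    return J, grad1, grad2
```

A numerical gradient check (perturb one weight by eps, compare the finite difference of J against the returned gradient) is the quickest way to validate this kind of change.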

Any simple way to get regression prediction intervals in R?

柔情痞子 submitted on 2019-12-04 22:06:18
I am working on a big data set with over 300K elements, running regression analysis to estimate a parameter called Rate from the predictor variable Distance. I have the regression equation. Now I want the confidence and prediction intervals. I can easily get the confidence intervals for the coefficients with:

> confint(W1500.LR1, level = 0.95)
                  2.5 %      97.5 %
(Intercept) 666.2817393 668.0216072
Distance      0.3934499   0.3946572

which gives me the upper and lower bounds of the CI for the coefficients. Now I want the same upper and lower bounds for the

add a logarithmic regression line to a scatterplot (comparison with Excel)

旧时模样 submitted on 2019-12-04 21:20:10
Question (migrated from Cross Validated): In Excel, it's pretty easy to fit a logarithmic trend line to a given set of data: just click "Add Trendline" and select "Logarithmic." Switching to R for more power, I am a bit lost as to which function one should use to generate this. To generate the graph, I used ggplot2 with the following code:

ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3
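In R, the usual equivalent of Excel's "Logarithmic" trend line is lm(success ~ log(horizon)), which ggplot2 can draw with geom_smooth(method = "lm", formula = y ~ log(x)). A quick numpy sketch of the underlying fit, on made-up data standing in for the horizon/success columns:

```python
import numpy as np

# Hypothetical data in place of the question's horizon/success columns.
horizon = np.arange(1, 21, dtype=float)
success = (0.2 + 0.15 * np.log(horizon)
           + np.random.default_rng(0).normal(scale=0.01, size=20))

# Excel's "Logarithmic" trend line is y = a + b*ln(x): a linear fit
# in log(x). polyfit returns [slope, intercept].
b, a = np.polyfit(np.log(horizon), success, deg=1)
fitted = a + b * np.log(horizon)
print(a, b)
```

Plotting `fitted` against `horizon` reproduces the curved trend line Excel draws.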

Ridge-regression model: glmnet

对着背影说爱祢 submitted on 2019-12-04 20:37:46
Fitting a linear-regression model using least squares on my training dataset works fine.

library(Matrix)
library(tm)
library(glmnet)
library(e1071)
library(SparseM)
library(ggplot2)

trainingData <- read.csv("train.csv", stringsAsFactors=FALSE, sep=",", header=FALSE)
testingData <- read.csv("test.csv", sep=",", stringsAsFactors=FALSE, header=FALSE)
lm.fit = lm(as.factor(V42) ~ ., data = trainingData)
linearMPrediction = predict(lm.fit, newdata = testingData, se.fit = TRUE)
mean((linearMPrediction$fit - testingData[,20:41])^2)
linearMPrediction$residual.scale

However, when I try to fit a ridge
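Two things worth noting: lm() expects a numeric response, so as.factor(V42) on the left-hand side suggests a classification target that a gaussian-family fit will not handle as intended; and ridge regression with glmnet is glmnet(x, y, alpha = 0), which takes a matrix x rather than a formula. The ridge estimator itself has a closed form, sketched here in numpy:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^-1 X'y.
    (glmnet standardizes predictors and leaves the intercept
    unpenalized; this sketch penalizes every coefficient equally.)"""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
print(ridge_fit(X, y, 0.0))    # lam = 0 recovers ordinary least squares
print(ridge_fit(X, y, 100.0))  # large lam shrinks coefficients toward zero
```

With lam = 0 the result coincides with OLS; increasing lam trades variance for bias by shrinking the coefficient vector.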

301 PyTorch Tutorial (in Chinese): Regression Analysis

非 Y 不嫁゛ submitted on 2019-12-04 20:34:28
301 PyTorch Tutorial (in Chinese): Regression Analysis

More references: https://morvanzhou.github.io/tutorials/
YouTube channel: https://www.youtube.com/user/MorvanZhou

Required packages: torch, matplotlib

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline

torch.manual_seed(1)  # reproducible
<torch._C.Generator at 0x7f2c68165e90>

x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)  # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size())                 # noisy y data (tensor), shape=(100, 1)

plt.scatter(x.data.numpy(), y.data.numpy())
plt.show()

x[:10]
tensor([[-1.0000],
        [-0.9798],
        [-0.9596],
        [-0.9394],

“Rolling” Regression in R

断了今生、忘了曾经 submitted于 is translated: submitted on 2019-12-04 20:00:53
Say I want to run regressions per group, using the last 5 years of data as input for each regression. Then, for each subsequent year, I would like to "shift" the input window by one year (i.e., 4 observations). From those regressions I want to extract both the R2 and the fitted values/residuals, which I then need in subsequent regressions that follow similar notions. I have some code working using loops, but it is neither elegant nor efficient for large datasets. I assume there must be a nice plyr way to resolve this.

# libraries #
library(dplyr)
library(broom)
#
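The rolling scheme described above — assuming quarterly data, since a one-year shift is 4 observations, so a 5-year window is 20 observations — can be sketched in plain numpy, collecting slope, R2, and residuals per window (names are illustrative):

```python
import numpy as np

def rolling_ols(x, y, window, step):
    """OLS of y on x over sliding windows; returns per-window slope,
    R^2, and residuals. A numpy stand-in for the rollapply /
    dplyr + broom approaches usually suggested for R."""
    out = []
    for start in range(0, len(x) - window + 1, step):
        xs, ys = x[start:start + window], y[start:start + window]
        X = np.column_stack([np.ones(window), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        resid = ys - X @ beta
        ss_res = resid @ resid
        ss_tot = ((ys - ys.mean()) ** 2).sum()
        out.append({"start": start, "slope": beta[1],
                    "r2": 1 - ss_res / ss_tot, "resid": resid})
    return out

# 10 years of quarterly data: 5-year (20-obs) windows, stepping 1 year (4 obs).
x = np.arange(40.0)
y = 2.0 * x + np.random.default_rng(0).normal(scale=0.5, size=40)
res = rolling_ols(x, y, window=20, step=4)
```

Each window's residuals are kept in the result, so they can feed the subsequent regressions the question mentions.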

How does plot.lm() determine outliers for residual vs fitted plot?

限于喜欢 submitted on 2019-12-04 19:50:57
Question: How does plot.lm() determine which points are outliers (that is, which points to label) in the residual vs. fitted plot? The only thing I found in the documentation is this:

Details: sub.caption — by default the function call — is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page. The 'Scale-Location' plot, also called 'Spread-Location' or 'S-L' plot, takes the square root of

Python or SQL Logistic Regression

家住魔仙堡 submitted on 2019-12-04 19:44:28
Given time-series data, I want to find the best-fitting logarithmic curve. What are good libraries for doing this in either Python or SQL?

Edit: Specifically, I'm looking for a library that can fit data resembling a sigmoid function, with upper and lower horizontal asymptotes.

If your data were categorical, you could use logistic regression to fit the probabilities of belonging to a class (classification). However, I understand you are trying to fit the data to a sigmoid curve, which means you just want to minimize the mean squared error of the fit. I would redirect you to the
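Since the goal is least-squares fitting of a four-parameter sigmoid (lower/upper asymptotes, slope, midpoint) rather than classification, scipy's curve_fit is the usual Python tool. A sketch on simulated data:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, lower, upper, k, t0):
    """Four-parameter logistic: lower/upper asymptotes, slope k, midpoint t0."""
    return lower + (upper - lower) / (1.0 + np.exp(-k * (t - t0)))

t = np.linspace(0, 10, 200)
y = sigmoid(t, 1.0, 5.0, 2.0, 5.0) \
    + np.random.default_rng(0).normal(scale=0.05, size=t.size)

# Nonlinear least squares; p0 gives rough starting values so the
# optimizer converges reliably.
params, _ = curve_fit(sigmoid, t, y, p0=[y.min(), y.max(), 1.0, t.mean()])
print(params)
```

Reasonable starting values (p0) matter for sigmoid fits: the data's min, max, and time midpoint are usually good enough.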

for loops for regression over multiple variables & outputting a subset

杀马特。学长 韩版系。学妹 submitted on 2019-12-04 19:28:34
I have tried to apply this Q&A, "efficient looping logistic regression in R", to my own problem, but I cannot quite make it work. I haven't tried apply, but a few people told me a for loop is best here (if you believe otherwise, please feel free to explain!). I think this problem is fairly generalizable and not too esoteric for the forum.

This is what I want to achieve: I have a dataset with 3 predictor variables (gender, age, race) and a dependent variable (a proportion) for 86 genetic positions, for several people. I want to run bivariate linear regressions for each
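The pattern being asked for — loop over predictors, fit a simple (bivariate) regression of the outcome on each one, collect the results — looks like this in a plain-numpy sketch (variable names are made up; in R the same loop would call lm(y ~ x) once per predictor):

```python
import numpy as np

def bivariate_fits(y, predictors):
    """Fit y ~ x separately for each predictor column and collect
    intercept and slope per predictor."""
    results = {}
    for name, x in predictors.items():
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        results[name] = {"intercept": beta[0], "slope": beta[1]}
    return results

rng = np.random.default_rng(0)
age = rng.uniform(20, 60, 100)
gender = rng.integers(0, 2, 100).astype(float)
y = 0.3 + 0.01 * age + rng.normal(scale=0.02, size=100)  # a proportion-like outcome

fits = bivariate_fits(y, {"age": age, "gender": gender})
print(fits["age"]["slope"])
```

Wrapping the outer loop over the 86 genetic positions around this gives the full grid of bivariate fits; storing results keyed by (position, predictor) keeps them easy to subset afterwards.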