regression

plot.lm(): extracting numbers labelled in the diagnostic Q-Q plot

这一生的挚爱 submitted on 2019-12-04 23:32:38
For the simple example below, you can see that certain points are identified in the ensuing plots. How can I extract the row numbers identified in these plots, especially the Normal Q-Q plot?

set.seed(2016)
maya <- data.frame(rnorm(100))
names(maya)[1] <- "a"
maya$b <- rnorm(100)
mara <- lm(b~a, data=maya)
plot(mara)

I tried using str(mara) to see if I could find a list there, but I can't see any of the numbers from the Normal Q-Q plot. Thoughts?

I have edited your question to use set.seed(2016) for reproducibility. To answer your question, I need to explain how to produce
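For reference, plot.lm() labels the id.n (default 3) observations with the largest absolute standardized residuals in the Normal Q-Q panel, so in R sort(abs(rstandard(mara)), decreasing = TRUE)[1:3] recovers them. A minimal numpy sketch of the same computation, with data generated to mirror the question's maya frame:

```python
import numpy as np

rng = np.random.default_rng(2016)
a = rng.normal(size=100)
b = rng.normal(size=100)

# Fit b ~ a by ordinary least squares.
X = np.column_stack([np.ones_like(a), a])
beta, *_ = np.linalg.lstsq(X, b, rcond=None)
resid = b - X @ beta

# Leverage values and internally standardized residuals --
# the quantity the Normal Q-Q panel of plot.lm() actually plots.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2 = resid @ resid / (len(b) - X.shape[1])
r_std = resid / np.sqrt(s2 * (1 - h))

# plot.lm() labels the id.n = 3 most extreme points by default.
labelled = np.argsort(-np.abs(r_std))[:3]
print(labelled)  # 0-based indices; R's labels are 1-based row numbers
```

Note the off-by-one when comparing against R: these indices are 0-based, while the plot labels are row names.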

Gradient in continuous regression using a neural network

独自空忆成欢 submitted on 2019-12-04 22:11:22
Question: I'm trying to implement a regression NN that has 3 layers (1 input, 1 hidden, and 1 output layer with a continuous output). As a basis I took a classification NN from the coursera.org class, but changed the cost function and gradient calculation to fit a regression problem rather than a classification one. My nnCostFunction is now:

function [J grad] = nnCostFunctionLinear(nn_params, ...
                                         input_layer_size, ...
                                         hidden_layer_size, ...
                                         num_labels, ...
                                         X, y, lambda)
Theta1 = reshape(nn_params(1
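The change the poster describes — keep the sigmoid hidden layer, make the output unit linear, and use squared-error cost — can be sketched in numpy as follows. This is an illustrative stand-in with hypothetical shapes, not the poster's Octave code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost_linear(Theta1, Theta2, X, y, lam=0.0):
    """Squared-error cost and gradients for a 1-hidden-layer net with a
    linear (not sigmoid) output unit, as needed for regression."""
    m = X.shape[0]
    A1 = np.hstack([np.ones((m, 1)), X])           # input + bias
    Z2 = A1 @ Theta1.T
    A2 = np.hstack([np.ones((m, 1)), sigmoid(Z2)])
    h = A2 @ Theta2.T                               # linear output: no sigmoid here
    J = np.sum((h - y) ** 2) / (2 * m) \
        + lam / (2 * m) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))
    d3 = (h - y) / m                                # output delta for squared error
    d2 = (d3 @ Theta2[:, 1:]) * sigmoid(Z2) * (1 - sigmoid(Z2))
    grad2 = d3.T @ A2
    grad1 = d2.T @ A1
    grad1[:, 1:] += lam / m * Theta1[:, 1:]         # regularize non-bias weights
    grad2[:, 1:] += lam / m * Theta2[:, 1:]
    return J, grad1, grad2
```

A numerical gradient check (perturb one weight by eps, compare the finite difference of J against the returned gradient) is the quickest way to validate this kind of change.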

Any simple way to get regression prediction intervals in R?

柔情痞子 submitted on 2019-12-04 22:06:18
I am working on a big data set with over 300K elements, running regression analysis to estimate a parameter called Rate from the predictor variable Distance. I have the regression equation. Now I want the confidence and prediction intervals. I can easily get the confidence intervals for the coefficients with:

> confint(W1500.LR1, level = 0.95)
                  2.5 %      97.5 %
(Intercept) 666.2817393 668.0216072
Distance      0.3934499   0.3946572

which gives me the upper and lower bounds of the CI for the coefficients. Now I want the same upper and lower bounds for the

add a logarithmic regression line to a scatterplot (comparison with Excel)

旧时模样 submitted on 2019-12-04 21:20:10
Question (migrated from Cross Validated): In Excel, it's pretty easy to fit a logarithmic trend line to a given set of data: just click "Add Trendline" and select "Logarithmic." Switching to R for more power, I am a bit lost as to which function one should use to generate this. To generate the graph, I used ggplot2 with the following code:

ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3
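In R, the usual equivalent of Excel's "Logarithmic" trend line is lm(success ~ log(horizon)), which ggplot2 can draw with geom_smooth(method = "lm", formula = y ~ log(x)). A quick numpy sketch of the underlying fit, on made-up data standing in for the horizon/success columns:

```python
import numpy as np

# Hypothetical data in place of the question's horizon/success columns.
horizon = np.arange(1, 21, dtype=float)
success = (0.2 + 0.15 * np.log(horizon)
           + np.random.default_rng(0).normal(scale=0.01, size=20))

# Excel's "Logarithmic" trend line is y = a + b*ln(x): a linear fit
# in log(x). polyfit returns [slope, intercept].
b, a = np.polyfit(np.log(horizon), success, deg=1)
fitted = a + b * np.log(horizon)
print(a, b)
```

Plotting `fitted` against `horizon` reproduces the curved trend line Excel draws.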

Ridge-regression model: glmnet

对着背影说爱祢 submitted on 2019-12-04 20:37:46
Fitting a linear-regression model using least squares on my training dataset works fine.

library(Matrix)
library(tm)
library(glmnet)
library(e1071)
library(SparseM)
library(ggplot2)

trainingData <- read.csv("train.csv", stringsAsFactors=FALSE, sep=",", header=FALSE)
testingData <- read.csv("test.csv", sep=",", stringsAsFactors=FALSE, header=FALSE)
lm.fit = lm(as.factor(V42) ~ ., data = trainingData)
linearMPrediction = predict(lm.fit, newdata = testingData, se.fit = TRUE)
mean((linearMPrediction$fit - testingData[,20:41])^2)
linearMPrediction$residual.scale

However, when I try to fit a ridge
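Two things worth noting: lm() expects a numeric response, so as.factor(V42) on the left-hand side suggests a classification target that a gaussian-family fit will not handle as intended; and ridge regression with glmnet is glmnet(x, y, alpha = 0), which takes a matrix x rather than a formula. The ridge estimator itself has a closed form, sketched here in numpy:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^-1 X'y.
    (glmnet standardizes predictors and leaves the intercept
    unpenalized; this sketch penalizes every coefficient equally.)"""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
print(ridge_fit(X, y, 0.0))    # lam = 0 recovers ordinary least squares
print(ridge_fit(X, y, 100.0))  # large lam shrinks coefficients toward zero
```

With lam = 0 the result coincides with OLS; increasing lam trades variance for bias by shrinking the coefficient vector.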

301 PyTorch Tutorial (in Chinese): Regression Analysis

非 Y 不嫁゛ submitted on 2019-12-04 20:34:28
301 PyTorch Tutorial (in Chinese): Regression Analysis

More references: https://morvanzhou.github.io/tutorials/
YouTube channel: https://www.youtube.com/user/MorvanZhou

Required packages: torch, matplotlib

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
%matplotlib inline

torch.manual_seed(1)  # reproducible
<torch._C.Generator at 0x7f2c68165e90>

x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)  # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size())                 # noisy y data (tensor), shape=(100, 1)

plt.scatter(x.data.numpy(), y.data.numpy())
plt.show()

x[:10]
tensor([[-1.0000],
        [-0.9798],
        [-0.9596],
        [-0.9394],

“Rolling” Regression in R

断了今生、忘了曾经 submitted于 is translated: submitted on 2019-12-04 20:00:53
Say I want to run regressions per group, using the last 5 years of data as input for each regression. Then, for each subsequent year, I would like to "shift" the input window by one year (i.e., 4 observations). From those regressions I want to extract both the R2 and the fitted values/residuals, which I then need in subsequent regressions that follow similar notions. I have some code working using loops, but it is neither elegant nor efficient for large datasets. I assume there must be a nice plyr way to resolve this.

# libraries #
library(dplyr)
library(broom)
#
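The rolling scheme described above — assuming quarterly data, since a one-year shift is 4 observations, so a 5-year window is 20 observations — can be sketched in plain numpy, collecting slope, R2, and residuals per window (names are illustrative):

```python
import numpy as np

def rolling_ols(x, y, window, step):
    """OLS of y on x over sliding windows; returns per-window slope,
    R^2, and residuals. A numpy stand-in for the rollapply /
    dplyr + broom approaches usually suggested for R."""
    out = []
    for start in range(0, len(x) - window + 1, step):
        xs, ys = x[start:start + window], y[start:start + window]
        X = np.column_stack([np.ones(window), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        resid = ys - X @ beta
        ss_res = resid @ resid
        ss_tot = ((ys - ys.mean()) ** 2).sum()
        out.append({"start": start, "slope": beta[1],
                    "r2": 1 - ss_res / ss_tot, "resid": resid})
    return out

# 10 years of quarterly data: 5-year (20-obs) windows, stepping 1 year (4 obs).
x = np.arange(40.0)
y = 2.0 * x + np.random.default_rng(0).normal(scale=0.5, size=40)
res = rolling_ols(x, y, window=20, step=4)
```

Each window's residuals are kept in the result, so they can feed the subsequent regressions the question mentions.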

How does plot.lm() determine outliers for residual vs fitted plot?

限于喜欢 submitted on 2019-12-04 19:50:57
Question: How does plot.lm() determine which points are outliers (that is, which points to label) in the residual vs. fitted plot? The only thing I found in the documentation is this:

Details: sub.caption — by default the function call — is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page. The 'Scale-Location' plot, also called 'Spread-Location' or 'S-L' plot, takes the square root of

Python or SQL Logistic Regression

家住魔仙堡 submitted on 2019-12-04 19:44:28
Given time-series data, I want to find the best-fitting logarithmic curve. What are good libraries for doing this in either Python or SQL?

Edit: Specifically, I'm looking for a library that can fit data resembling a sigmoid function, with upper and lower horizontal asymptotes.

If your data were categorical, you could use logistic regression to fit the probabilities of belonging to a class (classification). However, I understand you are trying to fit the data to a sigmoid curve, which means you just want to minimize the mean squared error of the fit. I would redirect you to the
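Since the goal is least-squares fitting of a four-parameter sigmoid (lower/upper asymptotes, slope, midpoint) rather than classification, scipy's curve_fit is the usual Python tool. A sketch on simulated data:

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, lower, upper, k, t0):
    """Four-parameter logistic: lower/upper asymptotes, slope k, midpoint t0."""
    return lower + (upper - lower) / (1.0 + np.exp(-k * (t - t0)))

t = np.linspace(0, 10, 200)
y = sigmoid(t, 1.0, 5.0, 2.0, 5.0) \
    + np.random.default_rng(0).normal(scale=0.05, size=t.size)

# Nonlinear least squares; p0 gives rough starting values so the
# optimizer converges reliably.
params, _ = curve_fit(sigmoid, t, y, p0=[y.min(), y.max(), 1.0, t.mean()])
print(params)
```

Reasonable starting values (p0) matter for sigmoid fits: the data's min, max, and time midpoint are usually good enough.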

for loops for regression over multiple variables & outputting a subset

杀马特。学长 韩版系。学妹 submitted on 2019-12-04 19:28:34
I have tried to apply this Q&A, "efficient looping logistic regression in R", to my own problem, but I cannot quite make it work. I haven't tried apply, but a few people told me a for loop is best here (if you believe otherwise, please feel free to explain!). I think this problem is fairly generalizable and not too esoteric for the forum.

This is what I want to achieve: I have a dataset with 3 predictor variables (gender, age, race) and a dependent variable (a proportion) for 86 genetic positions, for several people. I want to run bivariate linear regressions for each
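The pattern being asked for — loop over predictors, fit a simple (bivariate) regression of the outcome on each one, collect the results — looks like this in a plain-numpy sketch (variable names are made up; in R the same loop would call lm(y ~ x) once per predictor):

```python
import numpy as np

def bivariate_fits(y, predictors):
    """Fit y ~ x separately for each predictor column and collect
    intercept and slope per predictor."""
    results = {}
    for name, x in predictors.items():
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        results[name] = {"intercept": beta[0], "slope": beta[1]}
    return results

rng = np.random.default_rng(0)
age = rng.uniform(20, 60, 100)
gender = rng.integers(0, 2, 100).astype(float)
y = 0.3 + 0.01 * age + rng.normal(scale=0.02, size=100)  # a proportion-like outcome

fits = bivariate_fits(y, {"age": age, "gender": gender})
print(fits["age"]["slope"])
```

Wrapping the outer loop over the 86 genetic positions around this gives the full grid of bivariate fits; storing results keyed by (position, predictor) keeps them easy to subset afterwards.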