regression

Showing fitted values with R and dplyr

こ雲淡風輕ζ submitted on 2019-12-01 08:33:58

I have the data frame DF. I am using R and dplyr to analyse it. DF contains:

> glimpse(DF)
Observations: 1244160
Variables:
$ Channel (int) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Row     (int) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,...
$ Col     (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ mean    (dbl) 776.0667, 786.6000, 833.4667, 752.3333, 831.6667, 772.9333...

I fit it with:

Fit <- DF %>% group_by(Channel) %>% do(fit = lm(mean ~ Col + poly(Row, 2), data = .))

How can I get another column in DF with the data (given Channel, Row and

Seaborn barplot with regression line

那年仲夏 submitted on 2019-12-01 07:32:44

Question: Is there a way to add a regression line to a barplot in seaborn where the x-axis contains pandas.Timestamps? For example, overlaying a trendline on the bar plot below. I am looking for the most efficient way to do this:

seaborn.set(style="white", context="talk")
a = pandas.DataFrame.from_dict({'Attendees': {pandas.Timestamp('2016-12-01'): 10, pandas.Timestamp('2017-01-01'): 12, pandas.Timestamp('2017-02-01'): 15, pandas.Timestamp('2017-03-01'): 16, pandas.Timestamp('2017-04-01'): 20}})
ax =

Why do I get NA coefficients and how does `lm` drop reference level for interaction

感情迁移 submitted on 2019-12-01 05:26:40

I am trying to understand how R determines reference groups for interactions in a linear model. Consider the following:

df <- structure(list(
  id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
  year = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("1", "2"), class = "factor"),
  treatment = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
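
The NA coefficients come from rank deficiency: once the intercept and the other dummy columns are in the design matrix, some interaction columns are linear combinations of columns already present, so lm() cannot identify their coefficients and reports NA. A hypothetical pure-Python illustration of the core fact, that a full set of level dummies plus an intercept is rank-deficient (the rank routine and the toy factor are mine, not from the question):

```python
from fractions import Fraction

def matrix_rank(mat):
    """Rank via Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in mat]
    rank, col, ncols = 0, 0, len(mat[0])
    while rank < len(m) and col < ncols:
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            col += 1
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col] / m[rank][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        col += 1
    return rank

# A factor with 3 levels, observed twice each. Intercept plus one
# dummy per level: the dummies sum to the intercept column, so one
# coefficient is unidentifiable -- this is why R drops a reference
# level, and why a column it cannot drop shows up as NA.
levels = [0, 0, 1, 1, 2, 2]
X = [[1] + [1 if lv == k else 0 for k in range(3)] for lv in levels]
```

Here `matrix_rank(X)` is 3 while X has 4 columns, so one column is redundant.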

How to compute standard error from ODR results?

元气小坏坏 submitted on 2019-12-01 04:22:13

I use scipy.odr to make a fit with uncertainties on both x and y, following this question: Correct fitting with scipy curve_fit including errors in x? After the fit I would like to compute the uncertainties on the parameters, so I look at the square root of the diagonal elements of the covariance matrix. I get:

>>> print(np.sqrt(np.diag(output.cov_beta)))
[ 0.17516591  0.33020487  0.27856021]

But the Output also contains output.sd_beta, which is, according to the odr docs, "Standard errors of the estimated parameters, of shape (p,)." It does not give me the same results:

>>>
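
The discrepancy is a scaling issue: in scipy.odr, cov_beta is the covariance matrix *before* scaling by the residual variance, and sd_beta = sqrt(res_var * diag(cov_beta)), with res_var taken from output.res_var. A small stdlib check of that relation (the diagonal values are the ones printed in the question; the res_var value here is made up for illustration, read yours from the fit output):

```python
import math

# diag(cov_beta), squared from the question's printout
cov_diag = [0.17516591**2, 0.33020487**2, 0.27856021**2]
res_var = 2.0   # hypothetical output.res_var

# sd_beta should equal sqrt(res_var) * sqrt(diag(cov_beta))
sd_beta = [math.sqrt(res_var * c) for c in cov_diag]
unscaled = [math.sqrt(c) for c in cov_diag]
ratio = [s / u for s, u in zip(sd_beta, unscaled)]
# every ratio is sqrt(res_var), which is exactly the factor the
# question is missing when comparing sd_beta to sqrt(diag(cov_beta))
```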

How to create a graph showing the predictive model, data and residuals in R

蓝咒 submitted on 2019-12-01 04:04:20

Given two variables, x and y, I run a dynlm regression on them and would like to plot the fitted model against one of the variables, with the residuals shown at the bottom to indicate how the actual data differ from the fitted line. I've seen it done before and I've done it before, but for the life of me I can't remember how, or find anything that explains it. This gets me into the ballpark, where I have a model and two variables, but I can't get the type of graph I want:

library(dynlm)
x <- rnorm(100)
y <- rnorm(100)
model <- dynlm(x ~ y)
plot(x, type="l", col="red")
lines(y,

Plot conditional density curve `P(Y|X)` along a linear regression line

南楼画角 submitted on 2019-12-01 03:53:34

This is my data frame, with two columns, Y (response) and X (covariate):

## Editor edit: use `dat` not `data`
dat <- structure(list(Y = c(NA, -1.793, -0.642, 1.189, -0.823, -1.715, 1.623, 0.964, 0.395, -3.736, -0.47, 2.366, 0.634, -0.701, -1.692, 0.155, 2.502, -2.292, 1.967, -2.326, -1.476, 1.464, 1.45, -0.797, 1.27, 2.515, -0.765, 0.261, 0.423, 1.698, -2.734, 0.743, -2.39, 0.365, 2.981, -1.185, -0.57, 2.638, -1.046, 1.931, 4.583, -1.276, 1.075, 2.893, -1.602, 1.801, 2.405, -5.236, 2.214, 1.295, 1.438, -0.638, 0.716, 1.004, -1.328, -1.759, -1.315, 1.053, 1.958, -2.034, 2.936, -0.078, -0.676, -2
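
Under the usual normal-error model, P(Y | X = x) is a Gaussian with mean equal to the fitted value at x and standard deviation sigma (the residual standard error), so the curve to draw at any chosen x is just a normal density centered on the regression line. A stdlib sketch with hypothetical fit numbers (b0, b1 and sigma below are made up; in practice you would take them from the fitted model):

```python
import math

# Hypothetical fit: fitted(x) = b0 + b1*x, residual SE sigma
b0, b1, sigma = 0.5, 1.2, 0.8

def conditional_density(y, x):
    """Density of Y at y, given X = x, under the Gaussian OLS model."""
    mu = b0 + b1 * x
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The density peaks exactly on the regression line, which is what the
# plotted curve along the line should show:
peak = conditional_density(b0 + b1 * 2.0, 2.0)
```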

Regression for a Rate variable in R

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-01 03:47:07

Question: I was tasked with developing a regression model looking at student enrollment in different programs. This is a very nice, clean data set where the enrollment counts follow a Poisson distribution well. I fit a model in R (using both GLM and zero-inflated Poisson); the resulting residuals seemed reasonable. However, I was then instructed to change the count of students to a "rate", calculated as students / school_population (each school has its own population). This is now no longer a
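
The standard alternative to regressing on the rate directly is to keep the Poisson count model and add log(school_population) as an offset, so that log E[count] = log(pop) + Xβ and the coefficients act on the rate; in R this is the offset = log(school_population) argument to glm. A stdlib sketch of what the offset buys (the coefficient values are made up):

```python
import math

def expected_count(pop, x, b0=-3.0, b1=0.4):
    """E[count] under log E[count] = log(pop) + b0 + b1*x."""
    return pop * math.exp(b0 + b1 * x)

# Doubling the school population doubles the expected enrollment at
# the same covariate value, i.e. the implied *rate* exp(b0 + b1*x)
# is unchanged -- exactly the behaviour a rate model should have,
# without abandoning the Poisson likelihood for the counts.
rate_small = expected_count(500, 1.0) / 500
rate_big = expected_count(1000, 1.0) / 1000
```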

Activation function for output layer for regression models in Neural Networks

寵の児 submitted on 2019-12-01 03:04:17

I have been experimenting with neural networks recently, and I have come across a general question about which activation function to use. This might be a well-known fact, but I couldn't understand it properly. A lot of the examples and papers I have seen work on classification problems, and they use either sigmoid (in the binary case) or softmax (in the multi-class case) as the activation function in the output layer, which makes sense. But I haven't seen any activation function used in the output layer of a regression model. So my question is: is it by choice that we don't use any activation
