regression

plot linear regression lines without interaction in ggplot2

点点圈 submitted on 2019-11-29 10:51:32
This code plots regression lines with interactions in ggplot2:
library(ggplot2)
ggplot(mtcars, aes(hp, mpg, group = cyl)) +
  geom_point() +
  stat_smooth(method = "lm")
Can lines without interactions be plotted with stat_smooth? A workaround is to fit the model outside ggplot(), then make a prediction from this model and add the result to the original data frame. This adds the columns fit, lwr and upr.
mod <- lm(mpg ~ factor(cyl) + hp, data = mtcars)
mtcars <- cbind(mtcars, predict(mod, interval = "confidence"))
Now you can use geom_line() with the fit values as y to add three regression lines, and geom_ribbon()
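The workaround relies on the model mpg ~ factor(cyl) + hp, which gives each cylinder group its own intercept but a single shared slope for hp, so the fitted lines are parallel. To make that structure explicit, here is a minimal Python/NumPy sketch (Python instead of R, chosen only for illustration) that builds the same treatment-coded design matrix and solves it by least squares; the function name fit_parallel_lines is hypothetical.

```python
import numpy as np


def fit_parallel_lines(x, y, group):
    """Fit y ~ factor(group) + x: one intercept per group, one shared slope.

    Returns {level: (intercept, slope)}; every group gets the same slope,
    so the fitted lines are parallel (no group-by-x interaction).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    levels = np.unique(group)
    # Treatment coding, as R does for factor(): the first level is the baseline.
    dummies = [(group == lvl).astype(float) for lvl in levels[1:]]
    X = np.column_stack([np.ones_like(x), *dummies, x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    slope = coef[-1]
    lines = {levels[0].item(): (coef[0], slope)}
    for i, lvl in enumerate(levels[1:], start=1):
        # Group intercept = baseline intercept + that group's dummy coefficient.
        lines[lvl.item()] = (coef[0] + coef[i], slope)
    return lines
```

Drawing each (intercept, slope) pair as a line gives the no-interaction plot; by contrast, stat_smooth(method = "lm") with group = cyl fits a separate line, and therefore a separate slope, to each group.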

Java 8 change in UTF-8 decoding

你。 submitted on 2019-11-29 10:35:28
We recently migrated our application from JDK 7 to JDK 8. After the change, we ran into a problem with the following snippet of code:
String output = new String(byteArray, "UTF-8");
The byte array may contain invalid UTF-8 byte sequences. Decoding the same byte array as UTF-8 results in two different strings on Java 7 and Java 8. According to the answer to this SO post, Java 8 "fixes" an error in Java 7 and replaces invalid UTF-8 byte sequences with a replacement string, which is in accordance with the UTF-8 specification. But we would like to stick with Java 7's version of the decoded
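What differs between the two JDKs is the policy applied to malformed input: replace it, or report it. As an illustration only, here is a short Python sketch of both policies (it uses Python's own UTF-8 codec, not Java's, so it does not reproduce Java 7's exact output); in Java the corresponding switch is CharsetDecoder.onMalformedInput with CodingErrorAction.REPLACE or CodingErrorAction.REPORT. The helper names are hypothetical.

```python
def decode_utf8(data: bytes, strict: bool = False) -> str:
    """Decode UTF-8 bytes.

    strict=False replaces each malformed sequence with U+FFFD (the replacement
    behaviour described for Java 8); strict=True raises UnicodeDecodeError instead.
    """
    return data.decode("utf-8", errors="strict" if strict else "replace")


def first_malformed_offset(data: bytes):
    """Return the byte offset of the first malformed UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None
```

Reporting instead of replacing lets the caller decide what to do with the bad bytes, which is the only way to keep a custom, pre-JDK 8 style result under your own control.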

Regression line for the entire dataset together with regression lines based on groups in R ggplot2?

走远了吗. submitted on 2019-11-29 07:54:30
I am new to ggplot2 and have a problem displaying the regression line for the entire dataset together with the regression lines for groups. So far I can plot regression lines by group, but I have had no success getting the regression line for the entire dataset onto the same plot. I want all the regression lines to have different line styles so that they can be easily identified in a black-and-white print. Any help would be highly appreciated. Here is my code so far:
ggplot(alldata, aes(y = y, x = x, colour = group, shape = group)) +
  geom_point(size = 3, alpha = .8) +
  geom_smooth(method="lm",
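Conceptually, the plot needs two kinds of fits on the same data: one regression on all rows pooled, and one per group. A minimal Python/NumPy sketch of computing both (the function name is hypothetical, and drawing the lines with distinct line types is left to the plotting code):

```python
import numpy as np


def overall_and_group_fits(x, y, group):
    """Fit one pooled line to all points plus one line per group.

    Returns (overall, per_group), where each fit is a (slope, intercept) pair
    from np.polyfit(..., deg=1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    # Pooled fit: ignores group membership entirely.
    slope, intercept = np.polyfit(x, y, 1)
    overall = (float(slope), float(intercept))
    per_group = {}
    for lvl in np.unique(group):
        mask = group == lvl
        s, i = np.polyfit(x[mask], y[mask], 1)
        per_group[lvl.item()] = (float(s), float(i))
    return overall, per_group
```

The pooled line generally differs from every group line, which is why it has to be computed as a separate fit rather than derived from the per-group ones.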

multiple ggplot linear regression lines

半世苍凉 submitted on 2019-11-29 07:41:56
I am plotting the occurrence of a species against numerous variables on the same plot. There are many other variables, but I've only kept the important ones for the sake of this post:
> str(GH)
'data.frame': 288 obs. of 21 variables:
 $ Ee : int 2 2 1 7 6 3 0 9 3 7 ...
 $ height : num 14 25.5 25 21.5 18.5 36 18 31.5 28.5 19 ...
 $ legumes : num 0 0 55 30 0 0 55 10 30 0 ...
 $ grass : num 60 50 30 35 40 35 40 40 35 30 ...
 $ forbs : num 40 70 40 50 65 70 40 65 70 70 ...
I've managed to plot this fine and get it looking nice using (where Ee is the species in question):
ggplot(data=GH,aes(y=y,x=x)
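With one response (Ee) and several predictors, each line on the shared plot is its own simple regression of Ee on one variable. The following Python/NumPy sketch fits those lines; the function name is hypothetical, and it uses only the first ten values of each column shown in the str(GH) output, so its coefficients are not those of the full 288-row data set.

```python
import numpy as np


def fit_each_predictor(response, predictors):
    """Fit a separate simple regression of `response` on each predictor.

    `predictors` maps a column name to its values. Returns
    {name: (slope, intercept)}: one line per variable, ready to draw on one plot.
    """
    y = np.asarray(response, dtype=float)
    fits = {}
    for name, values in predictors.items():
        slope, intercept = np.polyfit(np.asarray(values, dtype=float), y, 1)
        fits[name] = (float(slope), float(intercept))
    return fits


# First ten observations of each column, as printed by str(GH) in the question.
ee = [2, 2, 1, 7, 6, 3, 0, 9, 3, 7]
gh = {
    "height": [14, 25.5, 25, 21.5, 18.5, 36, 18, 31.5, 28.5, 19],
    "legumes": [0, 0, 55, 30, 0, 0, 55, 10, 30, 0],
    "grass": [60, 50, 30, 35, 40, 35, 40, 40, 35, 30],
    "forbs": [40, 70, 40, 50, 65, 70, 40, 65, 70, 70],
}
fits = fit_each_predictor(ee, gh)
```

The predictors sit on different scales (percent cover versus height), so sharing one x axis only works if that difference is acceptable for the plot.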

Difference in Linear Regression Coefficients between Python and R

馋奶兔 submitted on 2019-11-29 07:34:22
I'm trying to run a linear regression in Python that I have already done in R, in order to find variables with 0 coefficients. The issue I'm running into is that the linear regression in R returns NAs for columns with low variance, while the scikit-learn regression returns the coefficients. In the R code, I find and save these variables by saving the variables with NAs as output from the linear regression, but I can't seem to figure out a way to mimic this behavior in Python. The code I'm using
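R's lm() returns NA for a coefficient when its column is (numerically) a linear combination of the columns before it, for example a zero-variance column, which duplicates the intercept. scikit-learn's LinearRegression still returns a number for every column. One way to mimic the R behaviour in Python is sketched below; it is an approximation that swaps in a rank test via numpy.linalg.matrix_rank for R's pivoted QR and its tolerance, and the function name aliased_columns is hypothetical.

```python
import numpy as np


def aliased_columns(X):
    """Return indices of columns that R's lm() would likely report as NA.

    Walks the columns left to right after an intercept column (lm()'s default)
    and keeps a column only if it raises the rank of the columns kept so far;
    a column that does not is a linear combination of earlier ones (aliased).
    This rank test approximates, but is not identical to, R's pivoted QR.
    """
    X = np.asarray(X, dtype=float)
    kept = [np.ones(X.shape[0])]  # intercept column
    aliased = []
    for j in range(X.shape[1]):
        candidate = np.column_stack(kept + [X[:, j]])
        if np.linalg.matrix_rank(candidate) > len(kept):
            kept.append(X[:, j])
        else:
            aliased.append(j)
    return aliased
```

The returned indices play the role of the NA coefficients in R; dropping those columns before fitting with scikit-learn should give a model on the same remaining columns.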

How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?

那年仲夏 submitted on 2019-11-29 07:03:44
I want to persist an lm object to a file and reload it into another program. I know I can do this by writing/reading a binary file via saveRDS/readRDS, but I'd like to have an ASCII file instead of a binary file. At a more general level, I'd like to know why my idioms for reading in dput output are not behaving as I'd expect. Below are examples of making a simple fit, and of successful and unsuccessful recreations of the model:
dat_train <- data.frame(x=1:4, z=c(1, 2.1, 2.9, 4))
fit <- lm(z ~ x, dat_train)
rm(dat_train) # Just to make sure fit is not dependent upon `dat_train
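When only predictions are needed later, one plain-text alternative to dput is to store just the fitted coefficients in a text format. The Python sketch below is an assumption, not an R technique, and not a full lm object (summaries, residuals and the model frame are not kept): it writes the coefficients of a straight-line fit to JSON and rebuilds predictions from that text. The helper names are hypothetical.

```python
import json

import numpy as np


def fit_to_json(x, y):
    """Fit y = intercept + slope * x and return the coefficients as JSON text."""
    slope, intercept = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), 1)
    return json.dumps({"intercept": float(intercept), "slope": float(slope)})


def predict_from_json(text, x_new):
    """Rebuild the line from its JSON text and predict at x_new."""
    coef = json.loads(text)
    return coef["intercept"] + coef["slope"] * np.asarray(x_new, dtype=float)


# Same training data as the question's dat_train (x = 1:4, z = c(1, 2.1, 2.9, 4)).
text = fit_to_json([1, 2, 3, 4], [1, 2.1, 2.9, 4])
```

Because the stored text is independent of the training data, predictions can be recreated after the data frame is gone, which is the property the rm(dat_train) check in the question is testing for.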

Adding Regression Line Equation and R2 on SEPARATE LINES of a graph

北战南征 submitted on 2019-11-29 04:36:41
A few years ago, a poster asked how to add the regression line equation and R2 to ggplot graphs, at the link below.
Adding Regression Line Equation and R2 on graph
The top solution was this:
lm_eqn <- function(df){
  m <- lm(y ~ x, df);
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
    list(a = format(coef(m)[1], digits = 2),
         b = format(coef(m)[2], digits = 2),
         r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq));
}
p1 <- p + geom_text(x = 25, y = 300, label = lm_eqn(df), parse = TRUE)
I am using this code and it works great. However, I was
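One way to get the two pieces on separate lines is to build two labels instead of one combined expression and draw each with its own text layer (for example two geom_text() or annotate() calls at different y positions). The Python/NumPy sketch below only shows the label computation, with rounding close to lm_eqn() (2 significant digits for a and b, 3 for r²); the function name is hypothetical.

```python
import numpy as np


def regression_label_lines(x, y):
    """Build the fitted equation and r^2 as two separate label strings.

    Follows the lm_eqn() idea but returns one string per line, so each can be
    placed with its own text call.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)  # slope, intercept
    # For a simple linear fit with an intercept, R^2 equals the squared correlation.
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return [f"y = {a:.2g} + {b:.2g}·x", f"r² = {r2:.3g}"]
```

Python's `g` formatting is similar to, but not identical with, R's format(digits = ...) for large values, where it may switch to exponent notation.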

Search for corresponding node in a regression tree using rpart

谁说我不能喝 submitted on 2019-11-29 04:31:40
I'm pretty new to R and I'm stuck with a pretty dumb problem. I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting. Thanks to R, the calibration part is easy to do and easy to control.
# the package rpart is needed
library(rpart)
# Loading of a big data file used for calibration
my_data <- read.csv("my_file.csv", sep=",", header=TRUE)
# Regression tree calibration
tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 + Attribute4 + Attribute5,
              method="anova", data=my_data,
              control=rpart.control(minsplit=100, cp=0.0001))
After
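Finding the node for a new observation amounts to walking the tree from the root and testing one split per level. A minimal Python sketch under stated assumptions: the tree is a hypothetical dict keyed by node number, using the numbering rpart uses (the children of node n are 2n and 2n + 1), and every split is assumed to send `value < cut` to the left child, which real rpart splits do not always do.

```python
def find_leaf(tree, row):
    """Return the number of the terminal node that `row` falls into.

    `tree` maps node number -> {"var": name, "cut": value} for a split, or
    {"yval": prediction} for a leaf. Node n's children are 2n and 2n + 1;
    this sketch assumes `row[var] < cut` always goes to the left child.
    """
    node = 1
    while "yval" not in tree[node]:
        split = tree[node]
        node = 2 * node if row[split["var"]] < split["cut"] else 2 * node + 1
    return node


# Hypothetical tree and observation; the attribute names follow the question.
tree = {
    1: {"var": "Attribute1", "cut": 0.5},
    2: {"yval": 1.2},
    3: {"var": "Attribute2", "cut": 10.0},
    6: {"yval": 2.0},
    7: {"yval": 3.5},
}
leaf = find_leaf(tree, {"Attribute1": 0.9, "Attribute2": 5.0})
prediction = tree[leaf]["yval"]
```

Returning the node number rather than only the prediction is what lets a new observation be matched back to the corresponding node of the fitted tree.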

How do you design an alerting system well? It is far from as simple as you think

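左心房为你撑大大i submitted on 2019-11-29 04:19:59
Contents: The nature of alerts; Alert targets; Monitored metrics and policies; Theory vs. reality; Anomaly detection; Smoothness-based detection on the curve; Time periodicity based on absolute values; Time periodicity based on amplitude; Anomaly judgment based on curve recovery; Summary of key points
The nature of alerts: Few systems have well-designed alerts. Good alert design is very hard work. How do you know that the alerts you receive are bad? How many times have you dismissed an alert the moment it arrived? Are you drowning all day in alerts that are completely useless? The most common alert setup is: when CPU usage exceeds 90%, raise an alert. In most situations, this setup cannot provide high-quality alerts. A high-quality alert looks like this: every time you receive one, you can immediately assess the scope of its impact, and every alert calls for a graded response from you. In other words, every alert should be actionable. The essence of alerting can be illustrated by the following figure: Servers should be designed with unattended operation as the goal. Even if the entire operations team went on vacation, the service should keep running on its own 24/7. The essence of an alert is "using a human as a service". When something cannot yet be done programmatically, an alert notifies a person, who intervenes in the system to correct it. An alert is like a service call. If an alert fires but the person who receives it does not need to do anything, it is a kind of DDoS attack, one aimed at the happy lives of the operations staff. Often, what an alert asks a person to do can genuinely be automated. For example, when a server goes down, bring up a replacement. In a smaller system, this might mean some downtime while someone manually swaps in a cold-standby machine. In a larger system, with many more servers, having machines go down every day won't do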
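The contents list names smoothness-based detection on the metric curve, but this excerpt does not show how the article implements it. As an assumption, one simple form of the idea is sketched in Python below: compare each point with an exponentially weighted moving average (EWMA) of the points before it, and flag it when the jump is much larger than the curve's recent variation. The function name and the parameters alpha, k and warmup are hypothetical.

```python
def smoothness_anomalies(values, alpha=0.3, k=3.0, warmup=5):
    """Return indices of points that break the smoothness of a metric curve.

    Hypothetical sketch: each point is compared with an EWMA of the earlier
    points and flagged when its distance from that average exceeds k times an
    EWMA of earlier distances. Points up to index `warmup` only train the
    averages and are never flagged.
    """
    flagged = []
    level = float(values[0])  # smoothed value of the curve
    spread = 0.0  # smoothed absolute deviation from `level`
    for i in range(1, len(values)):
        gap = abs(values[i] - level)
        if i > warmup and gap > k * spread:
            flagged.append(i)
        # Both averages are updated with every point, anomalous or not.
        level = alpha * values[i] + (1 - alpha) * level
        spread = alpha * gap + (1 - alpha) * spread
    return flagged
```

Unlike a fixed rule such as CPU usage above 90%, the threshold here adapts to the curve's own recent behaviour. Because anomalous points also update the averages, this sketch is less sensitive right after a spike; a production detector might exclude flagged points from the update.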

How to put a complicated equation into an R formula?

旧城冷巷雨未停 submitted on 2019-11-29 04:03:48
We have tree diameter as the predictor and tree height as the dependent variable. A number of different equations exist for this kind of data, and we are trying to model some of them and compare the results. However, we can't figure out how to correctly put one equation into the corresponding R formula format. The trees data set in R can be used as an example.
data(trees)
df <- trees
df$h <- df$Height * 0.3048 # transform to metric system
df$dbh <- (trees$Girth * 0.3048) / pi # transform tree girth to diameter
First, an example of an equation that seems to work well:
form1 <- h ~ I(dbh ^ -1)
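The working form1 shows the key point: inside an R formula, ^ means crossing of terms, so I() is needed to have dbh ^ -1 evaluated as arithmetic. The resulting model, h = a + b / dbh, is still linear in a and b, so ordinary least squares on the transformed predictor fits it. A Python/NumPy sketch of that fit (the function name is hypothetical):

```python
import numpy as np


def fit_inverse_dbh(dbh, h):
    """Least-squares fit of h = a + b / dbh, the model behind h ~ I(dbh ^ -1).

    The transformed predictor 1 / dbh becomes an ordinary column of the
    design matrix, next to the intercept column.
    """
    dbh = np.asarray(dbh, dtype=float)
    h = np.asarray(h, dtype=float)
    X = np.column_stack([np.ones_like(dbh), dbh ** -1.0])
    (a, b), *_ = np.linalg.lstsq(X, h, rcond=None)
    return float(a), float(b)
```

Equations that are not linear in their parameters (for example an exponent to be estimated) cannot be written this way and need a nonlinear fitter such as R's nls() instead.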