In ggplot restrict y to be >0 in LOESS

拥有回忆 提交于 2020-03-17 05:55:12

问题


Here's my code:

#data
sites <- 
  structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L, 928L, 928L,
                          928L, 928L, 928L, 928L, 928L), 
                 date = c(13493L, 13534L, 13566L, 13611L, 13723L,
                          13752L, 13804L, 13837L, 13927L, 14028L,
                          14082L, 14122L, 14150L, 14182L, 14199L,
                          16198L, 16279L, 16607L, 16945L, 17545L,
                          17650L, 17743L, 17868L, 17941L, 18017L, 18092L),
                 y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L, 
                       11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L,
                       6L, 1L, 0L, 5L, 0L)), 
            .Names = c("site", "date", "y"),
            class = "data.frame", row.names = c(NA, -26L))

#convert to date
x<-as.Date(sites$date, origin="1960-01-01") 

#plot smooth, line goes below zero!
qplot(data=sites, x, y, main="Site 349") 
(p <- qplot(data = sites, x, y, xlab = "", ylab = ""))
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5))

Some of the LOESS lines and confidence intervals go below zero, I would like to restrict the graphics to 0 and positive numbers (because negative do not make sense).

How can I do this?


回答1:


I am seconding Matt Parker's suggestion that you have to change the fitting procedure. One option that often works for positive-only data, is to do the fit on log-scale, and then exponentiate to get results on the original scale. This will guarantee positive only values.

Generating random data that has some of this issues:

 d <- data.frame(x=0:100)
 d$y <- exp(rnorm(nrow(d), mean=-d$x/40, sd=0.8))
 qplot(x,y,data=d) + stat_smooth() 

Now we can use ggplot's transformation capabilites to log-transform the y-values, but display the results on an exponential scale (which corresponds to the original one):

qplot(x,y,data=d) + stat_smooth() + scale_y_log10()+coord_trans(ytrans="pow10")

You can see examples like this on the coord_trans help page. If you don't like the y-axis, you can manipulate the breaks and labels.

EDIT based on the question update

There have been some changes in ggplot2 since the question was originally asked, and the original answer did not deal with 0's.

Option 1

The main idea of the solution is the same: find a transformation that will map the range of possible values to -Inf to Inf, do the loess smooth there, and then backtransform the result. The log-transformation would be great if there were no zeroes. I don't think the required function exists if 0 is included, but a possibility that often works is the log(1+x) transformation. That is built-in, but we need to have the inverse transformation exp(x)-1 as well.

library(scales)
#create exp(x)-1 transformation, the inverse of log(1+p)
expm1_trans <-  function() trans_new("expm1", "expm1", "log1p")

qplot(x, y, data=sites) + stat_smooth(method="loess") +
  scale_y_continuous(trans=log1p_trans()) +
  coord_trans(ytrans=expm1_trans())

Option 2

The second option extends the suggestion in the comments to Matt Parker's answer: use a regression method that incorporates the integer nature of the outcomes. That means overdispersed (just in case) Poisson regression for counts. While you can't do loess, you can do a spline fit. You can play with the degrees of freedom to control the smoothness.

library(splines)
qplot(x, y, data=sites) + stat_smooth(method="glm", family="quasipoisson", 
                                      formula = y ~ ns(x, 3))

The two options give quite similar results, which is a good thing.




回答2:


I can't test it without some example data, but

qplot(data=sites, x, y, main="Site 349")  
(p <- qplot(data = sites, x, y, xlab = "", ylab = "")) 
(p1 <- p + geom_smooth(method = "loess",span=0.5, size = 1.5)) 
p1 + theme_bw() + opts(title = "Site 349") + ylim(0, foo)

(where foo is a suitable upper limit for your plot) might do the trick. Unlike in base graphics, the xlim() and ylim() commands in ggplot actually restrict the data that are used in making the plot, rather than just the plot window. It might also restrict the geom_smooth() (though I'm not certain).

Edit: After reading a bit more, you might also want to consider switching out the model that geom_smooth is using. Again, not being able to see your data is a problem. But, for example, if it's binary - you can add stat_smooth(method="glm", family="binomial") to get a logit-smoothed line. See ?stat_smooth for more.



来源:https://stackoverflow.com/questions/2777053/in-ggplot-restrict-y-to-be-0-in-loess

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!