trying to display original and fitted data (nls + dnorm) with ggplot2's geom_smooth()

Submitted by 孤街醉人 on 2019-12-03 05:13:31

Question


I am exploring some data, so the first thing I wanted to do was try to fit a normal (Gaussian) distribution to it. This is my first time trying this in R, so I'm taking it one step at a time. First I pre-binned my data:

library(ggplot2)

myhist = data.frame(size = 10:27, counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L, 22L, 14L, 7L, 4L, 2L, 2L, 1L) )

qplot(x=size, y=counts, data=myhist)

Since I want counts, I need to add a normalization factor (N) to scale up the density:

fit = nls(counts ~ N * dnorm(size, m, s), data=myhist, start=c(m=20, s=5, N=sum(myhist$counts)) )   
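As a quick sanity check on a fit like this, the estimated parameters can be inspected with coef(). The sketch below repeats the data and the nls() call so that it runs on its own. Comparing N with the total count is only a suggestion, on the assumption that the bins are one unit wide, in which case the scale factor should land somewhere near the number of observations.

```r
# Data and model from the question, repeated so the snippet is self-contained
myhist <- data.frame(
  size   = 10:27,
  counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L,
             22L, 14L, 7L, 4L, 2L, 2L, 1L)
)

fit <- nls(counts ~ N * dnorm(size, m, s), data = myhist,
           start = c(m = 20, s = 5, N = sum(myhist$counts)))

est <- coef(fit)            # named vector with the estimates of m, s and N
print(est)
print(sum(myhist$counts))   # total count, to compare with the fitted N (assumes bin width 1)
```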

Then I create the fitted data for display and everything works great:

x = seq(10,30,0.2)
fitted = data.frame(size = x, counts=predict(fit, data.frame(size=x)) )
ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_line(data=fitted)

I got excited when I found this thread which talks about using geom_smooth() to do it all in one step, but I can't get it to work:

  • http://www.mail-archive.com/r-help@r-project.org/msg109882.html

Here's what I try... and what I get:

ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_smooth(method="nls", formula = counts ~ N * dnorm(size, m, s), se=F, start=list(m=20, s=5, N=300, size=10))

Error in method(formula, data = data, weights = weight, ...) : 
  parameters without starting value in 'data': counts

The error seems to indicate that it's trying to fit for the observed variable, counts, but that doesn't make any sense, and it predictably freaks out if I specify a "starting" value for counts too:

fitting parameters ‘m’, ‘s’, ‘N’, ‘size’, ‘counts’ without any variables

Error in eval(expr, envir, enclos) : object 'counts' not found

Any idea what I'm doing wrong? It's not the end of the world, of course, but fewer steps are always better, and you guys always come up with the most elegant solutions to these common tasks.

Thanks in advance!

Jeffrey


Answer 1:


The first error means that the fitting function cannot find the variable 'counts', which the formula uses, in the data it receives.

Stats are computed after the aesthetic mapping has been applied, so by the time nls() runs, size has become x and counts has become y.
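To see why the original formula fails, the situation inside the stat can be imitated by hand. This is only a sketch: it assumes the stat hands the fitting function a data frame whose columns are named after the aesthetics (x and y), which is what the error message in the question suggests. The name smooth_data is made up for illustration.

```r
myhist <- data.frame(
  size   = 10:27,
  counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L,
             22L, 14L, 7L, 4L, 2L, 2L, 1L)
)

# Assumed shape of the data the stat passes on: columns named x and y
smooth_data <- data.frame(x = myhist$size, y = myhist$counts)
start_vals  <- list(m = 20, s = 5, N = 300)

# Formula written with the original column names: nls() cannot find them
res <- tryCatch(
  nls(counts ~ N * dnorm(size, m, s), data = smooth_data, start = start_vals),
  error = function(e) e
)
print(conditionMessage(res))

# Formula written with x and y: the same model fits
fit_xy <- nls(y ~ N * dnorm(x, m, s), data = smooth_data, start = start_vals)
print(coef(fit_xy))
```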

Here is an example of using nls in geom_smooth:

ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + 
  geom_smooth(method="nls", formula = y ~ N * dnorm(x, m, s), se=FALSE, 
              start=list(m=20, s=5, N=300)) 

The point is to use x and y, rather than size and counts, when specifying the formula.
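On newer ggplot2 releases (2.0.0 and later, as far as I know), extra arguments for the fitting function are no longer forwarded when given directly to geom_smooth(); they go in method.args instead. A sketch of the same plot in that style follows, with se = FALSE kept because predict() for nls fits does not supply standard errors:

```r
library(ggplot2)

myhist <- data.frame(
  size   = 10:27,
  counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L,
             22L, 14L, 7L, 4L, 2L, 2L, 1L)
)

p <- ggplot(myhist, aes(x = size, y = counts)) +
  geom_point() +
  geom_smooth(
    method      = "nls",
    formula     = y ~ N * dnorm(x, m, s),   # x and y, not size and counts
    se          = FALSE,                    # no confidence band for nls fits
    method.args = list(start = list(m = 20, s = 5, N = 300))
  )

print(p)
```

If the starting values are far from the data, the fit may fail to converge and the curve will simply be missing from the plot, so it is worth checking the console for warnings.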



Source: https://stackoverflow.com/questions/4382108/trying-to-display-original-and-fitted-data-nls-dnorm-with-ggplot2s-geom-smo
