Calculating peaks in histograms or density functions

Submitted by 雨燕双飞 on 2020-01-22 12:20:05

Question


There seem to be a lot of "peaks in density function" threads already, but I don't see one addressing this point specifically. Sorry to duplicate if I missed it.

My problem: Given a vector of 1000 values (sample attached), I would like to identify the peaks in the histogram or density function of the data. From the image of the sample data below, I can see peaks in the histogram at ~0, 6200, and 8400. But I need to obtain the exact values of these peaks, preferably in a simple procedure, as I have several thousand of these vectors to process.

I originally started working with the histogram outputs themselves, but couldn't get any peak-finding command to work properly (like, not at all). I'm not even sure how I would get the peaks() command from the splus2R package to work on a histogram object or on a density object. This would still be my preference, as I would like to identify the exact data value of the max frequency of each peak (as opposed to the density function value, which is slightly different), but I can't figure that one out either.

I would post the sample data themselves, but I can't see a way to do that on here (sorry if I'm just missing it).


Answer 1:


If your y values are smooth (like in your sample plot), this should find the peaks pretty repeatably:

peakx <- x[which(diff(sign(diff(y)))==-2)]
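A minimal sketch of how this one-liner might be applied, assuming x and y come from a density() estimate. The simulated vector `vals` is made up for illustration (the question's data were not posted), and the `+ 1` is an adjustment not present in the line above: each diff() drops one element, so the index found by which() sits one position before the actual maximum.

```r
set.seed(1)
# Hypothetical sample: three clusters roughly where the question's plot shows peaks
vals <- c(rnorm(400, 0, 300), rnorm(300, 6200, 300), rnorm(300, 8400, 300))

d <- density(vals)
x <- d$x
y <- d$y

# diff(sign(diff(y))) equals -2 where the slope flips from rising to falling.
# A flip found at position i marks a local maximum at y[i + 1].
idx <- which(diff(sign(diff(y))) == -2) + 1
peakx <- x[idx]
peakx
```

How many peaks come back depends on the bandwidth that density() picks, so a noisier estimate may need a larger `bw` or `adjust` before this is usable.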



Answer 2:


Since you are thinking about histograms, maybe you should use the histogram output directly?

data <- c(rnorm(100,mean=20),rnorm(100,mean=12))

peakfinder <- function(d){
  dh <- hist(d,plot=FALSE)
  ins <- dh[["density"]] ## "density" replaces the deprecated "intensities" component
  ss <- head(order(ins,decreasing=TRUE),3) ## pick the top 3 bins; unlike rank(), order() is not thrown off by ties
  dh[["mids"]][ss]
}

peaks <- peakfinder(data)

hist(data)
abline(v=peaks,col="red") ## abline() accepts a vector of positions

This isn't perfect -- for example, it will find just the top bins, even if they are adjacent. Maybe you could define 'peak' more precisely? Hope that helps.
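One possible way to avoid returning adjacent bins is sketched below. It assumes that a "peak" means a bin whose count is strictly higher than both of its neighbours. That definition and the helper name `hist_peaks` are illustrative assumptions, not part of the answer above.

```r
# Hypothetical helper: bins whose count exceeds both neighbouring bins
hist_peaks <- function(d, breaks = "Sturges") {
  dh <- hist(d, breaks = breaks, plot = FALSE)
  # Pad with -Inf so the first and last bins can also qualify as peaks
  cnt <- c(-Inf, dh$counts, -Inf)
  # +1 where the count rises, -1 where it falls, 0 on ties
  s <- sign(diff(cnt))
  # A bin is a peak if the step into it rises and the step out of it falls
  is_peak <- s[-length(s)] > 0 & s[-1] < 0
  data.frame(mid = dh$mids[is_peak], count = dh$counts[is_peak])
}

set.seed(42)
data <- c(rnorm(100, mean = 20), rnorm(100, mean = 12))
hist_peaks(data)
```

Two neighbouring bins with the same count form a plateau and are skipped by this rule, and the result depends on the bin width, so it may be worth passing `breaks` explicitly when comparing many vectors.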




Answer 3:


As already noted in the comments, finding peaks in density functions is related to "Finding local maxima and minima", where you can find more solutions. The answer by chthonicdaemon lands close to the peak, but each diff() shortens the vector by one, so the index it returns is one position before the actual maximum.

#Create Dataset
x <- c(1,1,4,4,9)

#Estimate Density
d <- density(x)

#Two ways to get highest Peak
d$x[d$y==max(d$y)]  #Gives you all highest Peaks
d$x[which.max(d$y)] #Gives you the first highest Peak

#3 ways to get all Peaks (the logical index is padded with FALSE at both ends to match length(d$x))
d$x[c(FALSE, diff(diff(d$y)>=0)<0, FALSE)] #This detects also a plateau
d$x[c(FALSE, diff(sign(diff(d$y)))<0, FALSE)]
d$x[which(diff(sign(diff(d$y)))<0)+1]

#In case you also want the height of the peaks
data.frame(d[c("x", "y")])[c(FALSE, diff(diff(d$y)>=0)<0, FALSE),]

#In case you need a higher "precision"
d <- density(x, n=1e4)
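
Since the question mentions several thousand vectors, the density-based approach could be wrapped in a small function and applied to a list. This is only a sketch: the helper name `density_peaks` and the simulated list `vecs` are assumptions made for illustration.

```r
# Hypothetical helper: positions and heights of all local maxima of a density estimate
density_peaks <- function(v, n = 512, ...) {
  d <- density(v, n = n, ...)
  # +1 compensates for the element dropped by the outer diff()
  i <- which(diff(sign(diff(d$y))) < 0) + 1
  data.frame(x = d$x[i], y = d$y[i])
}

set.seed(1)
# Made-up stand-in for the many vectors described in the question
vecs <- replicate(5, c(rnorm(500, 0), rnorm(500, 10)), simplify = FALSE)
peaks <- lapply(vecs, density_peaks)
peaks[[1]]
```

Extra arguments such as `bw` or `adjust` are passed through to density(), which makes it possible to keep the smoothing the same across all vectors.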


Source: https://stackoverflow.com/questions/13133297/calculating-peaks-in-histograms-or-density-functions
