How can I create a histogram from aggregated data in R?

不想你离开。 提交于 2019-11-27 03:24:32

问题


I have a data frame that has a format like the following:

Month       Frequency
2007-08     2
2010-11     5
2011-01     43
2011-02     52
2011-03     31
2011-04     64
2011-05     73

I would like to create a histogram from this data, using X bins (X will probably be around 15, but the actual data has over 200 months), and using the data from the frequency column as the frequency for each bin of the histogram. How can I accomplish this?

I've tried two approaches so far, with the hist() and barplot() commands. The problem with hist() is that it does not seem to give me any way to specify that I want to use the frequency column in the frequency calculations for the histogram. The problem with barplot() is that I don't have any flexibility in choosing X bins, and if there are omitted months, then resulting graph is not actually a true histogram because the x-axis isn't continuous.

The only idea I have right now is to go with the barplot() approach, fill in the missing months with a value of 0 for Frequency, and use space=0 to remove the spacing between the bars. The problem with that is that it's not particularly easy to choose an arbitrary number of bins.


回答1:


To get this kind of flexibility, you may have to replicate your data. Here is one way of doing it with rep:

n <- 10
dat <- data.frame(
    x = sort(sample(1:50, n)),
    f = sample(1:100, n))
dat

expdat <- dat[rep(1:n, times=dat$f), "x", drop=FALSE]

Now you have your data replicated in the data.frame expdat, allowing you to call hist with different numbers of bins:

par(mfcol=c(1, 2))
hist(expdat$x, breaks=50, col="blue", main="50 bins")
hist(expdat$x, breaks=5, col="blue", main="5 bins")
par(mfcol=c(1, 1))




回答2:


take a gander at ggplot2.

if you data is in a data.frame called df:

ggplot(df,aes(x=Month,y=Frequency))+geom_bar(stat='identity')

or if you want continuous time:

df$Month<-as.POSIXct(paste(df$Month, '01', sep='-'),format='%Y-%m-%d')
ggplot(df,aes(x=Month,y=Frequency))+geom_bar(stat='identity')



回答3:


Yea, rep solutions will waste too much memory in most interesting/large cases. The HistogramTools CRAN package includes an efficient PreBinnedHistogram function which creates a base R histogram object directly from a list of bins and breaks as the original question provided.




回答4:


Another possibility is to scale down your frequency variable by some large factor so that rep doesn't have as much work to do. Then adjust the vertical axis scale of the histogram by that same factor.



来源:https://stackoverflow.com/questions/9133057/how-can-i-create-a-histogram-from-aggregated-data-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!