Creating a histogram using aggregated data

Submitted by 元气小坏坏 on 2019-11-28 05:08:48

Question


Embarrassingly simple question...

I'm new to R and I can't seem to wrap my head around this for some reason. I have a CSV file which looks something like this:

Bin,Number
1363,5
1028,4
1303,3
1467,1
1242,3
1415,5
..
.

The bin size is 1, with a range of 1000-1500. I have read my CSV file in and everything seems to be OK there, but I just cannot produce a simple histogram. I have tried simply using a barplot, but the data is not in numeric order, so it does not produce the output I need. Using data such as this, how can I produce a histogram in R?

Once I have a simple histogram, I'm sure I'll be able to play around with it and format it nicely.


Answer 1:


Because the hist function does the counting of items in each bin, you need to 'explode' your 'already counted' data, for example by using rep. Then you can use hist on the resulting vector.

with(df, hist(rep(x = Bin, times = Number)))
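By default, hist picks its own breaks, so the bars will generally not match the bin size of 1 described in the question. A minimal sketch of one way to control that follows, assuming the bins are integers in the 1000-1500 range; the sample rows are copied from the question, and the names exploded, breaks and h are illustrative only.

```r
# Sample rows copied from the question; the real file has many more.
df <- data.frame(
  Bin    = c(1363, 1028, 1303, 1467, 1242, 1415),
  Number = c(5, 4, 3, 1, 3, 5)
)

# Repeat each bin value as many times as it was counted.
exploded <- rep(df$Bin, times = df$Number)

# Breaks at half-integers so that each integer bin gets its own bar of width 1.
breaks <- seq(999.5, 1500.5, by = 1)

# hist() draws the plot and also returns the histogram object.
h <- hist(exploded, breaks = breaks, main = "Histogram of Bin", xlab = "Bin")
```

Placing the breaks at half-integers keeps each integer value in the middle of its bar, which avoids the question of which side of a break a value falls on.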



Answer 2:


While this is absolutely possible with base R, I always enjoy the elegance and simplicity of the package ggplot2.

For example, you could do the following:

library(ggplot2)
ggplot(data, aes(x=Bin, y=Number)) + geom_bar(stat='identity', width=1)

(Run install.packages('ggplot2') first, if you do not have the package installed.)




Answer 3:


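Your data is already binned, so the easiest way to get an R histogram object from it is the PreBinnedHistogram function from the HistogramTools package on CRAN. This function takes a vector of breakpoints and the count for each bin, and returns a proper R histogram object for plotting or further analysis, without first exploding your dataset into unbinned form. The breakpoints must be in increasing order and there must be one more of them than there are counts, so sort by the Bin column and append the upper edge of the last bin. The code below assumes that every bin in the range appears in the file.

library(HistogramTools)
my.data <- read.csv("input.csv")
my.data <- my.data[order(my.data$Bin), ]  # breaks must be increasing
# One more break than counts: append the upper edge of the last bin (bin width 1).
breaks <- c(my.data$Bin, max(my.data$Bin) + 1)
plot(PreBinnedHistogram(breaks, my.data$Number))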



Answer 4:


The key is putting your data in the right order. Assuming your data frame is called df:

barplot(df$Number[order(df$Bin)])

If you use barplot simply by feeding it your vector of data, it will draw the bars in the order of the vector. Using order puts them in their numeric order before plotting.
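One caveat with the barplot approach: bars are drawn side by side, so any bin missing from the file leaves no gap and the x-axis is not a true numeric scale. A hedged sketch of one workaround is to fill in the missing bins with a count of 0 before plotting. It assumes integer bins covering 1000-1500 as stated in the question; the sample rows are copied from the question, and the names all_bins and full are illustrative only.

```r
# Sample rows copied from the question; the real file has many more.
df <- data.frame(
  Bin    = c(1363, 1028, 1303, 1467, 1242, 1415),
  Number = c(5, 4, 3, 1, 3, 5)
)

# Every bin in the stated range, including those absent from the file.
all_bins <- data.frame(Bin = seq(1000, 1500, by = 1))

# Left join: bins without a matching row get NA, which is then set to 0.
full <- merge(all_bins, df, by = "Bin", all.x = TRUE)
full$Number[is.na(full$Number)] <- 0

# merge() sorts by the key column by default, so the bars come out in numeric order.
barplot(full$Number, names.arg = full$Bin, space = 0, border = NA,
        xlab = "Bin", ylab = "Number")
```

With space = 0 the bars touch, which makes the result look like a histogram rather than a bar chart.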



Source: https://stackoverflow.com/questions/19939073/creating-a-histogram-using-aggregated-data
