How to separate the two leftmost bins of a histogram in R

耗尽温柔 submitted on 2019-11-29 14:05:06

The best way is to set the breaks argument manually. Using the data from your code,

hist(dataset,breaks=rep(1:7,each=2)+c(-.4,.4))

gives the following plot:

The first part, rep(1:7,each=2), is what numbers you want the bars centered around. The second part controls how wide the bars are; if you change it to c(-.49,.49) they'll almost touch, if you change it to c(-.3,.3) you get narrower bars. If you set it to c(-.5,.5) then R yells at you because you aren't allowed to have the same number in your breaks vector twice.
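To see the effect of the second part side by side, here is a minimal sketch. The `dataset` below is only a stand-in (the asker's data is not shown here), and `centered_breaks` is a hypothetical helper name introduced for illustration, not something from base R.

```r
# Stand-in data (assumption): 1000 draws from the integers 1..7
set.seed(1)
dataset <- sample(1:7, 1000, replace = TRUE)

# Hypothetical helper: breaks centered on `centers`, each bar 2 * half wide
centered_breaks <- function(centers, half) {
  rep(centers, each = 2) + c(-half, half)
}

# Draw three histograms next to each other with different bar widths
op <- par(mfrow = c(1, 3))
hist(dataset, breaks = centered_breaks(1:7, 0.49), main = "c(-.49, .49)")
hist(dataset, breaks = centered_breaks(1:7, 0.40), main = "c(-.4, .4)")
hist(dataset, breaks = centered_breaks(1:7, 0.30), main = "c(-.3, .3)")
par(op)
```

Because these breaks are not equally spaced, hist() plots densities rather than counts by default, so the y-axis will not show raw frequencies.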

Why does this work?

If you split up the breaks vector, you get one part that looks like this:

> rep(1:7,each=2)
 [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7

and a second part that looks like this:

> c(-.4,.4)
 [1] -0.4  0.4

When you add them together, R loops through the second vector as many times as needed to make it as long as the first vector. So you end up with

  1-0.4  1+0.4  2-0.4  2+0.4  3-0.4  3+0.4 [etc.]
=   0.6    1.4    1.6    2.4    2.6    3.4 [etc.]
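The recycling can be checked directly. This small sketch builds the breaks vector both ways, once relying on recycling and once with the short vector repeated explicitly; the variable names are just for illustration.

```r
centers <- rep(1:7, each = 2)   # 1 1 2 2 ... 7 7 (length 14)
offsets <- c(-0.4, 0.4)         # length 2

# R recycles the length-2 vector 7 times to match the length-14 vector
breaks <- centers + offsets
print(breaks)

# The same thing with the recycling written out by hand
explicit <- centers + rep(offsets, times = 7)
stopifnot(isTRUE(all.equal(breaks, explicit)))
```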

Thus, you have one bar from 0.6 to 1.4 (centered around 1, with width 2*.4), another bar from 1.6 to 2.4 (centered around 2, with width 2*.4), and so on. If you had data in between (e.g. 2.5) then the histogram would look kind of silly, because it would create a bar from 2.4 to 2.6, and the bar widths would not be even (since that bar would only be .2 wide, while all the others are .8). But with only integer values that's not a problem.
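The uneven widths, and where a value like 2.5 would land, can be inspected without drawing anything. This is a sketch on a tiny made-up vector (the values 1, 2, 2.5, 3 are chosen only for illustration).

```r
breaks <- rep(1:7, each = 2) + c(-0.4, 0.4)

# Bin widths alternate: 0.8 for the bars, 0.2 for the gaps between them
widths <- diff(breaks)
print(round(widths, 2))

# Made-up data with one in-between value (2.5)
h <- hist(c(1, 2, 2.5, 3), breaks = breaks, plot = FALSE)

# One row per bin; the 2.5 shows up in the narrow (2.4, 2.6] bin
bins <- data.frame(lower = head(h$breaks, -1),
                   upper = tail(h$breaks, -1),
                   count = h$counts)
print(bins)
```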

You need six bars NOT seven bars; that is what your histogram has space for. But then you end up generating seven bars. That is the bug.

Use sample(1:6, 1000, replace=TRUE) instead of sample(1:7, 1000, replace=TRUE).

If you do need seven bars, then seed with 0.
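The six-versus-seven point can be illustrated as follows. This is a sketch under two assumptions that are not confirmed by the text above: that the original histogram used integer breaks 1:7, and that "seed with 0" means starting the breaks at 0.

```r
# Stand-in data (assumption): 1000 draws from the integers 1..7
set.seed(1)
x <- sample(1:7, 1000, replace = TRUE)

# Breaks 1:7 give only 6 bins; the first one is [1, 2],
# so the 1s and 2s are counted together in the leftmost bar
h6 <- hist(x, breaks = 1:7, plot = FALSE)
print(length(h6$counts))
print(h6$counts[1] == sum(x == 1) + sum(x == 2))

# Breaks 0:7 (assumed reading of "seed with 0") give 7 bins,
# one per integer value
h7 <- hist(x, breaks = 0:7, plot = FALSE)
print(h7$counts)
```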
