Why is the first bar so big in my R histogram?

Posted on 2021-02-09 08:44:05

Question


I'm playing around with R. I'm trying to visualize the distribution of 1000 dice throws with the following R script:

cases <- 1000

min <- 1
max <- 6

x <- as.integer(runif(cases,min,max+1))
mx <- mean(x)
sd <- sd(x)

hist(
  x,
  xlim=c(min - abs(mx/2),max + abs(mx/2)),
  main=paste(cases,"Samples"),
  freq = FALSE,
  breaks=seq(min,max,1)
)

curve(dnorm(x, mx, sd), add = TRUE, col="blue", lwd = 2)
abline(v = mx, col = "red", lwd = 2)

legend("bottomleft", 
       legend=c(paste('Mean (', mx, ')')), 
       col=c('red'), lwd=2, lty=c(1))

The script produces the following histogram:

Can someone explain to me why the first bar is so big? I've checked the data and it looks fine. How can I fix this?

Thank you in advance!


Answer 1:


Histograms aren't good for discrete data; they're designed for continuous data. Your data looks something like this:

> table(x)
x
  1   2   3   4   5   6 
174 138 162 178 196 152 

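That is, there are roughly equal numbers of each value. But when you put that in a histogram, you chose breakpoints at 1:6, which gives only five bins for six values. By default hist() closes each bin on the right and also includes the lowest break in the first bin, so the first bin is [1, 2]: it collects the 174 ones on its left limit and the 138 twos on its right limit, 312 observations in total, while every other bin holds a single value.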

You could get a better-looking histogram by specifying breaks at the half integers, i.e. breaks = 0:6 + 0.5, but it still doesn't make sense to use a histogram for data like this. Simply running plot(table(x)) or barplot(table(x)) gives a more accurate depiction of the data.
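
As a rough check of the explanation above, the bin counts can be inspected without drawing anything by passing plot = FALSE to hist(). This is only a sketch: the seed and the variable names h_bad, h_ok and counts are assumptions made for illustration, and the simulated counts will differ from the table shown in the answer.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Same generator as in the question: 1000 throws of a six-sided die.
x <- as.integer(runif(1000, 1, 7))

# Breaks at 1:6 give five bins: [1,2], (2,3], (3,4], (4,5], (5,6].
# hist() defaults to right = TRUE and include.lowest = TRUE.
h_bad <- hist(x, breaks = 1:6, plot = FALSE)

# Half-integer breaks put every face in the middle of its own bin.
h_ok <- hist(x, breaks = 0:6 + 0.5, plot = FALSE)

# Reference counts per face.
counts <- as.vector(table(factor(x, levels = 1:6)))

print(h_bad$counts)  # first entry is the sum of the 1s and the 2s
print(h_ok$counts)   # six entries, one per face
print(counts)

# For discrete data a bar plot of the table is the more direct picture.
barplot(table(x), xlab = "Face", ylab = "Count")
```

If the first entry of h_bad$counts equals the number of 1s plus the number of 2s, the oversized first bar is explained by the breaks alone, not by the data.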




Answer 2:


Your breaks are incorrect, and because of this the first bar counts both the 1s and the 2s in the rolls. Starting the breaks at 0 gives every face its own bin:

hist(
  x,
  xlim=c(0,6),
  main=paste(cases,"Samples"),
  freq = FALSE,
  breaks=seq(0,6,1)
)
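
To see which bin each face falls into under the two sets of breaks, one option is to mimic the binning with cut(). This is an illustrative sketch, not what hist() runs internally: cut() is used here as a stand-in, include.lowest = TRUE is passed explicitly because that is the hist() default but not the cut() default, and the names faces, bins_fixed and bins_orig are made up for the example.

```r
faces <- 1:6

# Breaks 0..6: bins are (0,1], (1,2], ..., (5,6], one face per bin.
bins_fixed <- cut(faces, breaks = seq(0, 6, 1))

# Breaks 1..6 as in the question: the first bin is closed on both sides,
# so faces 1 and 2 share it.
bins_orig <- cut(faces, breaks = seq(1, 6, 1), include.lowest = TRUE)

print(data.frame(face = faces, fixed = bins_fixed, original = bins_orig))
```

With breaks starting at 0 each bar is drawn between k - 1 and k for face k, so the bars sit just to the left of their values.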



Answer 3:


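m0nhawk gets to part of the problem. It is also worth knowing what as.integer does: it truncates toward zero instead of rounding to the nearest integer. Because the draws come from runif(cases, min, max + 1), that truncation gives each face from 1 to 6 the same probability, so it does not skew the data toward 1; round() on the same draws would return values from 1 to 7 and give the two end values half the weight of the others.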

as.integer(1.7)
# 1

round(1.7)
# 2
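
One way to check how truncation and rounding treat draws from runif(n, 1, 7) is to compare the observed proportions. This is a sketch under assumptions: the seed, the sample size and the names u, p_trunc, p_round and direct are chosen for illustration, and sample() is shown only as an alternative way to simulate dice that the question does not use.

```r
set.seed(1)   # assumed seed, only so the sketch is reproducible
n <- 100000   # assumed sample size, large enough to see the pattern
u <- runif(n, 1, 7)

# Truncation: every face 1..6 covers an interval of width 1.
p_trunc <- as.vector(table(factor(as.integer(u), levels = 1:6))) / n

# Rounding: values 1 and 7 each cover only half an interval.
p_round <- as.vector(table(factor(round(u), levels = 1:7))) / n

print(round(p_trunc, 3))
print(round(p_round, 3))

# Drawing the faces directly avoids the conversion question altogether.
direct <- sample(1:6, n, replace = TRUE)
print(table(direct) / n)
```

In this sketch the truncated draws come out close to 1/6 per face, while the rounded draws include a seventh value and underweight both ends.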

Lastly, I'm not sure why one would fit a Gaussian to a uniform distribution. If a normal curve is what you want to overlay, generating the numbers with rnorm rather than runif would make more sense.



Source: https://stackoverflow.com/questions/43967838/why-is-the-first-bar-so-big-in-my-r-histogram
