ggplot2 how to get 2 histograms with the y value = to count of one / sum of the count of both

Posted by 独自空忆成欢 on 2019-12-11 18:28:11

Question


I guess my question is simple (even if the title is not...) but I was not able to find any clear answer yet. I want to plot histograms of Reaction Times in a psychophysics task. I need to plot two of them on the same figure: one for correct responses, the other for incorrect responses.

I don't want to plot the absolute counts, but rather the relative proportion corresponding to:

For correct responses: count(correct==1) / sum(count(correct==1) + count(correct==0))

For incorrect responses: count(correct==0) / sum(count(correct==1) + count(correct==0))

For now I have that:

ggplot(data, aes(x=RT, color=correct)) +
    geom_histogram(aes(y = ..count../sum(..count..))) +
    stat_bin(breaks = seq(5,800,by=10))

But I'm not sure it is doing what I want (is the sum corresponding to the sum of both correct and incorrect responses?). I don't feel comfortable with the ..count.. etc, would anyone have a good recommendation for documentation about this aspect?

Thanks in advance.

Edit: The input data is:

df <- structure(list(RT = c(359L, 214L, 219L, 206L, 120L, 166L, 156L, 
       181L, 135L, 122L, 110L, 101L, 139L, 215L, 106L, 217L, 162L, 135L, 
       114L, 205L), correct = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
       1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L)), .Names = c("RT", 
       "correct"), class = "data.frame", row.names = c(NA, -20L))

Here is a link to a plot I made earlier using base R, which is exactly the output I want in the end. https://www.dropbox.com/s/nqn83pkoq7o0stv/RTexample.png These are lines (but based on histograms, yellow for correct==1, blue for correct==0). The specific feature that I want is that both lines together sum to 1.


Answer 1:


shora,

Brian Hanson is absolutely correct. You really should stop trying to do your transformation as part of the 'ggplot' function. I know it's tempting, but the in-plot transformation methods of 'ggplot' are better suited to data exploration than to creating a predetermined graph. You can quickly use the 'hist' function to get the data you need, transform the data, and then feed it into 'ggplot' for the actual graphing. The best part about transforming your data manually is that you get to see all of it in action, and you won't have to guess (as in your question) whether or not the values are correct.

You'll need to decide exactly how you want the two plots arranged, but that can all be done with 'ggplot'. Here is an example of an outside transformation:

Step 1: Get the histogram values for [correct]=1.

correct_Hist <- hist(data$RT[data$correct == 1], breaks=seq(5, 800, by=10), plot=FALSE)

Step 2: Get the histogram values for [correct]=0.

incorrect_Hist <- hist(data$RT[data$correct == 0], breaks=seq(5, 800, by=10), plot=FALSE)

Step 3: Transform the counts. Your explanation in the question is a bit ambiguous, and could be taken a couple of different ways. For this answer, I am assuming you do not want a histogram but rather a bar chart that shows, for each range of RT values, what share of the responses were correct or incorrect. This is quite simple now that we have the counts. Note that a bin with no responses at all gives 0/0, which R reports as NaN.

correct_Bar_Values <- correct_Hist$counts / (correct_Hist$counts + incorrect_Hist$counts)
incorrect_Bar_Values <- incorrect_Hist$counts / (correct_Hist$counts + incorrect_Hist$counts)
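Steps 1 to 3 can be tried end to end on the sample data from the question. The following is only a sketch under the per-bin reading assumed in this answer; the names 'breaks', 'total_counts', 'non_empty' and 'shares' are illustrative and do not come from the original post.

```r
# Sample data from the question (20 trials, 16 correct and 4 incorrect)
df <- data.frame(
  RT = c(359, 214, 219, 206, 120, 166, 156, 181, 135, 122,
         110, 101, 139, 215, 106, 217, 162, 135, 114, 205),
  correct = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 0, 1, 1, 1, 0, 1, 0, 0)
)

breaks <- seq(5, 800, by = 10)

# Steps 1 and 2: bin each response type without drawing anything
correct_hist   <- hist(df$RT[df$correct == 1], breaks = breaks, plot = FALSE)
incorrect_hist <- hist(df$RT[df$correct == 0], breaks = breaks, plot = FALSE)

# Step 3: per-bin share of each response type (0/0 = NaN for empty bins)
total_counts    <- correct_hist$counts + incorrect_hist$counts
correct_share   <- correct_hist$counts / total_counts
incorrect_share <- incorrect_hist$counts / total_counts

# Keep only the bins that contain at least one trial
non_empty <- total_counts > 0
shares <- data.frame(
  RT_mid    = correct_hist$mids[non_empty],
  correct   = correct_share[non_empty],
  incorrect = incorrect_share[non_empty]
)
print(shares)
```

In every non-empty bin the two shares add up to 1, which is a quick way to check that the transformation does what is intended before plotting it.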

Step 4: Plot it however you like. Now that you have the raw values you want to plot, you can use any of a variety of methods to get them plotted. I recommend the 'geom_bar' layer, rather than the 'geom_histogram' layer, since you have already done the calculations. You'll also have to specify the two different 'grid' viewports you want 'ggplot' to use, but if you need help with that, submit a second question. This is how you can quickly make your data into a bar chart:

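# geom_col() draws bars at the precomputed heights
# (qplot's stat="identity" argument is deprecated in current ggplot2)

# The percentage of answers that were not correct
ggplot(data.frame(RT=incorrect_Hist$mids, share=incorrect_Bar_Values), aes(RT, share)) +
    geom_col() + ylim(0, 1)

# The percentage of answers that were correct
ggplot(data.frame(RT=correct_Hist$mids, share=correct_Bar_Values), aes(RT, share)) +
    geom_col() + ylim(0, 1)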
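If the other reading of the question is the intended one, where each bin count is divided by the total number of trials so that the two curves together sum to 1, the same "transform first, plot second" approach still applies. A minimal sketch on the sample data, assuming that reading; 'plot_df', 'n_total' and the axis labels are illustrative names, not part of the original post:

```r
library(ggplot2)

# Sample data from the question
df <- data.frame(
  RT = c(359, 214, 219, 206, 120, 166, 156, 181, 135, 122,
         110, 101, 139, 215, 106, 217, 162, 135, 114, 205),
  correct = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 0, 1, 1, 1, 0, 1, 0, 0)
)

breaks  <- seq(5, 800, by = 10)
n_total <- nrow(df)  # correct and incorrect trials together

h_correct   <- hist(df$RT[df$correct == 1], breaks = breaks, plot = FALSE)
h_incorrect <- hist(df$RT[df$correct == 0], breaks = breaks, plot = FALSE)

# Long format: one row per bin and response type,
# each count divided by the total number of trials
plot_df <- data.frame(
  RT       = c(h_correct$mids, h_incorrect$mids),
  prop     = c(h_correct$counts, h_incorrect$counts) / n_total,
  response = rep(c("Correct", "Incorrect"), each = length(h_correct$mids))
)

# Both curves on one panel, so no separate viewports are needed
p <- ggplot(plot_df, aes(x = RT, y = prop, colour = response)) +
  geom_line() +
  labs(x = "RT", y = "Proportion of all trials", colour = NULL)
print(p)
```

Because both series share one data frame, a single 'ggplot' call draws them together, and summing 'plot_df$prop' over all rows gives 1.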



Answer 2:


If I understand correctly, position="fill" should meet your needs:

ggplot(df,aes(x=RT,fill=factor(correct,labels=c("Incorrect","Correct")))) +
  geom_histogram(breaks=seq(5,800,by=10),position="fill") +
  scale_y_continuous("",labels=scales::percent) + scale_fill_discrete("")

One histogram has its baseline (the zero level) at the bottom of the panel, the other at the top.



Source: https://stackoverflow.com/questions/14522640/ggplot2-how-to-get-2-histograms-with-the-y-value-to-count-of-one-sum-of-the
