Scatter plot in ggplot, one numeric variable across two groups

Posted by 折月煮酒 on 2019-12-12 02:13:27

Question


I would like to create a scatter plot in ggplot2 which displays male test_scores on the x-axis and female test_scores on the y-axis using the dataset below. I can easily create a geom_line plot splitting male and female and putting the date ("dts") on the x-axis.

library(tidyverse)

#create data

dts <- c("2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05",
"2011-01-02","2011-01-02","2011-01-03","2011-01-04","2011-01-05")

sex <- c("M","F","M","F","M","F","M","F","M","F")

test <- round(runif(10,.5,1),2)

semester <- data.frame(dts = as.Date(dts), sex = sex, test_scores = test)

#show the geom_line plot
ggplot(semester, aes(x = dts, y = test_scores, color = sex)) + geom_line()

It seems with only one time series, ggplot2 does better with the data in wide format than long format. For instance, I could easily create two columns, "male_scores" and "female_scores" and plot those against each other, but I would like to keep my data tidy and in long format.

Cheers and thank you.


Answer 1:


You've over-tidied. Tidying data isn't just the mechanical process of making it as long as possible; it's making it as wide as necessary.

For example, if you recorded the X and Y location of animal sightings, you wouldn't store each sighting as two rows: one with "X" in a "label" column and the X coordinate in a "value" column, and another with "Y" in the "label" column and the Y coordinate in the "value" column. (Unless you really were storing the data in a key-value store, but that's another story.)

Widen your data so that the male and female test scores sit in test_score_male and test_score_female; those columns are then the x and y aesthetics for your scatter plot.
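
As a minimal sketch of that widening step, assuming tidyr's pivot_wider() is available and using a small made-up data frame (scores_long is hypothetical, with exactly one score per sex per date), something like the following should work. Because the sex codes are "M" and "F", the resulting columns are named test_score_M and test_score_F rather than test_score_male and test_score_female.

```r
library(tidyverse)

# Hypothetical long-format data: exactly one score per sex per date
scores_long <- data.frame(
  dts = as.Date(rep(c("2011-01-02", "2011-01-03", "2011-01-04"), each = 2)),
  sex = rep(c("M", "F"), times = 3),
  test_scores = c(0.67, 0.78, 0.58, 0.61, 0.51, 0.83)
)

# Widen: one row per date, one score column per sex
scores_wide <- pivot_wider(
  scores_long,
  names_from = sex,
  values_from = test_scores,
  names_prefix = "test_score_"
)

# Male scores on the x-axis, female scores on the y-axis
p <- ggplot(scores_wide, aes(x = test_score_M, y = test_score_F)) +
  geom_point()
print(p)
```

The long table can remain the stored form of the data; the wide table only needs to exist for this plot.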




Answer 2:


The problem with keeping the data long is that you will not have a corresponding X value for a given Y value. The reason for this is the structure of the dataset:

         dts  sex  test_scores
1 2011-01-02   M        0.67
2 2011-01-02   F        0.78
3 2011-01-03   M        0.58
4 2011-01-04   F        0.58
5 2011-01-05   M        0.51

If you were to use the code:

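# This call fails: each subset holds only half of the rows,
# while the data and the `sex` aesthetic have all of them
ggplot(semester, aes(x = semester$test_scores[semester$sex == 'M'],
                     y = semester$test_scores[semester$sex == 'F'],
                     color = sex)) + geom_point()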

ggplot2 will throw an error. The main reason is that once you subset the male scores, that subset contains no corresponding female scores. You first need to collapse the data down to the date level so that each row holds both values. As you correctly point out, the data is no longer in long format at that point.

For this one-off plot, I would recommend creating a wide dataset. There are multiple ways of doing that, but that is a different topic.
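
One possible way to collapse to the date level is sketched below with dplyr's group_by() and summarise(), rebuilding the question's data. Two things here are assumptions rather than part of the original post: the fixed seed (added only so the random scores are reproducible) and the choice to average scores when a date has more than one score for the same sex, as 2011-01-02 does. The column names male_scores and female_scores are the ones suggested in the question.

```r
library(tidyverse)

set.seed(42)  # assumption: fixed seed, only for reproducibility

# Rebuild the question's long-format data
dts <- c("2011-01-02", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05",
         "2011-01-02", "2011-01-02", "2011-01-03", "2011-01-04", "2011-01-05")
sex <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
semester <- data.frame(dts = as.Date(dts), sex = sex,
                       test_scores = round(runif(10, .5, 1), 2))

# Collapse to one row per date; repeated scores within a sex are averaged
# (assumption: a mean is an acceptable summary for this plot)
by_date <- semester %>%
  group_by(dts) %>%
  summarise(
    male_scores = mean(test_scores[sex == "M"]),
    female_scores = mean(test_scores[sex == "F"])
  )

# Each row now has both an x value and a y value
p <- ggplot(by_date, aes(x = male_scores, y = female_scores)) +
  geom_point()
print(p)
```

If a date had scores for only one sex, the mean for the other sex would be NaN and ggplot2 would drop that point with a warning, so this sketch relies on every date having at least one score for each sex.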



Source: https://stackoverflow.com/questions/41618041/scatter-plot-in-ggplot-one-numeric-variable-across-two-groups
