out of order date in ggplot2

Posted by ↘锁芯ラ on 2021-02-02 03:39:32

Question


I typically know how to order my dates in ggplot but something is different about this data and I'm hoping someone can clarify for me.

Consider:

ggplot(tmp3)+
geom_boxplot(aes(x=simdte,y=r2))+
facet_wrap(~simyr, scales='free_x')+
theme(axis.text.x=element_text(angle=45,hjust=1))

The dates are in alphanumeric order but now I want to format the x axis labels so I tried:

ggplot(tmp3)+
geom_boxplot(aes(x=reorder(strftime(strptime(simdte,'%Y%m%d'),'%b-%d'),as.numeric(simdte)),y=r2))+
facet_wrap(~simyr, scales='free_x')+
theme(axis.text.x=element_text(angle=45,hjust=1))

but notice that all the dates are in order EXCEPT Jun-08 in 2015.

I also tried

tmp3 <- tmp3 %>%
  mutate(plotsimdte = factor(
    strftime(strptime(simdte, '%Y%m%d'), '%b-%d'),
    levels = strftime(strptime(unique(simdte), '%Y%m%d'), '%b-%d')[order(unique(simdte))]
  ))

and plotting with x=plotsimdte but no difference. I get a warning when I create this factor about duplicated levels which is confusing since I'm only using unique values.

Lastly, I tried

ggplot(tmp3)+
geom_boxplot(aes(x=as.Date(simdte,'%Y%m%d'),y=r2, group=simdte))+
scale_x_date(date_labels ='%b-%d')+
facet_wrap(~simyr, scales='free_x')+
theme(axis.text.x=element_text(angle=45,hjust=1))

but I'd like to keep the dates discrete because their importance is as an identifier rather than distribution through time.

Any advice would be appreciated. Thanks

A small subset of my data

EDIT: updated dput output with as.data.frame

> dput(as.data.frame(tmp3))
structure(list(mdldte = c("20130525", "20140407", "20140413", 
"20150608", "20130525", "20150608", "20140420", "20130429", "20130608", 
"20130608", "20140323", "20140413", "20150325", "20150608", "20140511", 
"20130601", "20150608", "20130608", "20140420", "20150305", "20150415", 
"20130608", "20140531", "20150608", "20140531", "20150608", "20130403", 
"20130503", "20150415", "20140407", "20150608", "20140323", "20130525", 
"20140420", "20130403", "20130403", "20130608", "20150501", "20150608", 
"20130429", "20160607", "20140527", "20140420", "20140531", "20140502", 
"20150325", "20140428", "20160620", "20160620", "20130403", "20160527", 
"20150415", "20140413", "20160607", "20140413", "20150608", "20160613", 
"20150608", "20140407", "20150501", "20140323", "20160607", "20140531", 
"20150305", "20150409", "20140428", "20130503", "20130525", "20140428", 
"20140407", "20130503", "20130525", "20130403", "20150305", "20150217", 
"20150501", "20130608", "20150305", "20150217", "20130608", "20140511", 
"20160527", "20140502", "20150415"), simdte = c("20130403", "20130403", 
"20130403", "20130429", "20130429", "20130429", "20130503", "20130503", 
"20130503", "20130525", "20130525", "20130525", "20130601", "20130601", 
"20130601", "20130608", "20130608", "20130608", "20140323", "20140323", 
"20140323", "20140407", "20140407", "20140407", "20140413", "20140413", 
"20140413", "20140420", "20140420", "20140420", "20140428", "20140428", 
"20140428", "20140502", "20140502", "20140502", "20140511", "20140511", 
"20140511", "20140517", "20140517", "20140517", "20140527", "20140527", 
"20140527", "20140531", "20140531", "20140531", "20150217", "20150217", 
"20150217", "20150305", "20150305", "20150305", "20150325", "20150325", 
"20150325", "20150409", "20150409", "20150409", "20150415", "20150415", 
"20150415", "20150427", "20150427", "20150427", "20150501", "20150501", 
"20150501", "20150608", "20150608", "20150608", "20160527", "20160527", 
"20160527", "20160607", "20160607", "20160607", "20160613", "20160613", 
"20160613", "20160620", "20160620", "20160620"), r2 = c(0.862283742909527, 
0.813142444594872, 0.700946018367384, 0.474388980021752, 0.826648311592866, 
0.794283339648572, 0.79687922855493, 0.808984929407683, 0.781751354268809, 
0.535951689307516, 0.68524477567256, 0.716321630808227, 0.373141090466726, 
0.723850452026657, 0.408972539926536, 0.29346057127035, 0.319261073048776, 
0.319535158994707, 0.872351278607699, 0.871652058666136, 0.509872096326808, 
0.398605136979609, 0.420745998256184, 0.596082529689281, 0.793035779455997, 
0.661212720614186, 0.736581215438551, 0.89337362408349, 0.900773593767951, 
0.916946297262156, 0.700865150846107, 0.839501961957186, 0.863684601286204, 
0.819367869015135, 0.765192251153536, 0.590744027549224, 0.720092636591613, 
0.732237645665246, 0.701898569000057, 0.505310296599101, 0.756344530560126, 
0.522404606955389, 0.631453896947287, 0.732767696833121, 0.669168785479052, 
0.340080390313005, 0.397681954572616, 0.708286400101956, 0.551718623201008, 
0.62217661847446, 0.160935876745664, 0.79407487647674, 0.729924604817696, 
0.716024523586796, 0.526169199415047, 0.702098331814224, 0.748626603557805, 
0.432690018453805, 0.710646849035047, 0.526049259906931, 0.811336120223548, 
0.679819505156441, 0.591396577448379, 0.656686513355743, 0.698313842140892, 
0.718604690738853, 0.768070041705958, 0.453336001102217, 0.544446423520199, 
0.583336140040845, 0.172961846412558, 0.298155303932666, 0.731010397306203, 
0.582517045429492, 0.521708072638302, 0.610885761462162, 0.543494236386099, 
0.630580819311437, 0.642714888852003, 0.736302041771047, 0.736086951074143, 
0.444437396681972, 0.445336147280364, 0.43829690520584), simyr = c("2013", 
"2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013", 
"2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013", 
"2013", "2014", "2014", "2014", "2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014", 
"2014", "2014", "2014", "2014", "2014", "2014", "2014", "2015", 
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015", 
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2015", 
"2015", "2015", "2015", "2015", "2015", "2015", "2015", "2016", 
"2016", "2016", "2016", "2016", "2016", "2016", "2016", "2016", 
"2016", "2016", "2016"), mdlpreds = structure(c(4L, 2L, 3L, 1L, 
3L, 2L, 4L, 2L, 3L, 3L, 4L, 2L, 1L, 2L, 3L, 1L, 3L, 3L, 4L, 4L, 
1L, 1L, 1L, 3L, 2L, 3L, 3L, 4L, 4L, 4L, 2L, 3L, 4L, 2L, 4L, 1L, 
3L, 3L, 3L, 3L, 2L, 1L, 4L, 2L, 4L, 3L, 1L, 4L, 4L, 4L, 3L, 4L, 
2L, 2L, 1L, 3L, 3L, 1L, 3L, 2L, 2L, 3L, 3L, 4L, 4L, 3L, 2L, 1L, 
3L, 2L, 3L, 1L, 2L, 1L, 3L, 1L, 1L, 3L, 2L, 2L, 2L, 1L, 1L, 1L
), .Label = c("phv", "phvfsca", "phvaso", "phvasofsca"), class = "factor")), class = "data.frame", .Names = c("mdldte", 
"simdte", "r2", "simyr", "mdlpreds"), row.names = c(NA, -84L))

Answer 1:


The issue is that your dates are stored as character data, so R treats them as text labels rather than as points in time. What you really want is for them to be genuine Date objects, so that ggplot's scale functions can handle the ordering and labeling.

Convert the date data to Date type:

tmp3$newdate <- as.Date(strptime(tmp3$simdte, '%Y%m%d'))
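
As a quick check of this step, here is a minimal sketch on a small vector, assuming every value is an eight-digit YYYYMMDD string. The names `raw` and `dates` are made up for illustration and are not part of the original data.

```r
# Hypothetical sample of YYYYMMDD strings copied from the question's data
raw <- c("20150608", "20130403", "20140323")

# as.Date() accepts a format directly, so going through strptime() is optional
dates <- as.Date(raw, format = "%Y%m%d")

class(dates)   # "Date"
sort(dates)    # sorted chronologically, independent of any label format

# Input that does not match the format becomes NA instead of raising an error
is.na(as.Date("2015-06", format = "%Y%m%d"))   # TRUE
```

Checking for NA after the conversion is a cheap way to catch malformed date strings before plotting.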

Specify the new dates as the x-values (no need to select only the unique values), and use scale_x_date to create pretty labels. Note that this also correctly spaces the data points across time, instead of using even spacing for each "level" of the date data.

plot.new <- ggplot(tmp3)+
    geom_point(aes(x= newdate, y=r2))+
    scale_x_date(date_labels = '%b-%d') +
    facet_wrap(~simyr, scales='free_x')+
    theme(axis.text.x=element_text(angle=45,hjust=1))
print(plot.new)

In the future, it's useful to be aware of the str function, which can quickly tell you the format of your data columns (also accessible from the Environment panel in RStudio):

str(tmp3)

'data.frame':   28 obs. of  7 variables:
 $ mdldte  : chr  "20150305" "20140531" "20160620" "20150305" ...
 $ simdte  : chr  "20130403" "20130429" "20130503" "20130525" ...
 $ r2      : num  0.542 0.485 0.54 0.4 0.594 ...
 $ simyr   : chr  "2013" "2013" "2013" "2013" ...
 $ mdlyr   : chr  "2015" "2014" "2016" "2015" ...
 $ mdlpreds: Factor w/ 4 levels "phv","phvfsca",..: 1 1 1 1 4 1 4 2 3 4 ...
 $ newdate : Date, format: "2013-04-03" "2013-04-29" "2013-05-03" "2013-05-25" ...

As you can see, your original "simdte" column is stored as character data. R (and ggplot) will treat every distinct value as its own level or category. Conversely, Date data are fundamentally numerical (a count of days), so R treats them as continuous, which makes it easier to plot them accurately on a time axis. It also makes it easier to separate the underlying data from the format of any plotting labels.
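
To make the character-versus-Date distinction concrete, here is a small sketch using two dates from the question. The variable names are illustrative only.

```r
chr <- c("20130403", "20130429")
d   <- as.Date(chr, format = "%Y%m%d")

# Character values are just labels: arithmetic on them is not defined
is.character(chr)        # TRUE

# A Date is stored as a number of days since 1970-01-01
is.numeric(unclass(d))   # TRUE

# so distances between dates are meaningful
diff(d)                  # Time difference of 26 days
```

This is why a date scale spaces the points by elapsed time, while a discrete scale gives every label the same width.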

Update: Using dates as categories and plotting boxplots, in date order

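If instead we want each date to act as a category (rather than a position on a continuous time axis), the solution is actually simpler. Reformatting the values inside aes() changes the number of distinct values ggplot sees: '%b-%d' drops the year, so "20130608" and "20150608" both become "Jun-08". That collision explains the duplicated-levels warning, and I suspect it is the root cause of your misordering problem, since reorder() by default places each level at the mean of its values.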
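
One way to see what happens when the year is dropped from the label is to reproduce the reorder() call outside ggplot. This is a minimal sketch on four dates taken from the question; the variable names are illustrative, and the month abbreviation produced by '%b' depends on the locale.

```r
# Four dates from the question: one from 2013, three from 2015
simdte <- c("20130608", "20150217", "20150305", "20150608")

# The '%b-%d' label drops the year
lab <- strftime(strptime(simdte, "%Y%m%d"), "%b-%d")

# Two different dates now share a single label
lab[1] == lab[4]          # TRUE
anyDuplicated(lab) > 0    # TRUE

# reorder() sorts the levels by the mean (its default FUN) of the second
# argument within each level. The shared label averages 20130608 and
# 20150608, which is smaller than every 2015 value, so it is sorted first.
f <- reorder(lab, as.numeric(simdte))
levels(f)                 # the shared June label comes first
```

In a 2015 facet this puts the June box before February, which matches the symptom described in the question.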

The key is to rely on ggplot's built-in labeling functions. Once again, the main call to ggplot is fed the raw data, and scale_x_discrete handles the creation of pretty labels:

plot.new <- ggplot(tmp3)+
    geom_boxplot(aes(x=simdte,y=r2))+
    facet_wrap(~simyr, scales='free_x')+
    scale_x_discrete(labels = function(x) strftime(strptime(x, '%Y%m%d'), '%b-%d'))+
    theme(axis.text.x=element_text(angle=45,hjust=1))
print(plot.new)
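
To see why this keeps the order intact, the labelling function can be tried on its own. In this sketch, `fmt_label` is a hypothetical name for the anonymous function passed to scale_x_discrete above, and `breaks` stands in for the axis values the scale would pass to it.

```r
# Same logic as the anonymous function given to scale_x_discrete(labels = ...)
fmt_label <- function(x) strftime(strptime(x, "%Y%m%d"), "%b-%d")

# The axis values stay as full YYYYMMDD strings, so they remain unique
# and sort chronologically as plain text
breaks <- sort(unique(c("20150608", "20130608", "20150217")))
breaks             # "20130608" "20150217" "20150608"

# Only the displayed text is shortened; repeated labels are harmless here
labs <- fmt_label(breaks)
labs[1] == labs[3] # TRUE
```

Because the ordering is decided on the full strings and the formatting is applied only when the labels are drawn, two dates that share a month and day never collide.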



Source: https://stackoverflow.com/questions/40346031/out-of-order-date-in-ggplot2
