ggplot2 density of circular data

Submitted by 橙三吉。 on 2021-02-07 14:13:14

Question


I have a data set where x represents day of year (say birthdays) and I want to create a density graph of this. Further, since I have some grouping information (say boys or girls), I want to use the capabilities of ggplot2 to make a density plot.

Easy enough at first:

library(ggplot2); library(dplyr)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE), bday = sample(1:365, 100, replace = TRUE))
bdays %>% ggplot(aes(x = bday)) + geom_density(aes(color = factor(gender)))

However, this gives a poor estimate because of edge effects. I want to treat the data as circular, so that 365 + 1 = 1: one day after December 31st is January 1st. I know that the circular package provides this functionality, but I haven't had any success implementing it using a stat_function() call. It's particularly useful for me to use ggplot2 because I want to be able to use facets, aes calls, etc.

Also, for clarification, I would like something that looks like geom_density -- I am not looking for a polar plot like the one shown at: Circular density plot using ggplot2.
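
To make the edge effect concrete, here is a minimal sketch (not from the original question) using base R's density() on simulated uniform birthdays, for which the true density is flat at 1/365. The sample size, seed, and fixed bandwidth of 10 days are arbitrary assumptions chosen for illustration. Comparing the estimate at day 1 with the estimate mid-year shows how far the uncorrected estimate drops at the boundary:

```r
# Illustrative only: uniform "birthdays", so the true density is 1/365 everywhere
set.seed(1)
x <- sample(1:365, 10000, replace = TRUE)

# Plain (non-circular) kernel density, evaluated on the grid of days 1..365
d <- density(x, bw = 10, from = 1, to = 365, n = 365)

edge   <- d$y[1]    # estimate at day 1 (boundary)
middle <- d$y[183]  # estimate at day 183 (mid-year)

# Compare both estimates with the true flat density
print(c(edge = edge, middle = middle, truth = 1 / 365))
```

A kernel centered near day 1 has no data to its left, so roughly half of its mass is missing there. That is the bias a circular treatment is meant to remove.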


Answer 1:


To remove the edge effects, you could stack three copies of the data (shifted by 0, 365, and 730 days), compute the density estimate, and then show the density only for the middle copy of the data. That guarantees "wrap-around" continuity of the density function from one edge to the other.

Below is an example comparing your original plot with the new version. I've used the adjust parameter to give the two plots roughly the same bandwidth. Note also that in the circularized version the visible middle copy holds only about a third of the total mass, so you'll need to renormalize the densities if you want them to integrate to 1:

library(ggplot2); library(dplyr)

set.seed(105)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE), bday = sample(1:365, 100, replace = TRUE))

# Stack three copies of the data, with adjusted values of bday
bdays = bind_rows(bdays, bdays, bdays)
bdays$bday = bdays$bday + rep(c(0,365,365*2),each=100)

# Convert a target bandwidth b into the `adjust` multiplier for geom_density,
# relative to the default bandwidth bw.nrd0(x)
# Source: http://stackoverflow.com/a/24986121/496488
bw = function(b,x) b/bw.nrd0(x)

# New "circularized" version of plot
bdays %>% ggplot(aes(x = bday)) + 
  geom_density(aes(color = factor(gender)), adjust=bw(10, bdays$bday[1:100])) +
  coord_cartesian(xlim=c(365, 365+365+1), expand=FALSE) +
  scale_x_continuous(breaks=seq(366+89, 366+365, 90), labels=seq(366+89, 366+365, 90)-365) +
  scale_y_continuous(limits=c(0,0.0016)) +
  ggtitle("Circularized")

# Original plot
ggplot(bdays[1:100,], aes(x = bday)) + 
  geom_density(aes(color = factor(gender)), adjust=bw(30, bdays$bday[1:100])) +
  scale_x_continuous(breaks=seq(90,360,90), expand=c(0,0)) +
  ggtitle("Not Circularized")
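
The renormalization mentioned above can be sketched outside of geom_density. The following is a minimal illustration, not part of the original answer: circular_density is a hypothetical helper name, and the fixed bandwidth of 10 days and the 512-point grid are assumptions. It uses base R's density() on the tripled data, keeps only the middle period, and multiplies by 3 so that each group's curve integrates to about 1:

```r
# Hypothetical helper (an assumption, not from the original answer):
# wrapped kernel density for day-of-year data via three stacked copies.
circular_density <- function(x, period = 365, bw = 10, n = 512) {
  # One copy shifted back a period, the original, one shifted forward
  x3 <- c(x - period, x, x + period)
  # Evaluate only over the middle copy; day 0 and day `period` are the same point.
  # A fixed bw keeps the bandwidth independent of the extra spread from stacking.
  d <- density(x3, bw = bw, from = 0, to = period, n = n)
  # Each copy carries about 1/3 of the total mass, so rescale by 3
  data.frame(bday = d$x, density = 3 * d$y)
}

set.seed(105)
bdays <- data.frame(
  gender = sample(c("M", "F"), 100, replace = TRUE),
  bday   = sample(1:365, 100, replace = TRUE)
)

# One renormalized wrapped density per gender, combined into a single data frame
dens <- do.call(rbind, lapply(split(bdays, bdays$gender), function(g) {
  data.frame(gender = g$gender[1], circular_density(g$bday))
}))
```

The resulting data frame can be drawn with ggplot(dens, aes(x = bday, y = density, color = gender)) + geom_line(), which still leaves facets and other aesthetics available.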



Source: https://stackoverflow.com/questions/36266402/ggplot2-density-of-circular-data
