How to calculate the area of ellipse drawn by ggplot2?

霸气de小男生 提交于 2020-03-18 12:48:33

问题


In ggplot2, after I drawing the ellipse plot using stat_ellipse, is there any way to calculate the area of this ellipse? Here is the code and the plot:

library(ggplot2)
set.seed(1234)
x <- rnorm (1:1000)
y <- rnorm (1:1000)
data <- cbind(x, y)
data <- as.data.frame(data)
ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse()


回答1:


You can calculate the area of the ellipse by finding its semi-major and semi-minor axes (as shown in this SO answer):

# Plot object
p = ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(segments=201) # Default is 51. We use a finer grid for more accurate area.

# Get ellipse coordinates from plot
pb = ggplot_build(p)
el = pb$data[[2]][c("x","y")]

# Center of ellipse
ctr = MASS::cov.trob(el)$center  # Per @Roland's comment

# Calculate distance to center from each point on the ellipse
dist2center <- sqrt(rowSums((t(t(el)-ctr))^2))

# Calculate area of ellipse from semi-major and semi-minor axes. 
# These are, respectively, the largest and smallest values of dist2center. 
pi*min(dist2center)*max(dist2center)

[1] 13.82067



回答2:


The area can be directly calculated from the covariance matrix by calculating the eigenvalues first.

You need to scale the variances / eigenvalues by the factor of confidence that you want to get.

This thread is very helpful

set.seed(1234)
dat <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

cov_dat <- cov(dat) # covariance matrix

eig_dat <- eigen(cov(dat))$values #eigenvalues of covariance matrix

vec <- sqrt(5.991* eig_dat) # half the length of major and minor axis for the 95% confidence ellipse

pi * vec[1] * vec[2]  
#> [1] 18.38858

Created on 2020-02-27 by the reprex package (v0.3.0)

In this particular case, the covariances are zero, and the eigenvalues will be more or less the variance of the variables. So you can use just the variance for your calculation. - given that both are normally distributed.

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814

Created on 2020-02-27 by the reprex package (v0.3.0)

The calculated value is different from user eipi10's answer. This is likely due to the different calculation under the hood, with different assumptions on the underlying distribution. see this thread.



来源:https://stackoverflow.com/questions/38782051/how-to-calculate-the-area-of-ellipse-drawn-by-ggplot2

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!