R ggplot: Weighted CDF

Posted by 戏子无情 on 2020-01-03 18:46:08

Question


I'd like to plot a weighted CDF using ggplot. Some old non-SO discussions (e.g. this from 2012) suggest this is not possible, but I thought I'd raise the question again.

For example, consider this data:

df <- data.frame(x=sort(runif(100)), w=1:100)

I can show an unweighted CDF with

library(ggplot2)
ggplot(df, aes(x)) + stat_ecdf()

How would I weight this by w? For this example, I'd expect an x^2-looking function, since the larger numbers have higher weight.
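
A rough check of that expectation, assuming the sorted x values are spread evenly enough over [0, 1] that the k-th smallest value is close to k/n (an approximation, not an exact result). With weights w_i = i on the sorted data and n = 100, the weighted CDF at the k-th smallest point x_(k) would be:

```latex
F_w\bigl(x_{(k)}\bigr)
  = \frac{\sum_{i=1}^{k} i}{\sum_{i=1}^{n} i}
  = \frac{k(k+1)}{n(n+1)}
  \approx \left(\frac{k}{n}\right)^{2}
  \approx x_{(k)}^{2}
```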


Answer 1:


You can calculate the cumulative distribution within the data frame itself, i.e.:

df <- df[order(df$x), ]  # Won't change anything since it was created sorted
df$cum.pct <- with(df, cumsum(x * w) / sum(x * w))
ggplot(df, aes(x, cum.pct)) + geom_line()




Answer 2:


There is a mistake in the previous answer: it accumulates x * w, but the weighted ECDF should accumulate the weights w alone.

This is the right code to compute the weighted ECDF:

df <- df[order(df$x), ]  # Won't change anything since it was created sorted
df$cum.pct <- with(df, cumsum(w) / sum(w))
ggplot(df, aes(x, cum.pct)) + geom_line()

The weighted ECDF is a function F(a) equal to the sum of the weights (probabilities) of the observations with x <= a, divided by the total sum of weights.
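
The definition above can be sketched as a small base-R helper. The name `weighted_ecdf` is hypothetical (it is not part of ggplot2 or of the linked repository), and the tie handling, which collapses repeated x values into a single jump, is an assumption about the desired behaviour. It builds the step function the same way `stats::ecdf` does, via `approxfun` with constant interpolation.

```r
# Hypothetical helper: returns F(a) = sum of w where x <= a, divided by sum(w).
weighted_ecdf <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)

  # Sort observations and carry the weights along.
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]

  # Cumulative share of the total weight at each sorted observation.
  cum <- cumsum(w) / sum(w)

  # For tied x values keep only the last entry, whose cumulative
  # share already includes the weight of every tied observation.
  keep <- !duplicated(x, fromLast = TRUE)

  # Right-continuous step function: 0 left of the smallest x, 1 right of the largest.
  approxfun(x[keep], cum[keep], method = "constant",
            yleft = 0, yright = 1, f = 0, ties = "ordered")
}

# Small example with hand-checkable weights.
Fw <- weighted_ecdf(c(1, 2, 3), c(1, 1, 2))
Fw(c(0, 1, 2, 2.5, 3, 10))
```

For the example data, `df$cum.pct <- weighted_ecdf(df$x, df$w)(df$x)` should give the same column as the `cumsum(w) / sum(w)` code above, and replacing `geom_line()` with `geom_step()` draws it as a step function, which is closer to what `stat_ecdf()` shows.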

But here is a more satisfying option that simply modifies the original code of the ggplot2 stat_ecdf: https://github.com/NicolasWoloszko/stat_ecdf_weighted



Source: https://stackoverflow.com/questions/32487457/r-ggplot-weighted-cdf
