I am doing a project using Python where I have two arrays of data. Let's call them pc and pnc. I am required to plot a cumulative distribution of both of them.
Using histograms for this is unnecessarily heavy and imprecise (the binning makes the data fuzzy). Instead, you can simply sort all the values: the index of each value in the sorted array is the number of values smaller than it. This shorter and simpler solution looks like this:
import numpy as np
import matplotlib.pyplot as plt
# Some fake data:
data = np.random.randn(1000)
sorted_data = np.sort(data) # Or data.sort(), if data can be modified
# Cumulative counts:
plt.step(sorted_data, np.arange(sorted_data.size))  # Counts from 0 to len(data)-1: values smaller than each x
plt.step(sorted_data[::-1], np.arange(sorted_data.size))  # Same counts, for values larger than each x
plt.show()
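For the question's two arrays, the same approach puts both cumulative distributions on one graph. Here is a minimal sketch (the random pc and pnc below are hypothetical stand-ins for the actual data):
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the question's arrays:
pc = np.random.randn(1000)
pnc = np.random.randn(800) + 0.5

for arr, label in [(pc, "pc"), (pnc, "pnc")]:
    sorted_arr = np.sort(arr)
    # Cumulative count: number of values smaller than each x
    plt.step(sorted_arr, np.arange(sorted_arr.size), label=label)

plt.legend()
plt.show()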
Furthermore, plt.step() is a more appropriate plot style than plt.plot(), since the data lives at discrete locations.
The resulting plot is more ragged than the output of EnricoGiampieri's answer, but it is the exact cumulative distribution (instead of an approximate, fuzzier binned version of it).
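If a normalized cumulative distribution (a fraction between 0 and 1, rather than a raw count) is preferred, the counts can simply be divided by the number of points. A minimal sketch of that variation:
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(1000)
sorted_data = np.sort(data)

# Fraction of values <= x, rising from 1/n to 1:
fractions = np.arange(1, sorted_data.size + 1) / sorted_data.size
plt.step(sorted_data, fractions)
plt.show()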
PS: As SebastianRaschka noted, the very last point should ideally show the total count (instead of the total count minus 1). This can be achieved with:
plt.step(np.concatenate([sorted_data, sorted_data[[-1]]]),
         np.arange(sorted_data.size + 1))
plt.step(np.concatenate([sorted_data[::-1], sorted_data[[0]]]),
         np.arange(sorted_data.size + 1))
There are so many points in data that the effect is not visible without zooming in, but the very last point reaching the total count does matter when the data contains only a few points.
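To make the effect visible, here is a quick sketch with only a handful of (arbitrary) points, where the final step to the total count is easy to see:
import numpy as np
import matplotlib.pyplot as plt

small = np.sort(np.array([1.0, 2.0, 4.0, 7.0]))

# Without the repeated last point, the curve tops out at a count of 3:
plt.step(small, np.arange(small.size), label="last point missing")
# With it, the final step rises to the total count of 4:
plt.step(np.concatenate([small, small[[-1]]]),
         np.arange(small.size + 1), label="last point included")
plt.legend()
plt.show()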