Cumulative distribution plots in Python

刺人心 2020-12-13 02:37

I am doing a project using Python where I have two arrays of data. Let's call them pc and pnc. I am required to plot a cumulative distribution of both of

5 answers
  •  粉色の甜心
    2020-12-13 03:12

    Histograms are unnecessarily heavy and imprecise for this (the binning blurs the data). You can simply sort the x values: the index of each sorted value is the number of values that are smaller. This shorter and simpler solution looks like this:

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Some fake data:
    data = np.random.randn(1000)
    
    sorted_data = np.sort(data)  # Or data.sort(), if data can be modified
    
    # Cumulative counts (number of values below each point), from 0 to n-1:
    plt.step(sorted_data, np.arange(sorted_data.size))
    # Reverse cumulative counts (number of values above each point), from n-1 down to 0:
    plt.step(sorted_data[::-1], np.arange(sorted_data.size))
    
    plt.show()
    

    Furthermore, plt.step() is a more appropriate plot style than plt.plot(), since the cumulative count changes only at the discrete data values.
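The question mentions two arrays, pc and pnc. As a minimal sketch of how the sorting approach could be applied to both, the code below normalizes the counts to fractions between 0 and 1, which is an assumption about what is wanted (the answer above plots raw counts). The helper names `ecdf` and `plot_ecdfs` are hypothetical, and random numbers stand in for the real data.

```python
import numpy as np


def ecdf(data):
    """Return (x, y): the sorted values and the fraction of points <= each value."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, x.size + 1) / x.size  # 1/n, 2/n, ..., 1
    return x, y


def plot_ecdfs(pc, pnc):
    """Draw both empirical CDFs on the same axes (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    for values, label in ((pc, "pc"), (pnc, "pnc")):
        x, y = ecdf(values)
        # where="post": the curve jumps to y[i] at x[i] and stays there until x[i+1]
        plt.step(x, y, where="post", label=label)
    plt.xlabel("value")
    plt.ylabel("fraction of points <= value")
    plt.legend()
    plt.show()


# Fake data standing in for the two arrays from the question
rng = np.random.default_rng(0)
pc = rng.normal(loc=0.0, scale=1.0, size=200)
pnc = rng.normal(loc=0.5, scale=1.5, size=300)

x_pc, y_pc = ecdf(pc)
```

Calling `plot_ecdfs(pc, pnc)` would then draw the two curves together. Normalizing makes arrays of different lengths directly comparable; drop the division by `x.size` if raw counts are preferred.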

    [Figure: step plot of the cumulative and reverse cumulative counts produced by the code above]

    The curve is more ragged than the output of EnricoGiampieri's answer, but this one is the exact cumulative histogram (instead of being an approximate, fuzzier version of it).

    PS: As SebastianRaschka noted, the very last point should ideally show the total count (instead of the total count-1). This can be achieved with:

    plt.step(np.concatenate([sorted_data, sorted_data[[-1]]]),
             np.arange(sorted_data.size+1))
    plt.step(np.concatenate([sorted_data[::-1], sorted_data[[0]]]),
             np.arange(sorted_data.size+1))
    

    There are so many points in data that the effect is not visible without zooming in, but the very last point at the total count does matter when the data contains only a few points.
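To make the effect of the extra point concrete, here is a small sketch with a deliberately tiny, made-up data set of three values. It only builds the arrays that would be passed to plt.step(), so the difference in the final count can be read off directly.

```python
import numpy as np

data = np.array([3.0, 1.0, 2.0])  # deliberately tiny fake data set
sorted_data = np.sort(data)

# Without the extra point: counts run from 0 to n-1
counts_short = np.arange(sorted_data.size)

# With the last value repeated: counts run from 0 to n
x_full = np.concatenate([sorted_data, sorted_data[[-1]]])
counts_full = np.arange(sorted_data.size + 1)

print(counts_short.max(), counts_full.max())  # prints "2 3"
```

With three points, the curve without the extra point tops out at 2, one third short of the total, while the extended version reaches 3.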
