How to plot and work with NaN values in matplotlib

夕颜 2020-12-08 07:44

I have hourly data consisting of a number of columns. The first column is a date (date_log), and the remaining columns contain different sample points. The trouble is

2 answers
  •  南方客 (original poster)
    2020-12-08 08:20

    If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled and larger gaps that you don't.

    Using pandas to "forward-fill" gaps

    One option is a pandas forward fill (`Series.ffill`, spelled `fillna(method='ffill')` in older pandas versions) with a `limit` on how many consecutive values are filled.

    As a quick example of how this works:

    In [1]: import pandas as pd; import numpy as np
    
    In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])
    
    In [3]: x.ffill(limit=1)
    Out[3]:
    0    1.0
    1    1.0
    2    2.0
    3    2.0
    4    NaN
    5    3.0
    6    3.0
    7    NaN
    8    NaN
    9    4.0
    dtype: float64
    
    In [4]: x.ffill(limit=2)
    Out[4]:
    0    1.0
    1    1.0
    2    2.0
    3    2.0
    4    2.0
    5    3.0
    6    3.0
    7    3.0
    8    NaN
    9    4.0
    dtype: float64
    

    As an example of using this for something similar to your case:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    np.random.seed(1977)
    
    x = np.random.normal(0, 1, 1000).cumsum()
    
    # Set every third value to NaN
    x[::3] = np.nan
    
    # Set a few bigger gaps...
    x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan
    
    # Use pandas with a limited forward fill
    # You may want to adjust `limit`. Here, at most 2 consecutive NaNs are
    # filled after each valid value.
    # (`ffill` replaces the deprecated `fillna(method='ffill')` spelling.)
    filled = pd.Series(x).ffill(limit=2)
    
    # Let's plot the results
    fig, axes = plt.subplots(nrows=2, sharex=True)
    axes[0].plot(x, color='lightblue')
    axes[1].plot(filled, color='lightblue')
    
    axes[0].set(ylabel='Original Data')
    axes[1].set(ylabel='Filled Data')
    
    plt.show()
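
    To pick a sensible `limit`, it can help to look at how long the NaN runs in your data actually are. A rough sketch, assuming a 1-D numeric input; the helper name `gap_lengths` is made up for illustration:

```python
import numpy as np
import pandas as pd

def gap_lengths(values):
    """Return the length of every run of consecutive NaNs, in order."""
    isna = pd.Series(values).isna()
    # A new run starts wherever the NaN flag differs from the previous sample.
    run_id = isna.ne(isna.shift(fill_value=False)).cumsum()
    # Summing the boolean flags per run gives the NaN count (0 for valid runs).
    lengths = isna.groupby(run_id).sum()
    return lengths[lengths > 0].to_numpy()

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
print(gap_lengths(values))  # gap lengths 1, 2 and 3
```

    Something like `np.unique(gap_lengths(x), return_counts=True)` then shows how many gaps of each size there are, which makes the cut-off between "small" and "large" gaps easier to choose.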
    

    Using numpy to interpolate gaps

    Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.
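
    For reference, a numpy-only forward fill with a limit could look roughly like the sketch below. It assumes a 1-D float array, and `ffill_limited` is a hypothetical name, not a numpy function:

```python
import numpy as np

def ffill_limited(values, limit=None):
    """Repeat the last valid value into NaN gaps, at most `limit` times per gap."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = ~np.isnan(values)
    # Index of the most recent valid sample at or before each position
    # (-1 where no valid sample has been seen yet).
    last = np.maximum.accumulate(np.where(valid, idx, -1))
    # Where last == -1, position 0 is itself NaN, so clamping to 0 keeps NaN.
    filled = values[np.maximum(last, 0)]
    if limit is not None:
        # Undo the fill for positions too far from their last valid sample.
        filled[(idx - last) > limit] = np.nan
    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
print(ffill_limited(values, limit=1))
```

    On the small example series this should reproduce the pandas result shown earlier.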

    Instead of repeating the last value through the "gap", we can perform linear interpolation of the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.

    As an example, let's define an interpolate_gaps function:

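    def interpolate_gaps(values, limit=None):
        """
        Fill gaps using linear interpolation, optionally limited by `limit`.
    
        Gaps of up to `limit` NaNs are filled completely. In longer gaps,
        only the last `limit` positions are filled. Trailing NaNs at the
        end of the array are left as NaN.
        """
        values = np.asarray(values)
        i = np.arange(values.size)
        valid = np.isfinite(values)
        # Interpolate everywhere first, then blank out what should stay empty
        filled = np.interp(i, i[valid], values[valid])
    
        if limit is not None:
            invalid = ~valid
            # Flag NaNs that are followed by at least `limit` more NaNs.
            # AND against the unmodified mask so the window stays limit+1 wide.
            too_long = invalid.copy()
            for n in range(1, limit+1):
                too_long[:-n] &= invalid[n:]
            filled[too_long] = np.nan
    
        return filled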
    

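    Note that we get interpolated values rather than repeated ones, unlike the pandas version above. Also, in a gap longer than `limit`, it is the last `limit` positions that get filled, whereas the forward fill fills the first ones: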

    In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
    
    In [12]: interpolate_gaps(values, limit=1)
    Out[12]:
    array([ 1.        ,  1.5       ,  2.        ,         nan,  2.66666667,
            3.        ,         nan,         nan,  3.75      ,  4.        ])
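
    In the output above, the tail end of each longer gap is still filled. If long gaps should be left completely untouched instead, one possible variant is to find each NaN run and blank out the runs that are too long. This is only a sketch, and `interpolate_small_gaps` is a hypothetical name:

```python
import numpy as np

def interpolate_small_gaps(values, limit):
    """Linearly interpolate only those NaN runs that are at most `limit` long."""
    values = np.asarray(values, dtype=float)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    if not valid.any():
        return values.copy()  # nothing to interpolate from
    # np.interp holds the edge values constant before the first and after
    # the last valid sample, so short gaps at the ends get that edge value.
    filled = np.interp(i, i[valid], values[valid])

    # Locate the start (inclusive) and stop (exclusive) of every invalid run.
    padded = np.concatenate(([False], ~valid, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    for start, stop in zip(edges[::2], edges[1::2]):
        if stop - start > limit:
            filled[start:stop] = np.nan  # leave long gaps empty
    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
print(interpolate_small_gaps(values, limit=1))
```

    With `limit=1`, only the single NaN at index 1 is filled here; the gaps of length 2 and 3 stay NaN throughout.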
    

    In the plotting example, if we replace the pandas forward-fill line that assigns `filled` with:

    filled = interpolate_gaps(x, limit=2)
    

    We'll get a visually almost identical plot.

    As a complete, stand-alone example:

    import numpy as np
    import matplotlib.pyplot as plt
    np.random.seed(1977)
    
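    def interpolate_gaps(values, limit=None):
        """
        Fill gaps using linear interpolation, optionally limited by `limit`.
    
        Gaps of up to `limit` NaNs are filled completely. In longer gaps,
        only the last `limit` positions are filled. Trailing NaNs at the
        end of the array are left as NaN.
        """
        values = np.asarray(values)
        i = np.arange(values.size)
        valid = np.isfinite(values)
        # Interpolate everywhere first, then blank out what should stay empty
        filled = np.interp(i, i[valid], values[valid])
    
        if limit is not None:
            invalid = ~valid
            # Flag NaNs that are followed by at least `limit` more NaNs.
            # AND against the unmodified mask so the window stays limit+1 wide.
            too_long = invalid.copy()
            for n in range(1, limit+1):
                too_long[:-n] &= invalid[n:]
            filled[too_long] = np.nan
    
        return filled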
    
    x = np.random.normal(0, 1, 1000).cumsum()
    
    # Set every third value to NaN
    x[::3] = np.nan
    
    # Set a few bigger gaps...
    x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan
    
    # Interpolate small gaps using numpy
    filled = interpolate_gaps(x, limit=2)
    
    # Let's plot the results
    fig, axes = plt.subplots(nrows=2, sharex=True)
    axes[0].plot(x, color='lightblue')
    axes[1].plot(filled, color='lightblue')
    
    axes[0].set(ylabel='Original Data')
    axes[1].set(ylabel='Filled Data')
    
    plt.show()
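
    If pandas is already in use, `Series.interpolate` with a `limit` is another option worth trying. As far as I can tell, with the default forward direction it fills the first `limit` positions of a longer gap, like the forward fill does. A small sketch on the toy series from above:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

# Linear interpolation, filling at most 1 consecutive NaN per gap.
filled = s.interpolate(method="linear", limit=1)
print(filled.to_numpy())
```

    Here indices 1, 3 and 6 receive interpolated values, while the rest of each longer gap stays NaN.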
    

    Note: I originally completely mis-read the question. See version history for my original answer.
