How to plot and work with NaN values in matplotlib

前端未结

关注

 2  1148

夕颜 2020-12-08 07:44

I have hourly data consisting of a number of columns. First column is a date (date_log), and the rest of columns contain different sample points. The trouble is

2条回答

南方客 (楼主)

2020-12-08 08:20

If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled and larger gaps that you don't.

Using `pandas` to "forward-fill" gaps

One option is to use pandas fillna with a limited amount of fill values.

As a quick example of how this works:

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

In [3]: x.fillna(method='ffill', limit=1)
Out[3]:
0     1
1     1
2     2
3     2
4   NaN
5     3
6     3
7   NaN
8   NaN
9     4
dtype: float64

In [4]: x.fillna(method='ffill', limit=2)
Out[4]:
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8   NaN
9     4
dtype: float64

As an example of using this for something similar to your case:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Use pandas with a limited forward fill
# You may want to adjust the `limit` here. This will fill 2 nan gaps.
filled = pd.Series(x).fillna(limit=2, method='ffill')

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()

Using `numpy` to interpolate gaps

Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.

Instead of repeating the last value through the "gap", we can perform linear interpolation of the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.

As an example, let's define an interpolate_gaps function:

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

Note that we'll get interpolated value, unlike the previous pandas version:

In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]

In [12]: interpolate_gaps(values, limit=1)
Out[12]:
array([ 1.        ,  1.5       ,  2.        ,         nan,  2.66666667,
        3.        ,         nan,         nan,  3.75      ,  4.        ])

In the plotting example, if we replace the line:

filled = pd.Series(x).fillna(limit=2, method='ffill')

With:

filled = interpolate_gaps(x, limit=2)

We'll get a visually identical plot:

As a complete, stand-alone example:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Interpolate small gaps using numpy
filled = interpolate_gaps(x, limit=2)

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()

Note: I originally completely mis-read the question. See version history for my original answer.

0 讨论(0)

查看其它2个回答

How to plot and work with NaN values in matplotlib

Using pandas to "forward-fill" gaps

Using numpy to interpolate gaps

Using `pandas` to "forward-fill" gaps

Using `numpy` to interpolate gaps