Pandas finding local max and min

我在风中等你 2020-12-04 11:43

I have a pandas DataFrame with two columns: one is temperature, the other is time.

I would like to make third and fourth columns called min and max. Each of these co

3 answers
  • 2020-12-04 11:57

    Assuming that the column of interest is labelled data, one solution would be:

    df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
    df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]
    

    For example:

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    
    # Generate a noisy AR(1) sample
    np.random.seed(0)
    rs = np.random.randn(200)
    xs = [0]
    for r in rs:
        xs.append(xs[-1]*0.9 + r)
    df = pd.DataFrame(xs, columns=['data'])
    
    # Find local minima and maxima
    df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
    df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]
    
    # Plot results
    plt.scatter(df.index, df['min'], c='r')
    plt.scatter(df.index, df['max'], c='g')
    df.data.plot()
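
To see what the shift comparison does on data small enough to check by hand, here is a minimal sketch. The helper name mark_local_extrema and the sample values are made up for illustration and are not part of the original answer.

```python
import pandas as pd

def mark_local_extrema(df, column="data"):
    """Return a copy of df with 'min' and 'max' columns (hypothetical helper)."""
    s = df[column]
    prev, nxt = s.shift(1), s.shift(-1)   # previous and next neighbour
    out = df.copy()
    # Strict comparisons: a point must be lower/higher than BOTH neighbours.
    out["min"] = s[(prev > s) & (nxt > s)]
    out["max"] = s[(prev < s) & (nxt < s)]
    return out

# Made-up sample: one clear peak (index 1), one clear trough (index 3),
# and a flat top at indices 4-5.
df = pd.DataFrame({"data": [1.0, 3.0, 2.0, 0.5, 4.0, 4.0, 1.0]})
result = mark_local_extrema(df)
print(result)
```

Rows that are not extrema hold NaN in the new columns. Because the comparisons are strict, the flat top at indices 4 and 5 is not marked, and neither are the first and last rows, whose missing neighbour compares as False.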
    

  • 2020-12-04 12:16

    Using NumPy:

    ser = np.random.randint(-40, 40, 100)  # 100 points
    d = np.diff(ser)
    # A peak is where the difference turns from positive to negative
    peak = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    

    or, equivalently:

    double_difference = np.diff(np.sign(np.diff(ser)))
    # +1 converts the position in the differenced array back to an index into ser
    peak = np.where(double_difference == -2)[0] + 1
    

    Using pandas:

    ser = pd.Series(np.random.randint(2, 5, 100))
    peak_df = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)]
    peak = peak_df.index
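
For the NumPy sign-of-difference variant, the index bookkeeping is easy to get wrong: a value of -2 at position i in the double difference means the slope goes from rising to falling around element i + 1 of the original array. The following sketch checks this on a small made-up array (the values are an assumption chosen for illustration).

```python
import numpy as np

# Made-up sample with peaks at indices 1 and 4 and a trough at index 2.
ser = np.array([0, 2, 1, 3, 5, 4, 4, 6])

# sign(diff) is +1 on rising steps, -1 on falling steps, 0 on flat steps.
sign_change = np.diff(np.sign(np.diff(ser)))

# -2 means rising then falling (peak); +2 means falling then rising (trough).
# Add 1 to map positions in sign_change back to indices into ser.
peaks = np.where(sign_change == -2)[0] + 1
troughs = np.where(sign_change == 2)[0] + 1

print(peaks, ser[peaks])      # indices and values of the peaks
print(troughs, ser[troughs])  # indices and values of the troughs
```

Note that the flat stretch at indices 5 and 6 produces sign changes of +1 rather than +2, so a plateau is not reported by this test.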
    
  • 2020-12-04 12:20

    The solution offered by fuglede is great, but if your data is very noisy (like the one in the picture) you will end up with lots of misleading local extrema. I suggest using the scipy.signal.argrelextrema function instead. argrelextrema has its own limitations, but it has a useful feature: you can specify the number of points to compare on each side (the order argument), which acts as a kind of noise filter. For example:

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from scipy.signal import argrelextrema
    
    # Generate a noisy AR(1) sample
    np.random.seed(0)
    rs = np.random.randn(200)
    xs = [0]
    for r in rs:
        xs.append(xs[-1]*0.9 + r)
    df = pd.DataFrame(xs, columns=['data'])
    
    n = 5  # number of points to be checked before and after
    # Find local minima and maxima
    df['min'] = df.iloc[argrelextrema(df.data.values, np.less_equal, order=n)[0]]['data']
    df['max'] = df.iloc[argrelextrema(df.data.values, np.greater_equal, order=n)[0]]['data']
    
    # Plot results
    plt.scatter(df.index, df['min'], c='r')
    plt.scatter(df.index, df['max'], c='g')
    plt.plot(df.index, df['data'])
    plt.show()
    

    Some points:

    • You might need to check the points afterwards to be sure there are no points very close to each other.
    • You can play with n to filter out the noisy points.
    • argrelextrema returns a tuple, and the [0] at the end extracts a NumPy array.
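
The points above can be sketched on a small made-up array. The data values and the thin_indices helper are assumptions for illustration only; thin_indices is just one simple way to drop extrema that sit too close together and is not part of SciPy.

```python
import numpy as np
from scipy.signal import argrelextrema

# Made-up sample with a flat top at indices 3-4.
data = np.array([0.0, 1.0, 0.5, 2.0, 2.0, 0.0, 1.5, 0.2, 3.0, 0.1])

# argrelextrema returns a tuple of index arrays; [0] takes the first axis.
strict = argrelextrema(data, np.greater, order=1)[0]        # skips the plateau
loose = argrelextrema(data, np.greater_equal, order=1)[0]   # flags both plateau points
wide = argrelextrema(data, np.greater, order=3)[0]          # larger order filters more

def thin_indices(indices, min_gap):
    """Hypothetical helper: keep an index only if it is at least
    min_gap positions after the previously kept index."""
    kept = []
    for i in indices:
        if not kept or i - kept[-1] >= min_gap:
            kept.append(int(i))
    return kept

thinned = thin_indices(loose, min_gap=2)
print(strict, loose, wide, thinned)
```

With np.greater_equal both points of a plateau are reported, which is one reason a follow-up pass such as thin_indices can be useful. Increasing order keeps only extrema that dominate a wider neighbourhood.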