Time-series boxplot in pandas

庸人自扰 2021-01-02 02:38

How can I create a boxplot for a pandas time-series where I have a box for each day?

Sample dataset of hourly data where one box should consist of 24 values:

3 Answers
  • 2021-01-02 03:13

    (Not enough rep to comment on accepted solution, so adding an answer instead.)

    The accepted code has two small errors: (1) it is missing the numpy import, and (2) the x and y parameters in the boxplot statement need to be swapped. The following produces the plot shown.

    import numpy as np
    import pandas as pd
    import seaborn
    import matplotlib.pyplot as plt
    
    n = 480
    ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="h"))
    
    fig, ax = plt.subplots(figsize=(12,5))
    # Recent seaborn releases expect x= and y= as keyword arguments
    seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)
    plt.show()
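
    You can confirm that each box really gets 24 hourly values by counting the group sizes for the same day-of-year grouper. This sanity check is not part of the original answer. It uses only pandas and NumPy, on the same synthetic 480-point random series as above.

```python
import numpy as np
import pandas as pd

# Same synthetic data as above: 480 hourly points = 20 full days
n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Count how many values land in each day-of-year group (one box per group)
sizes = ts.groupby(ts.index.dayofyear).size()
print(sizes.head())
```

    Because the series starts at midnight and covers exactly 20 days, every group should hold 24 values.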
    
  • 2021-01-02 03:20

    I have a solution that may be helpful: it uses only native pandas and allows for hierarchical date-time grouping (i.e., groups that span years). The key is that if you pass a function to groupby(), it is called on each element of the DataFrame's index. If your index is a DatetimeIndex, each element is a Timestamp, so you can use its attributes and methods (such as strftime) to build the group keys.

    Try this:

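    import numpy as np
    import pandas as pd
    n = 480
    ts = pd.DataFrame(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="h"))
    # Each index Timestamp is mapped to a "YYYY-MM-DD" key, giving one box per day
    ts.groupby(lambda x: x.strftime("%Y-%m-%d")).boxplot(subplots=False, figsize=(12,9), rot=90)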
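
    To check what the function-based grouper produces, you can compare it with grouping on `ts.index.date`, an array of `datetime.date` objects. Both should yield the same 20 daily groups of 24 rows. This is an illustrative sketch on the same synthetic data, not part of the original answer.

```python
import numpy as np
import pandas as pd

n = 480
ts = pd.DataFrame(np.random.randn(n),
                  index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Function grouper: pandas applies it to the index values to build string keys
by_string = ts.groupby(lambda x: x.strftime("%Y-%m-%d"))
# Array grouper: one datetime.date per row
by_date = ts.groupby(ts.index.date)

print(by_string.ngroups, by_date.ngroups)
print(by_string.size().head())
```

    The string keys sort chronologically because the format is year-month-day. This matters when the boxes are laid out along the x axis.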
    

  • 2021-01-02 03:36

    If it's an option for you, I would recommend using Seaborn, which is a wrapper for Matplotlib. You could do it yourself by looping over the groups of your time series, but that's much more work.
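
    For comparison, the manual route without Seaborn could look roughly like the sketch below. It loops over the daily groups, collects one array per day, and passes the list to Matplotlib's `Axes.boxplot`. The names `labels`, `data` and `parts` are illustrative. The `Agg` backend line is only there so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show plots interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Collect one array of values (and one label) per calendar day
labels, data = [], []
for day, values in ts.groupby(ts.index.date):
    labels.append(day.isoformat())
    data.append(values.to_numpy())

fig, ax = plt.subplots(figsize=(12, 5))
parts = ax.boxplot(data)  # matplotlib places box i at x = i + 1
ax.set_xticks(range(1, len(data) + 1))
ax.set_xticklabels(labels, rotation=90)
fig.tight_layout()
```

    This shows why Seaborn is the shorter option: the grouping, tick placement and labels all have to be handled by hand.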

    import pandas as pd
    import numpy as np
    import seaborn
    import matplotlib.pyplot as plt
    
    n = 480
    ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="h"))
    
    fig, ax = plt.subplots(figsize=(12,5))
    # Day of year is the grouping (x) variable: one box per day
    seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)
    plt.show()
    

    Which gives: [image: boxplot with one box per day of the year]

    Note that I'm passing the day of year as the grouper to seaborn, so if your data spans multiple years, the same day from different years ends up in one box. In that case you could consider something like:

    ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
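
    The multi-year problem shows up clearly on a longer range. The sketch below assumes a hypothetical 400 days of hourly random data starting 2013-02-01, so it crosses a year boundary. Day of year then produces fewer groups than there are calendar days, while the year-month-day string key keeps every day separate. `DatetimeIndex.strftime` is shown as a vectorized equivalent of the `apply` call.

```python
import numpy as np
import pandas as pd

# Hypothetical range: 400 days of hourly data crossing into a second year
n = 400 * 24
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2013-02-01", periods=n, freq="h"))

# Day of year repeats across years, so some 2013 and 2014 days share a group
doy_groups = ts.index.dayofyear.nunique()

# Year-month-day key from the answer: one group per calendar day
date_keys = ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
# Vectorized equivalent of the apply() above
date_keys_vec = ts.index.strftime('%Y%m%d')

print(doy_groups, date_keys.nunique())
```

    Since `date_keys` shares the series' index, it could be passed as `x=` to `seaborn.boxplot` in place of `ts.index.dayofyear`.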
    

    Edit: for 3-hourly boxes you could use this as a grouper, but it only works if the timestamps have no minutes (or smaller units) set:

    import datetime
    
    [(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H') for dt in ts.index]
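
    A vectorized alternative, swapped in here rather than taken from the answer, is `DatetimeIndex.floor("3h")`. It rounds each timestamp down to a 3-hour boundary. Because 3 hours divides a day evenly, those boundaries fall at 00:00, 03:00, 06:00 and so on. The sketch checks that it matches the list-comprehension keys on the same synthetic hourly series. `floor` also truncates minutes and seconds, so sub-hour timestamps land in the right block too.

```python
import datetime

import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Grouper from the answer: shift each timestamp back to the start of its 3-hour block
keys_loop = [(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H')
             for dt in ts.index]

# Vectorized alternative: floor every timestamp to a 3-hour boundary
keys_floor = ts.index.floor("3h").strftime('%Y%m%d%H')

# One group (box) per 3-hour block
sizes = ts.groupby(keys_floor).size()
print(len(sizes))
```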
    