pandas scatter plotting datetime

长情又很酷 2020-12-03 03:01

I have a dataframe with two columns of datetime.time's. I'd like to scatter plot them. I'd also like the axes to display the times, ideally. But

df.plo         


        
5 Answers
  • 2020-12-03 03:07

    Here's a basic workaround to get you started.

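    import datetime
    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt
    
    def scatter_date(df, x, y, datetimeformat):
        # Accept either a single column name or a list of column names for y.
        if not isinstance(y, list):
            y = [y]
        # Parse the date strings in column x once and convert them to matplotlib date numbers.
        xnum = df[x].apply(
            lambda z: mdates.date2num(datetime.datetime.strptime(z, datetimeformat)))
        for yi in y:
            # 'o' draws markers only, so each column shows up as a scatter of points.
            plt.plot(xnum, df[yi], 'o', label=yi)
        # Tell matplotlib that the x values are dates so the ticks are labelled as such.
        plt.gca().xaxis_date()
        plt.legend()
        plt.xlabel(x)
    
    # Example usage (data is assumed to be a DataFrame whose 'date' column holds strings)
    scatter_date(data, x='date', y=['col1', 'col2'], datetimeformat='%Y-%m-%d')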
    
  • 2020-12-03 03:20

    Not an answer, but I can't edit the question or put this much in a comment, I think.

    Here is a reproducible example:

    from datetime import datetime
    import pandas as pd
    df = pd.DataFrame({'x': [datetime.now() for _ in range(10)], 'y': range(10)})
    df.plot(x='x', y='y', kind='scatter')
    

    This gives KeyError: 'x'.

    Interestingly, you do get a plot with just df.plot(x='x', y='y'); it chooses poorly for the default x range because the times are just nanoseconds apart, which is weird, but that's a separate issue. It seems like if you can make a line graph, you should be able to make a scatterplot too.

    There is a pandas GitHub issue about this problem, but it was closed for some reason. I'm going to comment there and see if we can restart that conversation.

    Is there some clever work-around for this? If so, what?
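
    One possible work-around, sketched here rather than taken from the pandas issue, is to skip df.plot and hand the columns to matplotlib's own Axes.scatter, which accepts datetime values. The sample data (timestamps five minutes apart) and the "%H:%M" tick format are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: ten timestamps spaced five minutes apart.
start = datetime(2020, 12, 3, 3, 0, 0)
df = pd.DataFrame({
    "x": [start + timedelta(minutes=5 * i) for i in range(10)],
    "y": range(10),
})

fig, ax = plt.subplots()
# Call matplotlib directly instead of df.plot(kind='scatter').
ax.scatter(df["x"], df["y"])
# Show only hours and minutes on the x-axis.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()
```

    In an interactive session, drop the matplotlib.use("Agg") line and call plt.show(), or keep it and write the figure to disk with fig.savefig.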

  • 2020-12-03 03:24

    Building on Mike N's answer: convert to Unix time so the scatter works properly, then transform your axis labels back from int64s to strings:

    type(df.ts1[0])
    

    pandas.tslib.Timestamp

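    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # Nanoseconds since the epoch, as plain integers that kind='scatter' can handle
    df['t1'] = df.ts1.astype(np.int64)
    df['t2'] = df.ts2.astype(np.int64)
    
    fig, ax = plt.subplots(figsize=(10,6))
    df.plot(x='t1', y='t2', kind='scatter', ax=ax)
    # Turn the integer ticks back into times. pd.Timestamp reads an int as nanoseconds
    # and applies no timezone shift, unlike datetime.fromtimestamp, which uses local time.
    ax.set_xticklabels([pd.Timestamp(int(ts)).strftime('%H:%M:%S') for ts in ax.get_xticks()])
    ax.set_yticklabels([pd.Timestamp(int(ts)).strftime('%H:%M:%S') for ts in ax.get_yticks()])
    plt.show()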
    

  • 2020-12-03 03:27

    Not a real answer but a workaround, as suggested by Tom Augspurger, is that you can just use the working line plot type and specify dots instead of lines:

    df.plot(x='x', y='y', style=".")
    
  • 2020-12-03 03:28

    It's not pretty, but as a quick hack you can convert your datetime values to timestamps with .timestamp() before loading them into pandas, and scatter plots will then work just fine (although with a completely unusable x-axis).
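
    The same convert-to-numbers idea can be applied to the datetime.time values from the question, which have no .timestamp() method, and a tick formatter can make the axes readable again. This is only a sketch: the helper names time_to_seconds, seconds_to_label and scatter_times are hypothetical, and it assumes both columns hold datetime.time objects.

```python
from datetime import time


def time_to_seconds(t):
    """Return the number of seconds since midnight for a datetime.time value."""
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def seconds_to_label(value, pos=None):
    """Format seconds since midnight as HH:MM:SS.

    The (value, pos) signature is the one matplotlib's FuncFormatter calls.
    """
    total = int(round(value)) % 86400  # wrap ticks that fall outside one day
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def scatter_times(df, xcol, ycol):
    """Scatter two columns of datetime.time values with time-formatted axes."""
    # Imported here so the two helpers above stay usable without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    fig, ax = plt.subplots()
    # Plot plain numbers, then let the formatter turn tick values back into times.
    ax.scatter(df[xcol].map(time_to_seconds), df[ycol].map(time_to_seconds))
    ax.xaxis.set_major_formatter(FuncFormatter(seconds_to_label))
    ax.yaxis.set_major_formatter(FuncFormatter(seconds_to_label))
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    return ax
```

    Because the labels come from a formatter rather than a fixed list of strings, they stay correct when the plot is zoomed or panned. A call would look like scatter_times(df, 'ts1', 'ts2'), with the column names replaced by your own.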
