Python - Group Dates by Month

Submitted by 六月ゝ 毕业季﹏ on 2021-01-03 05:12:25

Question


Here's a quick problem that I at first dismissed as easy. An hour in, I'm not so sure!
I have a list of Python datetime objects that I want to graph. The x-values are the year and month, and the y-values are the number of datetime objects in the list that fall in that month.
Perhaps an example will demonstrate this better (dd/mm/yyyy):

[28/02/2018, 01/03/2018, 16/03/2018, 17/05/2018] 
-> ([02/2018, 03/2018, 04/2018, 05/2018], [1, 2, 0, 1])

My first attempt simply grouped by month and year, along the lines of:

import itertools
group = itertools.groupby(dates, lambda date: date.strftime("%b/%Y"))
graph = zip(*[(k, len(list(v))) for k, v in group]) # format the data for graphing

As you've probably noticed, though, this only produces groups for months that are already present in the list. In my example above, the fact that none of the dates occurred in April would have been overlooked.

Next, I tried finding the starting and ending dates, and looping over the months between them:

import datetime
data = [[], [],]
for year in range(min_date.year, max_date.year):
    for month in range(min_date.month, max_date.month):
        k = datetime.datetime(year=year, month=month, day=1).strftime("%b/%Y")
        v = sum([1 for date in dates if date.strftime("%b/%Y") == k])
        data[0].append(k)
        data[1].append(v)

Of course, this only works if min_date.month is smaller than max_date.month, which is not necessarily the case if the dates span multiple years. Also, it's pretty ugly.

Is there an elegant way of doing this?
Thanks in advance

EDIT: To be clear, the dates are datetime objects, not strings. They look like strings here for the sake of being readable.


Answer 1:


I suggest using pandas:

import pandas as pd

dates = ['28/02/2018', '01/03/2018', '16/03/2018', '17/05/2018'] 

s = pd.to_datetime(pd.Series(dates), format='%d/%m/%Y')
s.index = s.dt.to_period('M')
s = s.groupby(level=0).size()

s = s.reindex(pd.period_range(s.index.min(), s.index.max(), freq='M'), fill_value=0)
print(s)
2018-02    1
2018-03    2
2018-04    0
2018-05    1
Freq: M, dtype: int64

s.plot.bar()

Explanation:

  1. First create a Series from the list of dates and convert it with to_datetime.
  2. Build a PeriodIndex of months with Series.dt.to_period.
  3. Group by the index (level=0) and count the rows per month with GroupBy.size.
  4. Add the missing months with Series.reindex, using a PeriodIndex that runs from the minimum to the maximum value of the index.
  5. Finally plot, e.g. as bars with Series.plot.bar.
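
Since the question's EDIT says the dates are datetime objects rather than strings, the same steps can be sketched without the string parsing. This is only a variation on the approach above, assuming pandas is installed; the names months, counts, full_range, labels and values are made up for the illustration.

```python
import datetime as dt
import pandas as pd

dates = [dt.datetime(2018, 2, 28), dt.datetime(2018, 3, 1),
         dt.datetime(2018, 3, 16), dt.datetime(2018, 5, 17)]

# A Series built from datetime objects already has a datetime dtype,
# so no format string is needed.
months = pd.Series(dates).dt.to_period('M')

# Count how many dates fall into each month that is present.
counts = months.groupby(months).size()

# Reindex over every month between the first and the last one,
# filling the months that had no dates with 0.
full_range = pd.period_range(counts.index.min(), counts.index.max(), freq='M')
counts = counts.reindex(full_range, fill_value=0)

labels = [str(p) for p in counts.index]   # x-values for the graph
values = counts.tolist()                  # y-values for the graph
print(labels, values)
```

The two lists can then be handed to whatever plotting library is in use, or counts.plot.bar() can be called directly as in the answer.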



Answer 2:


Using collections.Counter:

import random
import collections

dates = []

for y in range(2015,2019):
  for m in range(1,13):
    for i in range(random.randint(1,4)):
      dates.append("{}/{}".format(m,y))

print(dates)
counter = collections.Counter(dates)
print(counter)

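To handle months with no occurrences, generate a list covering the whole range of months, with each month appearing exactly once. Note that calling the subtract method of Counter with that list on its own would lower every count by one (empty months would end up at -1), so add the list with update first and then subtract it again. subtract keeps zero counts, so the missing months stay in the counter with a count of 0:

tmp_date_list = ["{}/{}".format(m,y) for y in range(2015,2019) for m in range(1,13)]
counter.update(tmp_date_list)    # every month now exists, each count is one too high
counter.subtract(tmp_date_list)  # back to the real counts; missing months stay at 0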
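
For datetime objects, as in the question, a Counter can also be keyed on a running month number, which avoids the year-boundary problem of the nested loops. The following is a minimal sketch using only the standard library; count_by_month is a hypothetical helper name, and it assumes the list is not empty.

```python
import datetime
from collections import Counter

def count_by_month(dates):
    """Return (labels, counts) for every month from the earliest to the latest date."""
    # Turn each date into one running month number, so that December of one
    # year is directly followed by January of the next.
    per_month = Counter(d.year * 12 + d.month - 1 for d in dates)
    first, last = min(per_month), max(per_month)  # raises ValueError on an empty list

    labels, counts = [], []
    for index in range(first, last + 1):
        year, month0 = divmod(index, 12)
        labels.append("{:02d}/{}".format(month0 + 1, year))
        counts.append(per_month[index])  # a Counter returns 0 for missing keys
    return labels, counts

dates = [datetime.datetime(2018, 2, 28), datetime.datetime(2018, 3, 1),
         datetime.datetime(2018, 3, 16), datetime.datetime(2018, 5, 17)]
labels, counts = count_by_month(dates)
print(labels)  # ['02/2018', '03/2018', '04/2018', '05/2018']
print(counts)  # [1, 2, 0, 1]
```

The input does not need to be sorted, because the Counter collects all dates before the range of months is walked.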


Source: https://stackoverflow.com/questions/49584924/python-group-dates-by-month
