Question
From pd.date_range('2016-01', '2016-05', freq='M').strftime('%Y-%m'), the last month is 2016-04, but I was expecting it to be 2016-05. It seems to me this function behaves like the range method, where the end parameter is not included in the returned array.
Is there a way to get the end month included in the returned array without processing the string for the end month?
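A minimal reproduction of the behavior described above, as a sketch assuming a recent pandas. It passes pd.offsets.MonthEnd() instead of the 'M' alias because newer pandas versions (2.2+) rename the month-end alias to 'ME'; the variable names are illustrative.

```python
import pandas as pd

# Month-end frequency as an offset object (equivalent to freq='M' / 'ME').
month_end = pd.offsets.MonthEnd()

# '2016-05' is parsed as 2016-05-01, so May's month end falls outside the range.
labels = list(pd.date_range('2016-01', '2016-05', freq=month_end).strftime('%Y-%m'))
print(labels)  # ['2016-01', '2016-02', '2016-03', '2016-04']
```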
Answer 1:
A way to do it without having to figure out the month ends yourself:
pd.date_range(*(pd.to_datetime(['2016-01', '2016-05']) + pd.offsets.MonthEnd()), freq='M')
DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30',
'2016-05-31'],
dtype='datetime64[ns]', freq='M')
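The same idea wrapped in a small helper, as a sketch. month_end_range is a hypothetical name, not a pandas function, and it assumes 'YYYY-MM' inputs: a date that is already a month end would be rolled forward to the following month end.

```python
import pandas as pd

def month_end_range(start: str, end: str) -> pd.DatetimeIndex:
    """Hypothetical helper: month ends from start's month through end's month, inclusive.

    Assumes 'YYYY-MM' strings, which pandas parses as the 1st of the month.
    """
    # Roll both bounds forward to their month end, so the end month's last day
    # becomes the upper bound of the range.
    bounds = pd.to_datetime([start, end]) + pd.offsets.MonthEnd()
    return pd.date_range(bounds[0], bounds[1], freq=pd.offsets.MonthEnd())

rng = month_end_range('2016-01', '2016-05')
print(list(rng.strftime('%Y-%m')))  # ['2016-01', '2016-02', '2016-03', '2016-04', '2016-05']
```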
Answer 2:
You can use .union to add the next logical value after initializing the date_range. Because it steps by the index's own frequency (d.freq), it should work as written for any regular frequency:
d = pd.date_range('2016-01', '2016-05', freq='M')
d = d.union([d[-1] + d.freq]).strftime('%Y-%m')
Alternatively, you can use period_range instead of date_range. Depending on what you intend to do, this might not be the right tool, but it answers your question:
pd.period_range('2016-01', '2016-05', freq='M').strftime('%Y-%m')
In either case, the resulting output is as expected:
['2016-01' '2016-02' '2016-03' '2016-04' '2016-05']
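A sketch checking that the two approaches agree, assuming a recent pandas. Stepping by d.freq rather than adding an integer matters on current versions, where adding an integer to a Timestamp raises a TypeError.

```python
import pandas as pd

d = pd.date_range('2016-01', '2016-05', freq=pd.offsets.MonthEnd())
# Append the next point on the same grid: d.freq is the index's own offset
# (MonthEnd here), so 2016-04-30 + d.freq -> 2016-05-31.
extended = d.union(pd.DatetimeIndex([d[-1] + d.freq]))
via_union = list(extended.strftime('%Y-%m'))

# period_range treats '2016-05' as the whole month, so the end month is included.
via_period = list(pd.period_range('2016-01', '2016-05', freq='M').strftime('%Y-%m'))

print(via_union == via_period)  # True
```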
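Answer 3:
For later readers: you can also use the month-start ('MS') frequency. Note that date_range does not take a format argument; format the result with .strftime('%Y-%m') if you want month strings.
>>> pd.date_range('2016-01', '2016-05', freq='MS')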
DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01',
'2016-05-01'],
dtype='datetime64[ns]', freq='MS')
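A sketch of the month-start approach that also produces the 'YYYY-MM' labels the question asks for, formatting with strftime after the range is built.

```python
import pandas as pd

# 'MS' puts each point on the 1st of the month; '2016-05' parses as 2016-05-01,
# which lands exactly on the end bound and is therefore included.
starts = pd.date_range('2016-01', '2016-05', freq='MS')
labels = list(starts.strftime('%Y-%m'))  # formatting happens here, not in date_range
print(labels)  # ['2016-01', '2016-02', '2016-03', '2016-04', '2016-05']
```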
Answer 4:
Include the day when specifying the dates in the date_range call:
pd.date_range('2016-01-31', '2016-05-31', freq='M').strftime('%Y-%m')
array(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05'],
dtype='|S7')
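If you would rather not hard-code the last day of the end month, a sketch can derive it from a pandas Period, whose end_time is the last instant of the period. end_of_month_range is a hypothetical helper name.

```python
import pandas as pd

def end_of_month_range(start: str, end: str) -> pd.DatetimeIndex:
    """Hypothetical helper: month ends from start through end, where end is 'YYYY-MM'."""
    # Period.end_time is the last instant of the month (e.g. 2016-05-31 23:59:59.999999999);
    # normalize() drops the time so the bound is midnight on the last day.
    last_day = pd.Period(end, freq='M').end_time.normalize()
    return pd.date_range(start, last_day, freq=pd.offsets.MonthEnd())

labels = list(end_of_month_range('2016-01', '2016-05').strftime('%Y-%m'))
print(labels)  # ['2016-01', '2016-02', '2016-03', '2016-04', '2016-05']
```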
Answer 5:
I don't think so. You need to add the (n+1) boundary, i.e. extend the end by one month:
pd.date_range('2016-01', '2016-06', freq='M').strftime('%Y-%m')
The start and end dates are strictly inclusive, so date_range will not generate any dates outside them. See http://pandas.pydata.org/pandas-docs/stable/timeseries.html
Either way, you have to add some information manually. I believe adding just one more month is not a lot of work.
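A sketch of adding the (n+1) boundary programmatically instead of editing the string by hand, assuming Period arithmetic: a monthly Period plus 1 is the next month, including across a year boundary. The variable names are illustrative.

```python
import pandas as pd

end = '2016-05'
# Move one month past the requested end without string manipulation.
next_month = str(pd.Period(end, freq='M') + 1)  # '2016-06'

# The exclusive-looking bound 2016-06-01 now sits after May's month end.
labels = list(pd.date_range('2016-01', next_month, freq=pd.offsets.MonthEnd()).strftime('%Y-%m'))
print(labels)  # ['2016-01', '2016-02', '2016-03', '2016-04', '2016-05']
```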
Answer 6:
The explanation for this issue is that pd.to_datetime() converts a '%Y-%m' date string to the first day of the month by default, i.e. '%Y-%m-01':
>>> pd.to_datetime('2016-05')
Timestamp('2016-05-01 00:00:00')
>>> pd.date_range('2016-01', '2016-02')
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
'2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
'2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
'2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
'2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
'2016-01-29', '2016-01-30', '2016-01-31', '2016-02-01'],
dtype='datetime64[ns]', freq='D')
Everything follows from that. Specifying freq='M' includes the month ends between 2016-01-01 and 2016-05-01, which is the list you receive, and excludes 2016-05-31. Specifying month starts ('MS'), as another answer suggests, includes 2016-05-01 because it falls within the range. So the default behavior of pd.date_range() is not like the range method, since both ends are included. From the docs:
closed controls whether to include start and end that are on the boundary. The default includes boundary points on either end.
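A sketch that checks each step of this explanation, assuming pandas 1.4 or later: in those versions the closed parameter quoted above was replaced by inclusive, whose default 'both' keeps both bounds, while 'left' excludes the end like Python's range().

```python
import pandas as pd

# A bare 'YYYY-MM' string is parsed as midnight on the 1st of that month.
end = pd.to_datetime('2016-05')
print(end)  # 2016-05-01 00:00:00

# Both bounds are included by default, but May 31 lies after May 1,
# so a month-end grid stops at April 30 while a month-start grid hits May 1.
me = pd.date_range('2016-01', end, freq=pd.offsets.MonthEnd())
ms = pd.date_range('2016-01', end, freq='MS')
print(me[-1], ms[-1])  # 2016-04-30 00:00:00 2016-05-01 00:00:00

# inclusive='left' drops an end point that sits on the boundary.
left = pd.date_range('2016-01', end, freq='MS', inclusive='left')
print(len(ms), len(left))  # 5 4
```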
Source: https://stackoverflow.com/questions/37890391/how-to-include-end-date-in-pandas-date-range-method