dates seem to be a tricky thing in python, and I am having a lot of trouble simply stripping the date out of the pandas TimeStamp. I would like to get from 2013-09-29
Maybe this only came in recently, but there are built-in methods for this. Try:
In [27]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=2))
In [28]: s
Out[28]:
0 2016-02-11 19:11:43.386016
1 2016-02-12 19:11:43.386016
dtype: datetime64[ns]
In [29]: s.dt.to_pydatetime()
Out[29]:
array([datetime.datetime(2016, 2, 11, 19, 11, 43, 386016),
datetime.datetime(2016, 2, 12, 19, 11, 43, 386016)], dtype=object)
well I would do this way.
pdTime =pd.date_range(timeStamp, periods=len(years), freq="D")
pdTime[i].strftime('%m-%d-%Y')
map over the elements:
In [239]: from operator import methodcaller
In [240]: s = Series(date_range(Timestamp('now'), periods=2))
In [241]: s
Out[241]:
0 2013-10-01 00:24:16
1 2013-10-02 00:24:16
dtype: datetime64[ns]
In [238]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[238]:
0 01-10-2013
1 02-10-2013
dtype: object
In [242]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[242]:
0 01-10-2013
1 02-10-2013
dtype: object
You can get the raw datetime.date objects by calling the date() method of the Timestamp elements that make up the Series:
In [249]: s.map(methodcaller('date'))
Out[249]:
0 2013-10-01
1 2013-10-02
dtype: object
In [250]: s.map(methodcaller('date')).values
Out[250]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
Yet another way you can do this is by calling the unbound Timestamp.date method:
In [273]: s.map(Timestamp.date)
Out[273]:
0 2013-10-01
1 2013-10-02
dtype: object
This method is the fastest, and IMHO the most readable. Timestamp is accessible in the top-level pandas module, like so: pandas.Timestamp. I've imported it directly for expository purposes.
The date attribute of DatetimeIndex objects does something similar, but returns a numpy object array instead:
In [243]: index = DatetimeIndex(s)
In [244]: index
Out[244]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-10-01 00:24:16, 2013-10-02 00:24:16]
Length: 2, Freq: None, Timezone: None
In [246]: index.date
Out[246]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
For larger datetime64[ns] Series objects, calling Timestamp.date is faster than operator.methodcaller which is slightly faster than a lambda:
In [263]: f = methodcaller('date')
In [264]: flam = lambda x: x.date()
In [265]: fmeth = Timestamp.date
In [266]: s2 = Series(date_range('20010101', periods=1000000, freq='T'))
In [267]: s2
Out[267]:
0 2001-01-01 00:00:00
1 2001-01-01 00:01:00
2 2001-01-01 00:02:00
3 2001-01-01 00:03:00
4 2001-01-01 00:04:00
5 2001-01-01 00:05:00
6 2001-01-01 00:06:00
7 2001-01-01 00:07:00
8 2001-01-01 00:08:00
9 2001-01-01 00:09:00
10 2001-01-01 00:10:00
11 2001-01-01 00:11:00
12 2001-01-01 00:12:00
13 2001-01-01 00:13:00
14 2001-01-01 00:14:00
...
999985 2002-11-26 10:25:00
999986 2002-11-26 10:26:00
999987 2002-11-26 10:27:00
999988 2002-11-26 10:28:00
999989 2002-11-26 10:29:00
999990 2002-11-26 10:30:00
999991 2002-11-26 10:31:00
999992 2002-11-26 10:32:00
999993 2002-11-26 10:33:00
999994 2002-11-26 10:34:00
999995 2002-11-26 10:35:00
999996 2002-11-26 10:36:00
999997 2002-11-26 10:37:00
999998 2002-11-26 10:38:00
999999 2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]
In [269]: timeit s2.map(f)
1 loops, best of 3: 1.04 s per loop
In [270]: timeit s2.map(flam)
1 loops, best of 3: 1.1 s per loop
In [271]: timeit s2.map(fmeth)
1 loops, best of 3: 968 ms per loop
Keep in mind that one of the goals of pandas is to provide a layer on top of numpy so that (most of the time) you don't have to deal with the low level details of the ndarray. So getting the raw datetime.date objects in an array is of limited use since they don't correspond to any numpy.dtype that is supported by pandas (pandas only supports datetime64[ns] [that's nanoseconds] dtypes). That said, sometimes you need to do this.
You can try using .dt.date on datetime64[ns] of the dataframe.
For e.g. df['Created_date'] = df['Created_date'].dt.date
Input dataframe named as test_df:
print(test_df)
Result:
Created_date
0 2015-03-04 15:39:16
1 2015-03-22 17:36:49
2 2015-03-25 22:08:45
3 2015-03-16 13:45:20
4 2015-03-19 18:53:50
Checking dtypes:
print(test_df.dtypes)
Result:
Created_date datetime64[ns]
dtype: object
Extracting date and updating Created_date column:
test_df['Created_date'] = test_df['Created_date'].dt.date
print(test_df)
Result:
Created_date
0 2015-03-04
1 2015-03-22
2 2015-03-25
3 2015-03-16
4 2015-03-19