I wrote code that reads multiple files. In some of my files, however, the datetime parsing swaps day and month whenever the day is less than 13, and any day that is from day 13 or abo
I solved my problem, but in a memory-consuming way. I first split my tmp column into date and time columns, then split the date column again into day, month and year. That way I could find the rows where the day is less than 13 and swap the day with the corresponding month.
df['tmp'] = pd.to_datetime(df['tmp'], unit='ns')
df['tmp'] = df['tmp'].apply(lambda x: x.replace(microsecond=0))
df['date'] = [d.date() for d in df['tmp']]
df['time'] = [d.time() for d in df['tmp']]
df[['year','month','day']] = df['date'].apply(lambda x: pd.Series(x.strftime("%Y-%m-%d").split("-")))
df['day'] = pd.to_numeric(df['day'], errors='coerce')
df['month'] = pd.to_numeric(df['month'], errors='coerce')
df['year'] = pd.to_numeric(df['year'], errors='coerce')
# Loop over the rows and swap day and month wherever the day is less than 13
# (enumerate() yields positions, so this assumes the default 0..n-1 index)
for index, d in enumerate(df['day']):
    if d < 13:
        df.loc[index, 'day'], df.loc[index, 'month'] = df.loc[index, 'month'], df.loc[index, 'day']
# convert series to string type in order to merge them
df['day'] = df['day'].astype(str)
df['month'] = df['month'].astype(str)
df['year'] = df['year'].astype(str)
df['date']= pd.to_datetime(df[['year', 'month', 'day']])
df['date'] = df['date'].astype(str)
df['time'] = df['time'].astype(str)
# merge time and date and place result in our column
df['tmp'] =pd.to_datetime(df['date']+ ' '+df['time'])
# drop the added columns
df.drop(columns=['date', 'year', 'month', 'day', 'time'], inplace=True)
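The row-by-row loop above can also be expressed with column-wise operations. The following is only a sketch of that idea, not the asker's code: the sample timestamps and the names `ts`, `swap`, `parts` and `fixed` are made up for illustration, and it assumes the column already holds mis-parsed datetimes in which every row with a day below 13 really was swapped.

```python
import pandas as pd

# Hypothetical sample: the first value was mis-parsed (should be 11 June),
# the second one is already correct because its day is 13.
ts = pd.Series(pd.to_datetime(["2015-11-06 00:56:55", "2015-06-13 02:31:44"]))

# Rows where the day is below 13 are the ones assumed to be swapped.
swap = ts.dt.day < 13

# Rebuild the timestamps from their components, exchanging day and month
# only on the flagged rows. Sub-second precision is dropped, as in the
# original code, which sets microsecond=0.
parts = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month.where(~swap, ts.dt.day),
    "day": ts.dt.day.where(~swap, ts.dt.month),
    "hour": ts.dt.hour,
    "minute": ts.dt.minute,
    "second": ts.dt.second,
})
fixed = pd.to_datetime(parts)
```

This avoids the intermediate string columns, but it shares the weakness of the loop: it cannot tell a genuinely correct date with a day below 13 from a swapped one, so it is only safe when the parsing error is known to affect all such rows.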
You can use the dayfirst parameter in pd.to_datetime.
pd.to_datetime(df.tmp, dayfirst=True)
Output:
0 2015-06-11 00:56:55
1 2015-06-11 04:16:38
2 2015-06-12 16:13:30
3 2015-06-12 21:24:03
4 2015-06-13 02:31:44
5 2015-06-13 02:37:49
Name: tmp, dtype: datetime64[ns]
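To make the effect of dayfirst easier to see, here is a small self-contained sketch. The raw string layout (`DD/MM/YYYY HH:MM:SS`) is an assumption, since the question does not show the unparsed values. If the layout is known exactly, passing an explicit `format` is an alternative that leaves nothing to inference.

```python
import pandas as pd

# Assumed raw input: day-first strings, one ambiguous (day 11) and one not (day 13).
raw = pd.Series(["11/06/2015 00:56:55", "13/06/2015 02:31:44"], name="tmp")

# dayfirst=True tells the parser to read the first number as the day.
parsed = pd.to_datetime(raw, dayfirst=True)

# Alternative when the layout is known: spell out the format explicitly.
explicit = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")

print(parsed)
```

Both calls should read the first value as 11 June rather than 6 November.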
I ran into the same issue. In my case the dates were the index column (called "Date"). The solution above, calling to_datetime() directly on the dataframe with "Date" as its index column, didn't work for me. I had to call read_csv() first without setting the index to "Date", then apply to_datetime() to that column, and only then set the index to "Date".
df = pd.read_csv(file, parse_dates=True)
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df = df.set_index('Date')
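The same three steps can be tried without a file on disk. This is a sketch under assumptions: the CSV content, the `value` column and the name `csv_text` are invented for the example, and only the "Date" column name comes from the answer.

```python
import io
import pandas as pd

# Hypothetical CSV with day-first dates in a "Date" column.
csv_text = "Date,value\n11/06/2015,1\n12/06/2015,2\n13/06/2015,3\n"

# 1. Read without making "Date" the index, so it stays a plain string column.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Parse the column with dayfirst=True.
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

# 3. Only now promote it to the index.
df = df.set_index("Date")

print(df)
```

After these steps the index should be a DatetimeIndex covering 11 to 13 June 2015.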