Import excel time into Pandas with decimal seconds

Question

I have an Excel spreadsheet (.xls) that contains a time column. Excel displays the time as minutes:seconds.tenths of a second, for example "50:59.2" and "50:59.4". The raw data contains hours:minutes:seconds.decimalseconds.

When I import the data into pandas, I lose the tenths of a second:

indata=pd.read_excel('Data.xls','Tabular Data',header=9,skiprows=[1,2,3,4,5,6,7,8,10,11,12])
indata['Time']
0     17:50:59
1     17:51:00
2     17:51:00
3     17:51:00
...
indata.Time[0].microsecond
0
indata.Time[1].microsecond
0

I also tried pd.ExcelFile() with xls.parse but got the same results. Is there any way to control how pandas parses the time from Excel? It gets the hours, minutes, and seconds "correct", but it drops the tenths, which I do need.

ADDITIONAL INFORMATION:

As a test, I also tried using xlrd to read the data directly. It reads the times as floats, as expected. But if I then call xlrd.xldate_as_tuple() on some of the time data, I lose the fractions of a second, whereas if I use datetime.timedelta() directly, I see the decimal seconds.
Perhaps the problem is that xlrd is dropping the data?
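
The datetime.timedelta() route mentioned above can be sketched as follows. This is a minimal illustration using only the standard library; the helper name excel_fraction_to_timedelta is hypothetical, and rounding to the nearest millisecond is an assumption that suits data recorded in tenths of a second.

```python
import datetime

def excel_fraction_to_timedelta(day_fraction):
    """Convert an Excel time (a fraction of a 24-hour day) to a timedelta.

    Hypothetical helper: rounds to the nearest millisecond so that small
    floating-point errors in the stored value do not show up as 0.199999 s.
    """
    milliseconds = round(day_fraction * 24 * 60 * 60 * 1000)
    return datetime.timedelta(milliseconds=milliseconds)

# 0.012398148 is the raw float Excel stores for 0:17:51.2
td = excel_fraction_to_timedelta(0.012398148)
print(td)  # 0:17:51.200000
```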

WORKAROUND:

I figured out a workaround. It doesn't solve the underlying problem, but it does allow me to read in the data.
I opened the spreadsheet in Excel and created a new text-only column based on the time (named Time_str):

=TEXT(A13,"h:mm:ss.0")

I saved it, and then I was able to use pd.read_excel to read in the spreadsheet.
Finally, I converted this new column to a time in pandas like this:

indata_t['Time2']=indata_t.Time_str.apply(lambda x: datetime.datetime.strptime(x,'%H:%M:%S.%f'))

Or, adding in a date, like this:

indata_t['Time2']=indata_t.Time_str.apply(lambda x: datetime.datetime.strptime('2009-01-11 '+x,'%Y-%m-%d %H:%M:%S.%f'))

It is a kludge, but at least it let me import the data.
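
For reference, here is a small standard-library sketch of what the %f directive does with a single fractional digit, and of attaching a date after parsing instead of concatenating strings. The sample string is illustrative and the date is the one from the example above; neither comes from the actual spreadsheet.

```python
import datetime

# Illustrative value in the "h:mm:ss.0" text format produced by the TEXT() formula.
time_str = "17:50:59.2"

# %f accepts one to six digits and pads on the right, so ".2" means 200000 microseconds.
parsed = datetime.datetime.strptime(time_str, "%H:%M:%S.%f")
print(parsed.microsecond)  # 200000

# With no date in the string, strptime fills in 1900-01-01;
# combine() attaches a chosen date to the parsed time of day instead.
stamped = datetime.datetime.combine(datetime.date(2009, 1, 11), parsed.time())
print(stamped.isoformat())  # 2009-01-11T17:50:59.200000
```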

Answer 1:

pandas uses xlrd to read Excel files and the xlrd.xldate_as_tuple() function to get the date components to feed into datetime.time().

However, xlrd.xldate_as_tuple() only returns whole seconds, not microseconds, so the fractional part is lost before it reaches pandas.

For example, say you have an Excel file like this (the Number column holds the same values as Time, but without a time format):

Time            Number
0:17:51.000     0.012395833
0:17:51.200     0.012398148
0:17:51.400     0.012400463
0:17:51.600     0.012402778
0:17:52.800     0.012416667
0:17:53.000     0.012418981

Then, if you read the data with the following program:

import xlrd

workbook = xlrd.open_workbook('minutes.xls')
worksheet = workbook.sheet_by_name('Sheet1')

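# Row index 2, column index 0 (both zero-based), i.e. cell A3 in Excel.
cell = worksheet.cell(2, 0)

# Print the A3 cell value as a number (12 significant digits).
print(format(cell.value, '.12g'))

# Print the seconds part of the A3 cell value.
print(format((cell.value * (24 * 60 * 60)) % 60, '.12g'))

# Print the xldate_as_tuple output.
print(xlrd.xldate_as_tuple(cell.value, workbook.datemode))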

You get the following output:

0.0123981481481
51.2
(0, 0, 0, 0, 17, 51)

So, the decimal part of the seconds is read (51.2) but not returned by xldate_as_tuple() and thus not available to pandas.
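
Until the fraction is exposed, it can be recovered by hand from the raw cell value. The following is a minimal sketch using only the standard library, not xlrd's own implementation. The helper name excel_time_to_time is hypothetical, and it assumes the cell holds a time of day, so any whole-day part of the serial number is discarded.

```python
import datetime

SECONDS_PER_DAY = 24 * 60 * 60
MICROSECONDS_PER_DAY = SECONDS_PER_DAY * 1_000_000

def excel_time_to_time(xl_value):
    """Convert the time-of-day part of an Excel serial value to datetime.time.

    Hypothetical helper: unlike xldate_as_tuple(), it keeps the fractional
    seconds by rounding to the nearest microsecond.
    """
    day_fraction = xl_value % 1.0
    total_us = round(day_fraction * MICROSECONDS_PER_DAY)
    # Guard against a value that rounds up to exactly 24:00:00.
    total_us %= MICROSECONDS_PER_DAY
    total_seconds, microsecond = divmod(total_us, 1_000_000)
    total_minutes, second = divmod(total_seconds, 60)
    hour, minute = divmod(total_minutes, 60)
    return datetime.time(hour, minute, second, microsecond)

# The raw cell value printed in the example above.
print(excel_time_to_time(0.0123981481481))  # 00:17:51.200000
```

With xlrd, the argument would be cell.value from a cell whose type is a date.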

This is the documented behaviour of xldate_as_tuple() but you could submit a feature request or a pull request.

Update: I submitted a fix for this to xlrd.

Source: https://stackoverflow.com/questions/21004376/import-excel-time-into-pandas-with-decimal-seconds

Tags

python

parsing

pandas

xlrd