Python datetime.strptime() Eating lots of CPU Time

I have some log parsing code that needs to turn a timestamp into a datetime object. I am using datetime.strptime but this function is using a lot of cputime according to cPr

4 Answers

南笙

2021-01-04 18:43

If those are fixed width formats then there is no need to parse the line - you can use slicing and a dictionary lookup to get the fields directly.

import datetime

# Map English month abbreviations to month numbers.
month_abbreviations = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4,
                       'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8,
                       'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
# Slice the fixed-width fields out of a timestamp such as "01/Nov/2010:07:49:33".
year = int(line[7:11])
month = month_abbreviations[line[3:6]]
day = int(line[0:2])
hour = int(line[12:14])
minute = int(line[15:17])
second = int(line[18:20])
new_entry['time'] = datetime.datetime(year, month, day, hour, minute, second)

Testing in the manner shown by Glenn Maynard shows this to be about 3 times faster.
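A minimal sketch of the same idea wrapped in a function and checked against strptime(), assuming the timestamp always has the fixed layout `01/Nov/2010:07:49:33` seen elsewhere in this thread. The names `parse_timestamp` and `MONTHS` are illustrative and not from the original code.

```python
from datetime import datetime

# Assumption: month names are always the English three-letter abbreviations.
MONTHS = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4,
          'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8,
          'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}


def parse_timestamp(ts):
    """Parse 'DD/Mon/YYYY:HH:MM:SS' by character position instead of strptime()."""
    return datetime(
        int(ts[7:11]),      # year
        MONTHS[ts[3:6]],    # month abbreviation -> number
        int(ts[0:2]),       # day
        int(ts[12:14]),     # hour
        int(ts[15:17]),     # minute
        int(ts[18:20]),     # second
    )


sample = "01/Nov/2010:07:49:33"
# Sanity check: the sliced result should match what strptime() produces.
assert parse_timestamp(sample) == datetime.strptime(sample, "%d/%b/%Y:%H:%M:%S")
```

Unlike strptime(), this does not validate separators, so a malformed line may raise a KeyError or ValueError, or be parsed silently if the digits happen to line up.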

名媛妹妹

2021-01-04 18:46

It seems that strptime() on a Windows platform uses a Python implementation (_strptime.py in the Lib directory) rather than a C one, so it might be quicker to process the string yourself.

from datetime import datetime
import timeit

def f():
    datetime.strptime("2010-11-01", "%Y-%m-%d")

n = 100000
# Average seconds per call.
print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000049 (seconds per call) on my system, whereas

from datetime import date
import timeit

def f():
    # Split "YYYY-MM-DD" by hand and convert each part to an int.
    parts = [int(x) for x in "2010-11-01".split("-")]
    return date(parts[0], parts[1], parts[2])

n = 100000
# Average seconds per call.
print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000009, roughly five times faster.

情深已故

2021-01-04 18:46
Most recent answer: if moving to a straight strptime() has not improved the running time, then my suspicion is that there is actually no problem here: you have simply written a program, one of whose main purposes in life is to call strptime() very many times, and you have written it well enough — with so little other stuff that it does — that the strptime() calls are quite properly being allowed to dominate the runtime. I think you could count this as a success rather than a failure, unless you find that (a) some Unicode or LANG setting is making strptime() do extra work, or (b) you are calling it more often than you need to. Try, of course, to call it only once for each date to be parsed. :-)
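One way to call strptime() only once per distinct timestamp is to memoize it. This is a sketch only, assuming Python 3 (for `functools.lru_cache`) and assuming that many log lines share the same timestamp string; the name `parse_cached` is hypothetical.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_cached(ts, fmt="%d/%b/%Y:%H:%M:%S"):
    """Parse ts with strptime(), reusing the result for strings seen before."""
    return datetime.strptime(ts, fmt)


# Two log lines from the same second: strptime() runs only for the first one.
first = parse_cached("01/Nov/2010:07:49:33")
second = parse_cached("01/Nov/2010:07:49:33")
```

This only pays off when timestamps repeat. If every line has a unique timestamp, the cache adds memory and lookup overhead without saving any parsing, so a bounded `maxsize` may be the safer choice.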

Follow-up answer after seeing example date string: Wait! Hold on! Why are you parsing the line instead of just using a formatting string like:
```
"%d/%b/%Y:%H:%M:%S"
```
Original off-the-cuff answer: If the month were an integer you could do something like this:
```
new_entry['time'] = datetime.datetime(
    int(parsed_line['year']),
    int(parsed_line['month']),
    int(parsed_line['day']),
    int(parsed_line['hour']),
    int(parsed_line['minute']),
    int(parsed_line['second'])
)
```
and avoid creating a big string just to make strptime() split it back apart again. I wonder if there is a way to access the month-name logic directly to do that one textual conversion?
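On the month-name question, one possibility is the standard library's `calendar.month_abbr`, which holds the abbreviated month names for the current locale. The sketch below assumes the log uses the same abbreviations as that locale (English `Jan` to `Dec` by default) and that `parsed_line` is a dict of strings as in the snippet above; `MONTH_NUMBER` and `build_datetime` are hypothetical names.

```python
import calendar
from datetime import datetime

# calendar.month_abbr[0] is an empty string, so skip it when building the lookup.
MONTH_NUMBER = {name: number
                for number, name in enumerate(calendar.month_abbr)
                if name}


def build_datetime(parsed_line):
    """Build a datetime from already-split string fields, without strptime()."""
    return datetime(
        int(parsed_line['year']),
        MONTH_NUMBER[parsed_line['month']],  # e.g. 'Nov' -> 11
        int(parsed_line['day']),
        int(parsed_line['hour']),
        int(parsed_line['minute']),
        int(parsed_line['second']),
    )


example = {'year': '2010', 'month': 'Nov', 'day': '01',
           'hour': '07', 'minute': '49', 'second': '33'}
result = build_datetime(example)
```

Because `calendar.month_abbr` follows the active locale, a hard-coded dictionary is the more predictable choice if the program may run under a non-English locale.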

一向

2021-01-04 18:53

What's a "lot of time"? strptime is taking about 30 microseconds here:

from datetime import datetime
import timeit
def f():
    datetime.strptime("01/Nov/2010:07:49:33", "%d/%b/%Y:%H:%M:%S")
n = 100000
# Average seconds per call.
print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000031.
