parse multiple dates using dateutil

Submitted by 我与影子孤独终老i on 2019-12-10 17:41:51

Question


I am trying to parse multiple dates from a string in Python with the following code:

from dateutil.parser import _timelex, parser

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
p = parser()
info = p.info  # parserinfo: the word lists (jump words, months, time units, ...) the parser uses

def timetoken(token):
  # A token counts as date-related if it is a number...
  try:
    float(token)
    return True
  except ValueError:
    pass
  # ...or if any of the parserinfo word lists recognizes it
  return any(f(token) for f in (info.jump, info.weekday, info.month, info.hms, info.ampm, info.pertain, info.utczone, info.tzoffset))

def timesplit(input_string):
  # Group consecutive date-related tokens; any other word ends the current batch
  batch = []
  for token in _timelex(input_string):
    if timetoken(token):
      if info.jump(token):
        continue  # skip separators such as " ", "/" and "of"
      batch.append(token)
    else:
      if batch:
        yield " ".join(batch)
        batch = []
  if batch:
    yield " ".join(batch)

for item in timesplit(a):
  print "Found:", item
  print "Parsed:", p.parse(item)

but the code picks up the word "second" from "second half" as if it were another date, and raises this error:

raise ValueError, "unknown string format"

ValueError: unknown string format

When I change 'second half' to 'third half' or 'fourth half', it works fine.

Can anyone help me parse this string?


Answer 1:


Your parser can't handle the "second" token found by timesplit. If you set the fuzzy parameter to True, it no longer raises, but the "second" token still comes out as a meaningless date:

from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    print "Parsed:", p.parse(StringIO(item),fuzzy=True)

Output:

Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Parsed: 2013-01-11 00:00:00
Found: 20 10 2012
Parsed: 2012-10-20 00:00:00

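You have to either fix the time splitting or handle the errors:

Option 1:

Remove info.hms from the checks in timetoken. "second" only passes timetoken because info.hms recognizes it as a time unit (alongside words such as "hour" and "minute"), which is also why 'third half' and 'fourth half' work.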
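To see why dropping info.hms helps without installing dateutil, here is a simplified, hypothetical re-implementation of timesplit in Python 3 using only the standard library. The word lists are small illustrative subsets chosen for this sketch, not dateutil's real parserinfo tables, and the tokenizer is a plain regex rather than dateutil's _timelex. The accept_hms flag toggles whether time-unit words count as date tokens.

```python
import re

# Illustrative subsets only; dateutil's real parserinfo lists are larger.
JUMP_WORDS = {"of", "at", "on", "and", "/", "-", ",", "."}
MONTH_WORDS = {"jan", "january", "oct", "october", "dec", "december"}
HMS_WORDS = {"h", "hour", "hours", "m", "minute", "minutes", "s", "second", "seconds"}


def split_words(text):
    """Split text into runs of letters, runs of digits, or single punctuation marks."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)


def is_date_token(token, accept_hms):
    """Rough stand-in for the question's timetoken() helper."""
    lowered = token.lower()
    if token.isdigit():
        return True
    if lowered in JUMP_WORDS or lowered in MONTH_WORDS:
        return True
    # Time-unit words such as "second" only count when accept_hms is True
    return accept_hms and lowered in HMS_WORDS


def split_dates(text, accept_hms=True):
    """Yield runs of date-like tokens, dropping connector words like the original."""
    batch = []
    for token in split_words(text):
        if is_date_token(token, accept_hms):
            if token.lower() in JUMP_WORDS:
                continue  # connectors such as "of" and "/" are skipped, not kept
            batch.append(token)
        elif batch:
            yield " ".join(batch)
            batch = []
    if batch:
        yield " ".join(batch)


text = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
print(list(split_dates(text, accept_hms=True)))   # ['12 10 2012', 'second', '20 10 2012']
print(list(split_dates(text, accept_hms=False)))  # ['12 10 2012', '20 10 2012']
```

With time-unit words accepted, the stray "second" batch appears exactly as in the "Found:" lines above; without them, only the two real dates remain.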

Option 2:

Catch the ValueError raised for tokens that can't be parsed:

from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    try:
        # Parse before printing, so no stray "Parsed:" is written when parsing fails
        parsed = p.parse(StringIO(item))
        print "Parsed:", parsed
    except ValueError:
        print 'Not Parsed!'

Output:

Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Not Parsed!
Found: 20 10 2012
Parsed: 2012-10-20 00:00:00
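The same try/except idea also works with the standard library alone. This Python 3 sketch swaps dateutil's parser for datetime.strptime with an explicit day-first format ("%d %m %Y"). The day-first format is an assumption based on the question's 20/10/2012, which can only be day/month/year; note that dateutil read "12 10 2012" as 10 December in the outputs above. The helper name parse_candidates is hypothetical.

```python
from datetime import datetime

# Candidate strings in the shape the question's timesplit() yields
# (numbers joined by spaces, plus the stray "second" token).
candidates = ["12 10 2012", "second", "20 10 2012"]


def parse_candidates(items, fmt="%d %m %Y"):
    """Return (text, datetime) pairs for items that match fmt; skip the rest."""
    parsed = []
    for item in items:
        try:
            parsed.append((item, datetime.strptime(item, fmt)))
        except ValueError:
            # A token that is not a date is reported and skipped
            print("Not parsed:", item)
    return parsed


for text, value in parse_candidates(candidates):
    print("Parsed:", text, "->", value.date())
```

If you stay with dateutil, its parse() also accepts dayfirst=True to read ambiguous dates such as 12/10/2012 as day/month/year.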



Answer 2:


If you only need the dates themselves, you can extract them with a regular expression and then work with those strings:

import re

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
# Match dates written as two digits / two digits / four digits
pattern = re.compile(r'\d{2}/\d{2}/\d{4}')
print(pattern.findall(a))
# ['12/10/2012', '20/10/2012']
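To go one step further and turn the matched strings into real date objects, a minimal Python 3 sketch follows. It assumes the dates are day-first (dd/mm/yyyy), as 20/10/2012 implies, and adds \b word boundaries (an addition to the answer's pattern) so longer digit runs are not partially matched.

```python
import re
from datetime import datetime

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "

# \b stops the pattern from matching inside longer digit runs such as 123/10/20123
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

# Assumption: dates are day-first (dd/mm/yyyy)
dates = [datetime.strptime(s, "%d/%m/%Y").date() for s in DATE_RE.findall(a)]
print(dates)  # [datetime.date(2012, 10, 12), datetime.date(2012, 10, 20)]
```

Because strptime raises ValueError for impossible dates such as 31/02/2012, this also validates what the regex matched.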


Source: https://stackoverflow.com/questions/14279058/parse-multiple-dates-using-dateutil
