Best way to identify and extract dates from text in Python?

孤城傲影 2020-12-13 06:10

As part of a larger personal project I'm working on, I'm attempting to separate out inline dates from a variety of text sources.

For example, I have a large list o

7 answers
  •  情歌与酒
    2020-12-13 07:09

    If you can identify the segments that actually contain the date information, parsing them can be fairly simple with parsedatetime. There are a few things to consider, though: your dates don't have years, and you should pick a locale.

    >>> import parsedatetime
    >>> p = parsedatetime.Calendar()
    >>> p.parse("December 15th")
    ((2013, 12, 15, 0, 13, 30, 4, 319, 0), 1)
    >>> p.parse("9/18 11:59 pm")
    ((2014, 9, 18, 23, 59, 0, 4, 319, 0), 3)
    >>> # It chooses 2014 since that's the *next* occurrence of 9/18
    

    It doesn't always work perfectly when you have extraneous text.

    >>> p.parse("9/19 LAB: Serial encoding")
    ((2014, 9, 19, 0, 15, 30, 4, 319, 0), 1)
    >>> p.parse("9/19 LAB: Serial encoding (Section 2.2)")
    ((2014, 2, 2, 0, 15, 32, 4, 319, 0), 1)
    

    Honestly, this seems like a problem simple enough to handle by matching a few particular formats and picking the most likely candidate out of each sentence. Beyond that, it would be a decent machine learning problem.
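
As a rough illustration of the "parse for particular formats" idea, here is a minimal standard-library sketch. It is not part of parsedatetime; the two patterns, the `find_dates` helper, and the "roll forward to the next occurrence" rule for dates without a year are all assumptions made for this example, and the numeric pattern assumes US-style month/day order.

```python
import re
from datetime import date

# Month names mapped to month numbers (English only; an assumption).
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Two illustrative formats: "December 15th" and "9/18".
MONTH_NAME = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?\b",
    re.IGNORECASE)
# The lookarounds keep things like "2.2" or "1/2/3" from matching.
NUMERIC = re.compile(r"(?<![\d./])(\d{1,2})/(\d{1,2})(?![\d./])")


def find_dates(text, today):
    """Return (offset, date) pairs for year-less dates found in text.

    Each hit is resolved to its next occurrence on or after `today`.
    """
    hits = []
    for m in MONTH_NAME.finditer(text):
        hits.append((m.start(), MONTHS[m.group(1).lower()], int(m.group(2))))
    for m in NUMERIC.finditer(text):
        hits.append((m.start(), int(m.group(1)), int(m.group(2))))

    results = []
    for start, month, day in sorted(hits):
        # Try this year first, then next year.
        for year in (today.year, today.year + 1):
            try:
                candidate = date(year, month, day)
            except ValueError:
                continue  # not a real calendar date in that year, e.g. 13/45
            if candidate >= today:
                results.append((start, candidate))
                break
    return results


# Reference date chosen for illustration only.
today = date(2013, 11, 15)
print(find_dates("9/19 LAB: Serial encoding (Section 2.2)", today))
# [(0, datetime.date(2014, 9, 19))]
print(find_dates("December 15th", today))
# [(0, datetime.date(2013, 12, 15))]
```

Because the patterns only match the date segment itself, trailing text such as "(Section 2.2)" is ignored rather than misread as a date. Picking "the most likely" date per sentence could then be as simple as taking the first hit, though that choice is also an assumption.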
