As part of a larger personal project I'm working on, I'm attempting to separate out inline dates from a variety of text sources.
For example, I have a large list o
If you can identify the segments that actually contain the date information, parsing them is fairly simple with parsedatetime. There are a few things to consider, though: your dates don't include years, and you should pick a locale.
>>> import parsedatetime
>>> p = parsedatetime.Calendar()
>>> p.parse("December 15th")
((2013, 12, 15, 0, 13, 30, 4, 319, 0), 1)
>>> p.parse("9/18 11:59 pm")
((2014, 9, 18, 23, 59, 0, 4, 319, 0), 3)
>>> # It chooses 2014 since that's the *next* occurrence of 9/18
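It doesn't always work when the string contains extraneous text. In the second example below, the "2.2" in the section number is parsed as February 2 instead of 9/19.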
>>> p.parse("9/19 LAB: Serial encoding")
((2014, 9, 19, 0, 15, 30, 4, 319, 0), 1)
>>> p.parse("9/19 LAB: Serial encoding (Section 2.2)")
((2014, 2, 2, 0, 15, 32, 4, 319, 0), 1)
Honestly, this seems like a problem where it would be simple enough to match a few particular formats and pick the most likely candidate out of each sentence. Beyond that, it would make a decent machine learning problem.
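As a rough illustration of the "parse for particular formats" idea, here is a minimal sketch that uses only the standard `re` module to pull out candidate date segments before anything is handed to a date parser. The patterns, the `MONTHS` string, and the `find_date_segments` helper are my own assumptions covering just the two formats shown above (`9/18 11:59 pm` and `December 15th`); they are not part of parsedatetime, and real input would likely need more patterns.

```python
import re

# Assumption: English month names only; extend for abbreviations or other locales.
MONTHS = (
    "January|February|March|April|May|June|July|August|"
    "September|October|November|December"
)

# Hypothetical pattern for "M/D" with an optional trailing time such as "11:59 pm".
NUMERIC = re.compile(
    r"\b(?P<month>1[0-2]|0?[1-9])/(?P<day>3[01]|[12]\d|0?[1-9])\b"
    r"(?:\s+(?P<time>\d{1,2}:\d{2}\s*(?:am|pm)?))?",
    re.IGNORECASE,
)

# Hypothetical pattern for "December 15th", "December 15", and similar.
NAMED = re.compile(
    rf"\b(?P<month>{MONTHS})\s+(?P<day>\d{{1,2}})(?:st|nd|rd|th)?\b",
    re.IGNORECASE,
)


def find_date_segments(text):
    """Return candidate date substrings in order of appearance."""
    hits = []
    for pattern in (NUMERIC, NAMED):
        for match in pattern.finditer(text):
            # Keep the start offset so results can be sorted by position.
            hits.append((match.start(), match.group(0).strip()))
    return [segment for _, segment in sorted(hits)]


print(find_date_segments("9/19 LAB: Serial encoding (Section 2.2)"))
print(find_date_segments("9/18 11:59 pm"))
print(find_date_segments("December 15th"))
```

Because the numeric pattern requires a slash, the "2.2" in the section number is never considered a candidate. Each returned segment could then be passed to `p.parse(...)` on its own, which sidesteps the extraneous-text problem for these particular formats.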