
Best way to identify and extract dates from text in Python?


孤城傲影 2020-12-13 06:10

As part of a larger personal project I'm working on, I'm attempting to separate out inline dates from a variety of text sources.

For example, I have a large list o

7 Answers

情歌与酒 (OP)

2020-12-13 07:09
If you can identify the segments that actually contain the date information, parsing them can be fairly simple with parsedatetime. There are a few things to consider, though: your dates don't have years, and you should pick a locale.
```
>>> import parsedatetime
>>> p = parsedatetime.Calendar()
>>> p.parse("December 15th")
((2013, 12, 15, 0, 13, 30, 4, 319, 0), 1)
>>> p.parse("9/18 11:59 pm")
((2014, 9, 18, 23, 59, 0, 4, 319, 0), 3)
>>> # It chooses 2014 since that's the *next* occurrence of 9/18
```
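Since these dates have no year, something has to pick one. Here is a minimal stdlib sketch of the "next occurrence" rule described in the comment above. It assumes that a date falling on the reference day itself counts as the next occurrence (parsedatetime's exact tie-breaking may differ), and `next_occurrence` is a hypothetical helper, not part of parsedatetime:

```python
from datetime import date


def next_occurrence(month: int, day: int, today: date) -> date:
    """Return the first date on or after `today` that falls on month/day.

    Assumption: a match on `today` itself counts as the next occurrence.
    """
    # Reject impossible month/day pairs up front (2000 is a leap year, so
    # Feb 29 passes here); otherwise the loop below would never end.
    date(2000, month, day)

    year = today.year
    while True:
        try:
            candidate = date(year, month, day)
        except ValueError:
            # Only Feb 29 in a non-leap year gets here: try the next year.
            year += 1
            continue
        if candidate >= today:
            return candidate
        year += 1


# Illustrative reference date; in practice you would pass date.today().
ref = date(2013, 11, 15)
print(next_occurrence(12, 15, ref))  # 2013-12-15
print(next_occurrence(9, 18, ref))   # 2014-09-18
```

With 2013-11-15 as the reference date, this picks the same years as the session above: December 15th stays in 2013, while 9/18 rolls over to 2014.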
It doesn't always work perfectly when the string contains extraneous text. In the second call below, "(Section 2.2)" is misread as February 2:
```
>>> p.parse("9/19 LAB: Serial encoding")
((2014, 9, 19, 0, 15, 30, 4, 319, 0), 1)
>>> p.parse("9/19 LAB: Serial encoding (Section 2.2)")
((2014, 2, 2, 0, 15, 32, 4, 319, 0), 1)
```
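One way around this, in the spirit of first identifying the segments that contain dates, is to cut out a strict date-like span with a regular expression and hand only that span to the parser. This is a sketch that assumes US-style numeric `M/D` dates with an optional 12-hour time. The pattern and `extract_date_segment` are hypothetical, and the pattern does not check that the day actually exists in that month:

```python
import re
from typing import Optional

# Assumed format: a numeric month/day such as "9/19", optionally followed by
# a 12-hour time such as "11:59 pm". Anything else on the line is ignored.
DATE_SEGMENT = re.compile(
    r"\b(?:1[0-2]|0?[1-9])/(?:3[01]|[12]\d|0?[1-9])\b"  # M/D
    r"(?:\s+(?:1[0-2]|0?[1-9]):[0-5]\d\s*[ap]m\b)?",     # optional h:mm am/pm
    re.IGNORECASE,
)


def extract_date_segment(line: str) -> Optional[str]:
    """Return the first date-like span in `line`, or None if there is none."""
    match = DATE_SEGMENT.search(line)
    return match.group(0) if match else None


for line in ["9/19 LAB: Serial encoding (Section 2.2)", "9/18 11:59 pm"]:
    print(repr(extract_date_segment(line)))  # '9/19', then '9/18 11:59 pm'
```

Because only `"9/19"` would then be passed to `p.parse`, the `2.2` never reaches the parser.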
Honestly, this seems like a problem that is simple enough to handle by parsing for a few particular formats and picking the most likely match out of each sentence. Beyond that, it would make a decent machine learning problem.
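A rough sketch of that format-based approach: match a couple of known formats, score each match, and keep the best one per sentence. The formats, scores, and helper names are all assumptions for illustration. Month names are English, and numeric dates are read as month/day unless `day_first=True`, which is one place the locale choice shows up:

```python
import re
from datetime import date
from typing import List, NamedTuple, Optional, Tuple

# Assumption: English month names. A spelled-out month is less ambiguous
# than a bare "M/D", so it gets a higher (hand-picked) score.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}
NAMED = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?\b", re.IGNORECASE
)
NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})\b")


class Candidate(NamedTuple):
    month: int
    day: int
    score: int  # higher means more likely to be the intended date
    start: int  # offset in the sentence, used to break ties


def _is_valid(month: int, day: int) -> bool:
    try:
        date(2000, month, day)  # leap year, so Feb 29 is accepted
    except ValueError:
        return False
    return True


def candidates(sentence: str, day_first: bool = False) -> List[Candidate]:
    found = [
        Candidate(MONTHS[m.group(1).lower()], int(m.group(2)), 2, m.start())
        for m in NAMED.finditer(sentence)
    ]
    for m in NUMERIC.finditer(sentence):
        a, b = int(m.group(1)), int(m.group(2))
        month, day = (b, a) if day_first else (a, b)
        found.append(Candidate(month, day, 1, m.start()))
    # Drop matches that cannot be a real month/day in any year.
    return [c for c in found if _is_valid(c.month, c.day)]


def most_likely(sentence: str, day_first: bool = False) -> Optional[Tuple[int, int]]:
    """Return (month, day) of the best-scoring candidate, or None."""
    found = candidates(sentence, day_first)
    if not found:
        return None
    # Highest score wins; among equal scores, the earliest match wins.
    best = max(found, key=lambda c: (c.score, -c.start))
    return best.month, best.day


print(most_likely("Quiz on December 15th, review 9/18 notes"))  # (12, 15)
print(most_likely("9/19 LAB: Serial encoding (Section 2.2)"))   # (9, 19)
print(most_likely("Due 3/4", day_first=True))                   # (4, 3)
```

The scoring is deliberately crude. If it stops being good enough, the per-sentence candidates produced this way are a natural starting point for the machine learning route.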