Temporal Extraction (i.e. Extract date/time entities from free form text) - How?

人盡茶涼 submitted on 2019-12-06 22:05:37

Question


Has anyone found a simple but effective way to extract date references from text? I've done a fair amount of searching for temporal extraction tools, but there isn't a lot out there. There are a few white papers, but the topic seems to be treated as a subset of the whole semantic web thing and doesn't get much attention.

I'm just looking for something that is 80% effective. There is no need to capture things like "the month after Jan 2009", but basic, common date entities would be nice.

I'm open to all suggestions, even fancy regular expressions.

Fire away!

(and thanks - Henry)


Answer 1:


  1. If the temporal expressions in your data come in only a limited number of formats, use regular expressions and refine them iteratively against your data.

  2. Otherwise, use SUTime from the Stanford NLP toolkit (Stanford CoreNLP). It might be overkill, but it will definitely meet your needs.
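
As a rough illustration of the first option, here is a minimal Python sketch. It assumes only three date formats occur in the data. The names `MONTHS`, `PATTERNS`, and `find_dates` are hypothetical, and the pattern list is meant to grow as you find formats it misses.

```python
import re

# Month names and common abbreviations (assumed English-only input).
MONTHS = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)

# Hypothetical starting set; add a pattern each time a new format shows up.
PATTERNS = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),        # 2009-01-15
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),    # 1/15/2009
    re.compile(rf"\b{MONTHS}\.?\s+\d{{1,2}},?\s+\d{{4}}\b", re.IGNORECASE),  # Jan 15, 2009
]


def find_dates(text):
    """Return (start_offset, matched_text) pairs for every pattern hit, in text order."""
    hits = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    sample = "Met on 2009-01-15, again on Jan 20, 2009 and 3/4/2010."
    for offset, matched in find_dates(sample):
        print(offset, matched)
```

The iterative part is manual: run this over a sample of your text, look at what it misses or matches wrongly, and adjust the patterns.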




Answer 2:


One way I have done this is to look for any run of four digits and convert it to a number. If the number falls within the range of years you are interested in, you probably have a year you can use. If you also want months and days, check the adjacent words to see whether they are a month name or a number between 1 and 31. I am confident this would satisfy your 80% requirement.

Regex for years: [0-9]{4} - you will need to convert to a number and see if it's within the range of years you consider valid.

Regex for months: jan|january|feb|february ... etc for each month

Regex for days of the month: [0-9]{1,2} - you would need to convert to a number and see if it is 1-31
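
Put together, the heuristic might look like the Python sketch below. This is one possible reading, not a definitive implementation. Several things are assumptions: the default year range, the function name `extract_dates`, and looking only at the two tokens before the year (so "Jan 15, 2009" and "15 January 2009" work, but "2009 Jan 15" does not). A day is kept only when a month was also found.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {}
for number, name in enumerate(MONTH_NAMES, start=1):
    MONTHS[name] = number       # full name
    MONTHS[name[:3]] = number   # three-letter abbreviation


def extract_dates(text, min_year=1900, max_year=2100):
    """Return (year, month, day) tuples; month and day are None when not found."""
    # Split into runs of letters or runs of digits, dropping punctuation.
    tokens = re.findall(r"[A-Za-z]+|[0-9]+", text)
    results = []
    for i, token in enumerate(tokens):
        # Exactly four digits, so longer numbers such as phone numbers are skipped.
        if not re.fullmatch(r"[0-9]{4}", token):
            continue
        year = int(token)
        if not min_year <= year <= max_year:
            continue
        month = day = None
        # Assumption: inspect only the two tokens just before the year.
        for neighbor in tokens[max(0, i - 2):i]:
            if neighbor.lower() in MONTHS:
                month = MONTHS[neighbor.lower()]
            elif neighbor.isdigit() and 1 <= int(neighbor) <= 31:
                day = int(neighbor)
        if month is None:
            day = None  # a bare number without a month is too ambiguous
        results.append((year, month, day))
    return results


if __name__ == "__main__":
    print(extract_dates("We met on Jan 15, 2009 and again in 2011; call 5551234."))
```

Expect false positives: "may" as a verb next to a year will be read as the month, for example. That fits the 80% goal but is worth knowing.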




Answer 3:


I'm drawing a blank on how to find what to feed it, but this library will parse a wide range of dates and could be used as the "is this a real date" function. (Full disclosure, I'm the author of that lib)
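
The library referred to is not identified in this copy of the answer, so the sketch below uses Python's standard `datetime.strptime` as a stand-in for the "is this a real date" check. The `FORMATS` list and the `parse_date` name are assumptions for illustration only. A dedicated date-parsing library would accept far more formats.

```python
from datetime import datetime

# Assumed formats; extend to match whatever your candidate finder produces.
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y", "%B %d, %Y", "%d %B %Y"]


def parse_date(candidate):
    """Return a datetime.date if the candidate is a real date in a known format, else None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(candidate.strip(), fmt).date()
        except ValueError:
            continue  # wrong format or impossible date; try the next format
    return None


if __name__ == "__main__":
    for candidate in ["Jan 15, 2009", "2009-02-30", "15 January 2009"]:
        print(candidate, "->", parse_date(candidate))
```

Candidate strings found by a looser method, such as a regex, can be passed through a check like this to discard impossible dates such as February 30.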



Source: https://stackoverflow.com/questions/1134809/temporal-extraction-i-e-extract-date-time-entities-from-free-form-text-how
