How to identify date from a string in Java

纵然是瞬间 submitted on 2020-12-27 17:17:09

Question


Recently I have been challenged by a seemingly "easy" problem. Suppose there are sentences (saved in a String) and I need to find out whether the String contains any date. The challenge is that the date can be in many different formats. Some examples:

  • June 12, 1956
  • London, 21st October 2014
  • 13 October 1999
  • 01/11/2003

It is worth mentioning that the date is embedded in a longer string, for example:

String s = "This event took place on 13 October 1999.";

My question is: how can I detect that there is a date in this string? My first approach was to search for the word "event" and then try to locate the date near it. But as the number of possible date formats grows, this solution becomes unwieldy. The second approach I tried was to build a list of month names and search for them. This gave good results but still misses dates written entirely in digits.
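For reference, the month-list approach can be sketched as below (the class and method names are hypothetical). It builds the list from `java.time.Month` instead of typing it out, and it makes the limitation concrete: an all-digit date such as 01/11/2003 contains no month name, so it is never found. Note that a sentence starting with the verb "May" would still be a false positive.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;
import java.util.regex.Pattern;

public class MonthNameSearch {
    // "\b(?:January|February|...|December)\b", built from java.time.Month
    // so the month list does not have to be typed by hand.
    private static final Pattern MONTH_NAME = buildPattern();

    private static Pattern buildPattern() {
        StringBuilder alternation = new StringBuilder();
        for (Month m : Month.values()) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(m.getDisplayName(TextStyle.FULL, Locale.ENGLISH));
        }
        // \b stops "May" from matching inside "Mayor"; the match is case-sensitive.
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    /** True if text contains a full English month name. */
    public static boolean mentionsMonth(String text) {
        return MONTH_NAME.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsMonth("This event took place on 13 October 1999.")); // true
        System.out.println(mentionsMonth("01/11/2003")); // false: the all-digit case is missed
    }
}
```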

One solution I have not tried yet is to design regular expressions and look for a match in the string. I am not sure how much this would hurt performance.
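A minimal sketch of the regular-expression idea, assuming only the four formats listed above: full English month names (abbreviations such as "Sept." are not covered), an optional ordinal suffix on the day, and numeric dates separated by `/`, `.` or `-`. `RegexDateFinder` is a hypothetical name. On performance, the main thing to avoid is recompiling the regex for every sentence: `String.matches()` compiles its argument on each call, whereas a `Pattern` compiled once into a static field is reused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDateFinder {
    private static final String MONTH =
        "(?:January|February|March|April|May|June|July|August|September|October|November|December)";
    private static final String DAY = "\\d{1,2}(?:st|nd|rd|th)?";   // 13, 21st, 2nd ...
    private static final String YEAR = "\\d{4}";

    // One alternative per format from the question, compiled once and reused.
    private static final Pattern DATE = Pattern.compile(
        "\\b(?:"
        + MONTH + "\\s+" + DAY + ",?\\s+" + YEAR          // June 12, 1956
        + "|" + DAY + "\\s+" + MONTH + ",?\\s+" + YEAR    // 21st October 2014, 13 October 1999
        + "|\\d{1,2}[/.-]\\d{1,2}[/.-]" + YEAR            // 01/11/2003
        + ")\\b");

    /** Returns every substring of text that looks like a date in one of the supported formats. */
    public static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDates("This event took place on 13 October 1999.")); // [13 October 1999]
        System.out.println(findDates("London, 21st October 2014"));                 // [21st October 2014]
    }
}
```

Checking `findDates(s).isEmpty()` answers the yes/no question; the returned substrings show where the date is.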

What would be a good solution to consider? Has anybody faced a similar problem before, and what solutions did you find?

One thing is certain: the strings contain no times, so the only interesting part is the date.


Answer 1:


Using the natty.joestelmach.com library

Natty is a natural language date parser written in Java. Given a date expression, natty will apply standard language recognition and translation techniques to produce a list of corresponding dates with optional parse and syntax information.

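import com.joestelmach.natty.DateGroup;
import com.joestelmach.natty.Parser;
import java.util.Date;
import java.util.List;

// parse() returns a List<DateGroup>; an empty list means no date was found
List<DateGroup> groups = new Parser().parse("Start date 11/30/2013 , end date Friday, Sept. 7, 2013");
if (!groups.isEmpty()) {
    List<Date> dates = groups.get(0).getDates();
    System.out.println(dates.get(0));
    System.out.println(dates.get(1));
}

//output (no time was given, so natty fills in the current time of day):
//Sat Nov 30 11:14:30 BDT 2013
//Sat Sep 07 11:14:30 BDT 2013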



Answer 2:


What you are after is Named Entity Recognition (NER). I'd start with Stanford NLP: its 7-class model includes a DATE class, but the online demo struggles with this example and misses the "13". :(

Natty, mentioned above, gives a better result here.




Answer 3:


If it's only a single String, you could use regular expressions, as you mentioned; you would need one expression for each date format. Here are some examples: Regular Expressions - dates
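One caveat with regexes alone: a pattern such as `\d{1,2}/\d{1,2}/\d{4}` also accepts impossible dates like 31/02/2003. A possible second step, sketched below under stated assumptions, is to hand each candidate substring a regex found to `java.time` with strict resolution and keep it only if it parses. The class name is hypothetical, and all-digit dates are assumed to be day-first (01/11/2003 = 1 November 2003); swap the last pattern to `M/d/uuuu` for US-style input.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class DateCandidateValidator {
    // One formatter per format in the question. STRICT needs "uuuu" rather than "yyyy".
    // Assumption: all-digit dates are day-first.
    private static final List<DateTimeFormatter> FORMATS = List.of(
        DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH),
        DateTimeFormatter.ofPattern("d MMMM uuuu", Locale.ENGLISH),
        DateTimeFormatter.ofPattern("d/M/uuuu", Locale.ENGLISH));

    /** Parses a candidate such as "21st October 2014"; empty if no format accepts it. */
    public static Optional<LocalDate> parse(String candidate) {
        // java.time has no pattern letter for English ordinals, so strip "st", "nd", "rd", "th".
        String cleaned = candidate.replaceAll("(\\d)(?:st|nd|rd|th)\\b", "$1");
        for (DateTimeFormatter f : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(cleaned, f.withResolverStyle(ResolverStyle.STRICT)));
            } catch (DateTimeParseException e) {
                // not this format (or not a real calendar date); try the next one
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("21st October 2014")); // Optional[2014-10-21]
        System.out.println(parse("31/02/2003"));        // Optional.empty (there is no 31 February)
    }
}
```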

If it's a whole document or a large body of text, you will need a parser, for example one built on a lexical analysis approach.
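To illustrate the lexical-analysis idea, here is a small hand-written sketch (not a full parser generator, and all names are hypothetical): a lexer turns the text into tokens classified as DAY, YEAR, MONTH, COMMA or SLASH_DATE, and a few grammar rules over the token stream (MONTH DAY [,] YEAR and DAY MONTH [,] YEAR) recognize the question's formats. Each token keeps its character offsets, so the original date text can be returned. It only knows full English month names and slash-separated numeric dates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateLexer {
    enum Kind { DAY, YEAR, MONTH, COMMA, SLASH_DATE, OTHER }

    // A token remembers its kind and where it sits in the original text.
    record Token(Kind kind, int start, int end) {}

    private static final Set<String> MONTHS = Set.of("january", "february", "march", "april",
        "may", "june", "july", "august", "september", "october", "november", "december");

    // Lexer rules: slash dates, numbers with an optional ordinal suffix, words, commas.
    // Everything else (spaces, full stops, ...) is skipped.
    private static final Pattern TOKEN =
        Pattern.compile("\\d{1,2}/\\d{1,2}/\\d{4}|\\d+(?:st|nd|rd|th)?|[A-Za-z]+|,");

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String t = m.group();
            Kind kind;
            if (t.contains("/")) kind = Kind.SLASH_DATE;
            else if (t.equals(",")) kind = Kind.COMMA;
            else if (t.matches("\\d{4}")) kind = Kind.YEAR;
            else if (t.matches("(?:0?[1-9]|[12]\\d|3[01])(?:st|nd|rd|th)?")) kind = Kind.DAY;
            else if (MONTHS.contains(t.toLowerCase(Locale.ROOT))) kind = Kind.MONTH;
            else kind = Kind.OTHER;
            tokens.add(new Token(kind, m.start(), m.end()));
        }
        return tokens;
    }

    // Grammar rule "first second [,] YEAR" starting at token i;
    // returns the index just past the match, or -1 if the rule does not apply.
    private static int match(List<Token> ts, int i, Kind first, Kind second) {
        if (i + 2 >= ts.size() || ts.get(i).kind() != first || ts.get(i + 1).kind() != second) {
            return -1;
        }
        int j = i + 2;
        if (ts.get(j).kind() == Kind.COMMA) j++;
        return j < ts.size() && ts.get(j).kind() == Kind.YEAR ? j + 1 : -1;
    }

    /** Returns the date expressions found in text, exactly as they appear in the text. */
    public static List<String> findDates(String text) {
        List<Token> ts = tokenize(text);
        List<String> dates = new ArrayList<>();
        int i = 0;
        while (i < ts.size()) {
            int end = ts.get(i).kind() == Kind.SLASH_DATE ? i + 1 : -1;
            if (end < 0) end = match(ts, i, Kind.MONTH, Kind.DAY);  // June 12, 1956
            if (end < 0) end = match(ts, i, Kind.DAY, Kind.MONTH);  // 21st October 2014
            if (end < 0) { i++; continue; }
            dates.add(text.substring(ts.get(i).start(), ts.get(end - 1).end()));
            i = end;
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(findDates("London, 21st October 2014")); // [21st October 2014]
    }
}
```

Compared with one big regex, new formats become new token kinds or grammar rules, which tends to stay manageable as the list of formats grows.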

Depending on the project, using an external library, as suggested in some of the other answers, might be a good idea, but sometimes that is not an option.




Answer 4:


I've done this before with good precision and recall. You'll need GATE and its ANNIE plugin.

  1. Use the GATE UI tool (GATE Developer) to create a .gapp file that contains your processing resources.

  2. Load the .gapp file in your code and use the Date annotations it extracts.

Step 2 can be done as follows:

import gate.*;
import gate.util.persistence.PersistenceManager;
import java.io.File;

Gate.init(); // GATE must be initialised before any resources are created
Corpus corpus = Factory.newCorpus("Gate Corpus");
Document gateDoc = Factory.newDocument("This event took place on 13 October 1999.");
corpus.add(gateDoc);
File pluginsHome = Gate.getPluginsHome();
File anniePlugin = new File(pluginsHome, "ANNIE");
File annieGapp = new File(anniePlugin, "Test.gapp"); // the .gapp saved in step 1
CorpusController annieController =
        (CorpusController) PersistenceManager.loadObjectFromFile(annieGapp);
annieController.setCorpus(corpus);
annieController.execute();

Later you can see the extracted annotations like this:

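AnnotationSet ann = gateDoc.getAnnotations(); // default annotation set; no cast to the Impl class needed
System.out.println("Found annotations of the following types: " + ann.getAllTypes());
AnnotationSet dates = ann.get("Date"); // only the Date annotations produced by ANNIE
System.out.println("Date annotations found: " + dates.size());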

You can probably do this easily with the built-in Date annotation type, which is also straightforward to extend.

To extend the Date annotations, create a lenient JAPE rule, say 'DateEnhanced', based on the built-in ANNIE Date annotation so that it also covers dates such as "9/11". Then chain Java regular expressions on the right-hand side (RHS) of the 'DateEnhanced' JAPE rule to filter out unwanted matches, if any.



Source: https://stackoverflow.com/questions/33547179/how-to-identify-date-from-a-string-in-java
