Is there a multilingual temporal expression tagger that can run on Hadoop?
Question

I need to extract dates from a large volume of text. The more languages the better; English, Spanish, and Portuguese at a minimum. Does such a tool exist? Ideally in Java and Mavenized?

Here's what I've found:

http://code.google.com/p/heideltime/
Supports many languages and has an impressive online demo, but requires some odd external dependencies that I suspect will make cluster deployment hard or impossible.

http://nlp.stanford.edu/software/sutime.shtml
Well documented, but English only. Easy to train?

http://natty