NLTK for Named Entity Recognition

前端 未结 3 697
南方客
南方客 2021-01-30 11:33

I am trying to use NLTK toolkit to get extract place, date and time from text messages. I just installed the toolkit on my machine and I wrote this quick snippet to test it out:

3条回答
  •  梦如初夏
    2021-01-30 11:53

    The default NE chunker in nltk is a maximum entropy chunker trained on the ACE corpus (http://catalog.ldc.upenn.edu/LDC2005T09). It has not been trained to recognise dates and times, so you need to train your own classifier if you want to do that.

    Have a look at http://mattshomepage.com/articles/2016/May/23/nltk_nec/, the whole process is explained very well.

    Also, there is a module called timex in nltk_contrib which might help you with your needs. https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/timex.py

提交回复
热议问题