How to train a French NER based on the Stanford NLP Conditional Random Fields model?

Submitted by 痞子三分冷 on 2019-11-29 20:04:17

Question


I discovered the Stanford NLP tools and found them really interesting. I'm a French data miner / data scientist, fond of text analysis, and would love to use your tools, but the lack of a French NER model is quite puzzling to me.

I would love to build my own French NER model, and perhaps even contribute it to the package if it is considered worthy. So, could you brief me on the requirements for training a CRF for French NER based on Stanford CoreNLP?

Thank you.


Answer 1:


NB: I am not a developer of the Stanford tools, nor an NLP expert, just an ordinary user who also needed this information at some point. Also note that part of the information given below is from the official FAQ: http://nlp.stanford.edu/software/crf-faq.shtml#a

Here are the steps I followed to train my own NER:

  1. Install Java 8.
  2. Create a train/test sample. It must take the form of .tsv files with one token per line, followed by its label, in the following format:

      Venez    O
      découvrir    O
      lundi    DAY
      le    O
      nouvel    O
      espace    O
      de    O
      vente    O
      ODHOJS    ORGANISATION
    

    Depending on the original format of your text, you can create this sample with an SQL statement or with other NLP tools. Labelling is the most laborious part, as I don't know of any way to do it other than by hand.

  3. Train the model with this command:

    java -cp "stanford-ner.jar:lib/*" -mx4g edu.stanford.nlp.ie.crf.CRFClassifier -prop prop.txt
    

    where prop.txt is the properties file that configures training; it is also described in the FAQ linked above.

    This should create a serialized model file containing the newly trained model, at the path set by serializeTo in prop.txt (ner-model.ser.gz in the commands below).

  4. Test the model performances:

    java -cp "stanford-ner.jar:lib/*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ner-model.ser.gz -testFile test.tsv > test.res
    

    The input test.tsv has the same format as the train.tsv file. The output in test.res has an extra column containing the class predicted by the NER. The last lines also show a summary in terms of precision, recall and F1.

  5. Finally, you can use your NER on real data:

    java -cp "stanford-ner.jar:lib/*" -mx5g edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ner-model.ser.gz  -textFile test.txt -outputFormat inlineXML > test.res
    
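Since the labelling in step 2 is done by hand, a quick sanity check of the .tsv files before training can save a failed run. The following is only a minimal sketch: it assumes the two columns are separated by a single tab (as the .tsv extension suggests), and `sample.tsv`, `check_tsv` and `label_counts` are hypothetical names chosen for illustration.

```shell
# Build a tiny sample in the token<TAB>label format shown in step 2.
printf 'Venez\tO\ndécouvrir\tO\nlundi\tDAY\nle\tO\nnouvel\tO\nespace\tO\nde\tO\nvente\tO\nODHOJS\tORGANISATION\n' > sample.tsv

# Report every non-empty line that does not have exactly two
# tab-separated fields; the exit status is non-zero if any are found.
check_tsv() {
  awk -F '\t' '
    NF > 0 && NF != 2 {
      printf "line %d: expected 2 fields, got %d\n", NR, NF
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Count how many tokens carry each label, which makes typos in
# label names (e.g. ORGANISATION vs ORGANIZATION) easy to spot.
label_counts() {
  awk -F '\t' 'NF == 2 { n[$2]++ } END { for (l in n) print l "\t" n[l] }' "$1" | sort
}

check_tsv sample.tsv && label_counts sample.tsv
```

Running the same two functions on your real train.tsv and test.tsv gives a rough view of how balanced the labels are between the two files.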

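For step 3, a prop.txt could look roughly like the one written below. This is a sketch modelled on the example properties file in the Stanford NER FAQ, not a tuned configuration: the file names train.tsv and ner-model.ser.gz are simply the ones used in the commands above, and the feature flags are copied from the FAQ's English example, so treat them as a starting point for French rather than a recommendation.

```shell
# Write a properties file for CRFClassifier into the current directory.
cat > prop.txt <<'EOF'
# Training data: one token per line, token in column 0, label in column 1.
trainFile = train.tsv
# Where the trained model is saved; the .gz suffix gzips the file.
serializeTo = ner-model.ser.gz
map = word=0,answer=1

# Order of the CRF (1 = features over pairs of adjacent labels).
maxLeft=1

# Feature set taken from the FAQ example (assumed, not tuned for French).
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# Word-shape features.
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
EOF
```

With this file in place, the training command from step 3 reads train.tsv and writes ner-model.ser.gz, which is the file that steps 4 and 5 load.
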
Hope it helps.



Source: https://stackoverflow.com/questions/37852084/how-to-train-a-french-ner-based-on-stanford-nlp-conditional-random-fields-model
