Train model using Named entity

℡╲_俬逩灬. Submitted on 2019-12-01 00:22:34

The NERClassifier* works at the word level: it labels individual words, not phrases. Given that, the classifier seems to be performing fine. If you want a phrase to be treated as a single unit, you can join its words with an underscore, so that in both your labeled examples and your test examples "Land Cruiser" becomes "Land_Cruiser".
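A minimal sketch of that preprocessing step, assuming you already know the list of phrases you want treated as single tokens. The class name, method name, and sample sentence are made up for illustration and are not part of any NER library:

```java
import java.util.List;

class PhraseJoiner {
    // Replaces every occurrence of each known multi-word phrase with an
    // underscore-joined version, so a word-level classifier sees one token.
    static String joinPhrases(String text, List<String> phrases) {
        String result = text;
        for (String phrase : phrases) {
            result = result.replace(phrase, phrase.replace(' ', '_'));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample sentence
        String text = "Toyota Land Cruiser of Portfolio 49";
        System.out.println(joinPhrases(text, List.of("Land Cruiser")));
    }
}
```

The same function has to be applied to the training data and to any text you classify later, otherwise the classifier never sees the joined token at test time.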

I believe you should also put examples of 0 (non-entity) tokens in your trainFile. As you gave it, the trainFile is too simple for anything useful to be learned: it needs both 0 and PERSON examples so that the classifier doesn't annotate everything as PERSON. Right now you're not teaching it anything about the tokens you are not interested in. Something like this:

Toyota  PERS
of    0
Portfolio    0
49    0

and so on.
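If you are generating the trainFile from your own data, the idea can be sketched roughly as below: every token gets a line, and anything that is not a known entity gets the background label. The class and method names are hypothetical, and the background label is passed in as a parameter because it has to match whatever symbol your classifier is configured to treat as "no entity":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TrainFileBuilder {
    // Emits one "token<TAB>label" line per token. Tokens with no entry in
    // entityLabels receive the background label.
    static List<String> toTrainingLines(List<String> tokens,
                                        Map<String, String> entityLabels,
                                        String backgroundLabel) {
        List<String> lines = new ArrayList<>();
        for (String token : tokens) {
            lines.add(token + "\t" + entityLabels.getOrDefault(token, backgroundLabel));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Toyota", "of", "Portfolio", "49");
        // Labels taken from the example in the answer
        toTrainingLines(tokens, Map.of("Toyota", "PERS"), "0")
                .forEach(System.out::println);
    }
}
```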

Also, for phrase-level recognition you should look into regexner, which lets you match entities with patterns instead of a trained model. I'm using it through the API, and I have the following code:

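// Requires: import java.util.Properties;
Properties props = new Properties();
// regexner comes after ner so that its patterns can refer to NER tags
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
// customLocationFilename is the path to the mapping file
props.setProperty("regexner.mapping", customLocationFilename);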

where customLocationFilename points to the following mapping file:

Make Believe Town   figure of speech    ORGANIZATION
( /Hello/ [{ ner:PERSON }]+ )   salut   PERSON
Bachelor of (Arts|Laws|Science|Engineering)    DEGREE
( /University/ /of/ [{ ner:LOCATION }] )    SCHOOL
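One thing that is easy to lose when copying a mapping file like this around: as far as I know, the columns are separated by tabs, not spaces, which matters because the patterns and labels themselves contain spaces. A small sketch that builds the lines with explicit tabs (the class and method names are only illustrative):

```java
import java.util.List;

class MappingFileLines {
    // Joins the columns of one rule with tabs: the pattern, the label to
    // assign, then any optional extra columns.
    static String rule(String... columns) {
        if (columns.length < 2) {
            throw new IllegalArgumentException("a rule needs at least a pattern and a label");
        }
        return String.join("\t", columns);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                rule("Make Believe Town", "figure of speech", "ORGANIZATION"),
                rule("( /Hello/ [{ ner:PERSON }]+ )", "salut", "PERSON"),
                rule("Bachelor of (Arts|Laws|Science|Engineering)", "DEGREE"),
                rule("( /University/ /of/ [{ ner:LOCATION }] )", "SCHOOL"));
        lines.forEach(System.out::println);
    }
}
```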

I ran it on the following text: Hello Mary Keller was born on 4th of July and took a Bachelor of Science. Partial invoice (€100,000, so roughly 40%) for the consignment C27655 we shipped on 15th August to University of London from the Make Believe Town depot. INV2345 is for the balance.. Customer contact (Sigourney Weaver) says they will pay this on the usual credit terms (30 days).

The output I get is:

Hello Mary Keller is a salut
4th of July is a DATE
Bachelor of Science is a DEGREE
$ 100,000 is a MONEY
40 % is a PERCENT
15th August is a DATE
University of London is a ORGANIZATION
Make Believe Town is a figure of speech
Sigourney Weaver is a PERSON
30 days is a DURATION
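Since the tagger labels one token at a time, lines like these have to be assembled by merging consecutive tokens that carry the same tag. The sketch below shows one way to do that in plain Java. It assumes you have already pulled the token texts and their NER tags out of the pipeline into two parallel lists, and that "O" is the tag for non-entity tokens; the class and method names are hypothetical and this is not necessarily how the output above was produced:

```java
import java.util.ArrayList;
import java.util.List;

class EntityGrouper {
    // Merges runs of consecutive tokens that share the same non-background
    // tag and returns one "<phrase> is a <TAG>" line per run.
    static List<String> describe(List<String> tokens, List<String> tags, String background) {
        List<String> out = new ArrayList<>();
        StringBuilder phrase = new StringBuilder();
        String current = background;
        for (int i = 0; i < tokens.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(current)) {
                // The tag changed: flush the run collected so far, if any
                if (!current.equals(background)) {
                    out.add(phrase + " is a " + current);
                }
                phrase.setLength(0);
                current = tag;
            }
            if (!tag.equals(background)) {
                if (phrase.length() > 0) {
                    phrase.append(' ');
                }
                phrase.append(tokens.get(i));
            }
        }
        // Flush a run that reaches the end of the input
        if (!current.equals(background)) {
            out.add(phrase + " is a " + current);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("took", "a", "Bachelor", "of", "Science", ".");
        List<String> tags = List.of("O", "O", "DEGREE", "DEGREE", "DEGREE", "O");
        describe(tokens, tags, "O").forEach(System.out::println);
    }
}
```

Note that this simple grouping cannot separate two different entities of the same type that sit directly next to each other; they would be merged into one phrase.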

For more info on how to do this you can look at the example that got me going.
