I am new to NLP and I used the Stanford NER tool to classify some random text and extract special keywords used in software programming.
The problem is, I don't know how
I think it is quite well documented in the Stanford NER FAQ: http://nlp.stanford.edu/software/crf-faq.shtml#a.
Here are the steps:
In your properties file, add the new feature's column to the map:
map = word=0,myfeature=1,answer=2
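To make the map line concrete: each comma-separated entry names one column of the training file. The sketch below is a standalone illustration, not Stanford NER code. The class name ColumnMapDemo, the sample row, and the feature value CAMEL are made up, and it assumes tab-separated columns as in the FAQ's examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnMapDemo {
    // Parse a spec such as "word=0,myfeature=1,answer=2" into name -> column index.
    static Map<String, Integer> parseMap(String spec) {
        Map<String, Integer> columns = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] kv = entry.trim().split("=");
            columns.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
        return columns;
    }

    // Look up one named field in a tab-separated training row.
    static String field(String row, Map<String, Integer> columns, String name) {
        return row.split("\t")[columns.get(name)];
    }

    public static void main(String[] args) {
        Map<String, Integer> columns = parseMap("word=0,myfeature=1,answer=2");
        String row = "HashMap\tCAMEL\tKEYWORD"; // hypothetical training row
        System.out.println(field(row, columns, "myfeature")); // prints CAMEL
    }
}
```

So with this map, every line of your training file needs three columns: the token, your custom feature value, and the gold label.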
In src\edu\stanford\nlp\sequences\SeqClassifierFlags.java, add a flag stating that you want to use your new feature; let's call it useMyFeature. Below
public boolean useLabelSource = false;
add
public boolean useMyFeature = true;
In the same file, in the setProperties(Properties props, boolean printProps) method, after
else if (key.equalsIgnoreCase("useTrainLexicon")) { ..}
tell the tool whether this flag is on or off:
else if (key.equalsIgnoreCase("useMyFeature")) {
  useMyFeature = Boolean.parseBoolean(val);
}
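If it helps to see the flag handling in isolation, here is a minimal standalone sketch of the same pattern. FlagsSketch is a made-up class that only mimics the shape of SeqClassifierFlags; it is not the real class and leaves out the printProps argument.

```java
import java.util.Properties;

class FlagsSketch {
    boolean useLabelSource = false;
    boolean useMyFeature = true; // same default as in the step above

    // Walk over all properties and set the matching boolean flags.
    void setProperties(Properties props) {
        for (String key : props.stringPropertyNames()) {
            String val = props.getProperty(key);
            if (key.equalsIgnoreCase("useLabelSource")) {
                useLabelSource = Boolean.parseBoolean(val);
            } else if (key.equalsIgnoreCase("useMyFeature")) {
                useMyFeature = Boolean.parseBoolean(val);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("useMyFeature", "false");
        FlagsSketch flags = new FlagsSketch();
        flags.setProperties(props);
        System.out.println(flags.useMyFeature); // prints false
    }
}
```

Because the default here is true, the feature stays on unless the properties file explicitly sets useMyFeature=false.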
In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following section:
public static class myfeature implements CoreAnnotation<String> {
  public Class<String> getType() {
    return String.class;
  }
}
In src/edu/stanford/nlp/ling/AnnotationLookup.java, at the bottom of
public enum KeyLookup {..}
add
MY_TAG(CoreAnnotations.myfeature.class, "myfeature")
The string "myfeature" should match the column name you used in map.
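As far as I understand it, the reason for the two steps above is that a column name from map has to be resolved to an annotation class through this enum. The sketch below shows that idea with made-up stand-in types (KeyLookupSketch, Annotation, Word, MyFeature and the lookup method are all hypothetical, not the CoreNLP classes).

```java
class KeyLookupSketch {
    // Stand-in for an annotation key interface: a typed key for a token attribute.
    interface Annotation<T> {
        Class<T> getType();
    }

    static class Word implements Annotation<String> {
        public Class<String> getType() { return String.class; }
    }

    static class MyFeature implements Annotation<String> {
        public Class<String> getType() { return String.class; }
    }

    // Each constant ties an annotation class to the column name used in the map.
    enum KeyLookup {
        WORD(Word.class, "word"),
        MY_TAG(MyFeature.class, "myfeature");

        final Class<? extends Annotation<?>> key;
        final String oldKey;

        KeyLookup(Class<? extends Annotation<?>> key, String oldKey) {
            this.key = key;
            this.oldKey = oldKey;
        }
    }

    // Resolve a column name to its enum entry, or null if it is not registered.
    static KeyLookup lookup(String name) {
        for (KeyLookup k : KeyLookup.values()) {
            if (k.oldKey.equals(name)) {
                return k;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lookup("myfeature")); // prints MY_TAG
        System.out.println(lookup("unknown"));   // prints null
    }
}
```

If the name is not registered, the lookup finds nothing, which is why a new column needs both the annotation class and the enum entry.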
In src\edu\stanford\nlp\ie\NERFeatureFactory.java, depending on the "type" of feature it is, add it to the matching method, for example in
protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)
if (flags.useMyFeature) {
  featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag");
}
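To show what that last snippet ends up producing, here is a rough standalone sketch. FeatureSketch and the token helper are invented for illustration, tokens are plain maps instead of CoreLabel objects, and the "-WORD" feature is only there to show the new feature sitting next to an existing one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FeatureSketch {
    // Build a toy token carrying the word and the custom column value.
    static Map<String, String> token(String word, String myfeature) {
        Map<String, String> t = new HashMap<>();
        t.put("word", word);
        t.put("myfeature", myfeature);
        return t;
    }

    // Collect string features for the token at position loc.
    static Collection<String> featuresC(List<Map<String, String>> tokens, int loc,
                                        boolean useMyFeature) {
        Collection<String> features = new ArrayList<>();
        Map<String, String> c = tokens.get(loc);
        features.add(c.get("word") + "-WORD");
        if (useMyFeature) {
            // The suffix keeps this feature distinct from others with the same value.
            features.add(c.get("myfeature") + "-my_tag");
        }
        return features;
    }

    public static void main(String[] args) {
        List<Map<String, String>> tokens = Arrays.asList(token("HashMap", "CAMEL"));
        System.out.println(featuresC(tokens, 0, true)); // prints [HashMap-WORD, CAMEL-my_tag]
    }
}
```

Each feature is just a string, so the suffix is what tells the classifier that CAMEL came from your column and not from some other feature.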
Debugging: in addition to this, there are methods that dump the features to a file; use them to see how things work under the hood. You will probably have to spend some time with the debugger too :P