Question
I have followed the NLTK book (chapters 6 and 7) and other sources to train my own model for named entity recognition. I built a feature function and a ClassifierBasedTagger wrapper like this:
    from collections.abc import Iterable

    from nltk.chunk import ChunkParserI, conlltags2tree, tree2conlltags
    from nltk.tag import ClassifierBasedTagger

    # `features` is the feature function defined earlier in the same module.
    class NamedEntityChunker(ChunkParserI):
        def __init__(self, train_sents, feature_detector=features, **kwargs):
            assert isinstance(train_sents, Iterable)
            # Convert each tree into ((word, pos), iob_tag) pairs for the tagger
            tagged_sents = [[((w, t), c) for (w, t, c) in tree2conlltags(sent)]
                            for sent in train_sents]
            # other possible option: self.feature_detector = features
            self.tagger = ClassifierBasedTagger(tagged_sents, feature_detector=feature_detector, **kwargs)

        def parse(self, tagged_sent):
            chunks = self.tagger.tag(tagged_sent)
            iob_triplets = [(w, t, c) for ((w, t), c) in chunks]
            # Transform the list of triplets to nltk.Tree format
            return conlltags2tree(iob_triplets)
I am having problems when calling the ClassifierBasedTagger from another script, where I load my training and test data. For testing purposes I create the chunker from a portion of my training data:
    chunker = NamedEntityChunker(training_samples[:500])
No matter what I change in my classifier, I keep getting the error:
    self.tagger = ClassifierBasedTagger(tagged_sents, feature_detector=feature_detector, **kwargs)
    TypeError: __init__() got multiple values for argument 'feature_detector'
What am I doing wrong here? I assumed the feature function is working fine and that I don't have to pass anything else when calling NamedEntityChunker().
My second question: is there a way to save the trained model and reuse it later? How can I approach this? This is a follow-up to my last question on training data.
Thanks for any advice.
Answer 1:
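I finally realised what I was missing: when constructing the ClassifierBasedTagger you have to pass the tagged sentences by keyword, as the "train" argument. Passed positionally, they are bound to the first parameter, which is feature_detector, and that collides with the feature_detector keyword. It should look like this: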
    self.tagger = ClassifierBasedTagger(train=tagged_sents, feature_detector=features, **kwargs)
Now when I create the chunker with NamedEntityChunker(), everything works.
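The collision can be reproduced without NLTK. Below is a minimal sketch, assuming only that ClassifierBasedTagger takes feature_detector as its first parameter and train as its second (which is what the error message above implies). The make_tagger function, the placeholder features function, and the sample data are hypothetical stand-ins, not part of NLTK:

```python
# Hypothetical stand-in that mirrors only the parameter ORDER assumed for
# ClassifierBasedTagger.__init__: feature_detector first, then train.
def make_tagger(feature_detector=None, train=None, **kwargs):
    return {"feature_detector": feature_detector, "train": train}


def features(tokens, index, history):
    # Placeholder feature function; a real one would return richer features.
    return {"word": tokens[index]}


# Illustrative training data in the ((word, pos), iob_tag) shape.
tagged_sents = [[(("Paris", "NNP"), "B-GPE")]]

# Positional call: tagged_sents is bound to feature_detector, so the keyword
# argument supplies a second value for the same parameter and Python raises.
try:
    make_tagger(tagged_sents, feature_detector=features)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Keyword call: each value reaches the parameter it was meant for.
tagger = make_tagger(train=tagged_sents, feature_detector=features)
print(tagger["train"] is tagged_sents)
```

Passing every argument by keyword avoids depending on the parameter order at all.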
Answer 2:
Are you sure your code is exactly as you report it? This should not produce the problem you report; but you will get this behavior if you pass a keyword argument that is also a key in the kwargs variable:
    >>> def test(a, b):  # In fact the signature of `test` is irrelevant
    ...     pass
    ...
    >>> args = {'a': 1, 'b': 2}
    >>> test(a=0, **args)
    TypeError: test() got multiple values for keyword argument 'a'
So, figure out where the problem arises and fix it. Have your methods print out their arguments to help you debug the problem.
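For the second part of the question, saving a trained model for later reuse, one common approach is Python's pickle module. This is a minimal sketch, assuming the chunker class and its feature function are defined at module level so they can be imported again when the file is loaded. The names save_model and load_model and the file name are illustrative, and a plain dict stands in for the trained chunker:

```python
import os
import pickle
import tempfile


def save_model(model, path):
    # Serialize the trained object to disk in binary mode.
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    # Restore the object; the classes and functions it refers to must be
    # importable in the loading script.
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained chunker (assumption: the real object is picklable).
model = {"name": "stand-in for a trained NamedEntityChunker"}

path = os.path.join(tempfile.mkdtemp(), "chunker.pickle")
save_model(model, path)
restored = load_model(path)
print(restored == model)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.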
Source: https://stackoverflow.com/questions/43662829/how-to-call-the-classifierbasedtagger-in-nltk