Question
I'm experimenting with NLTK. Can the library detect the grammatical gender of a German noun? I need this information to determine whether a text is written in gender-neutral language. See here for more information: https://en.wikipedia.org/wiki/Gender_neutrality_in_languages_with_grammatical_gender
The code below tags my sentence, but I can't see any information about the gender of "Mitarbeiter". My code so far:
import nltk

# Tokenize the sentence, then tag each token with NLTK's default POS tagger.
sentence = """Der Mitarbeiter geht."""
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
print(tagged[0:6])
I haven't found any tools or scripts which accomplish this so far. Maybe there's also a better solution for my task.
Answer 1:
I don't believe NLTK can do that out of the box for German. However, there are freely available morphological taggers for German which can do that for you, for example RFTagger:
http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/
It gives output like this:
Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Indef.Nom.Sg.Masc
Testsatz N.Reg.Nom.Sg.Masc
. SYM.Pun.Sent
However, it is not written in Python, so you would have to call it using subprocess. Another option would be to obtain a corpus with nouns tagged for German gender, such as the Tiger corpus:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html
and train NLTK to recognize the genders, but I would expect RFTagger is a quicker/more accurate solution.
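If you go the RFTagger route, the output format shown above is easy to post-process in Python. Here is a minimal sketch that pulls the gender out of each noun line. It assumes the token and tag are separated by whitespace, that noun tags start with "N", and that the gender field is one of Masc, Fem or Neut ("Fem" does not appear in the sample, so treat it as an assumption). The function names are my own, not part of RFTagger. In practice the text would come from running the tagger via subprocess, which I have left out because the exact command line depends on your installation.

```python
# Map RFTagger gender fields to readable labels.
# "Fem" is assumed by analogy with "Masc" and "Neut" from the sample output.
GENDER_LABELS = {"Masc": "masculine", "Fem": "feminine", "Neut": "neuter"}


def parse_rftagger_line(line):
    """Split one 'token tag' line into (token, category, gender or None)."""
    parts = line.split()
    if len(parts) != 2:
        return None  # blank or malformed line
    token, tag = parts
    fields = tag.split(".")
    # Take the first field that names a gender, if there is one.
    gender = next((GENDER_LABELS[f] for f in fields if f in GENDER_LABELS), None)
    return token, fields[0], gender


def noun_genders(tagger_output):
    """Return (noun, gender) pairs for every noun line in the tagger output."""
    result = []
    for line in tagger_output.splitlines():
        parsed = parse_rftagger_line(line)
        if parsed is None:
            continue
        token, category, gender = parsed
        if category == "N":
            result.append((token, gender))
    return result


sample = """Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Indef.Nom.Sg.Masc
Testsatz N.Reg.Nom.Sg.Masc
. SYM.Pun.Sent"""

print(noun_genders(sample))  # [('Testsatz', 'masculine')]
```

For the gender-neutrality check you would then look at the nouns that come back as masculine and decide which of them refer to people.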
Answer 2:
Pattern purports to predict German noun gender with ~75% accuracy:
>>> from pattern.de import gender, MALE, FEMALE, NEUTRAL
>>> print gender('Katze')
FEMALE
Unfortunately it's only available in Python 2.x.
Source: https://stackoverflow.com/questions/42517201/get-gender-from-noun-using-nltk-with-german-corpora