Get gender from noun using NLTK with German corpora

£可爱£侵袭症+ submitted on 2020-01-03 17:11:10

Question


I'm experimenting with NLTK. My question is whether the library can detect the gender of a noun in German. I want this information in order to determine whether a text is written in gender-neutral language. See here for more information: https://en.wikipedia.org/wiki/Gender_neutrality_in_languages_with_grammatical_gender

The code below tags my sentence, but I can't see any information about the gender of "Mitarbeiter". My code so far:

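import nltk

sentence = "Der Mitarbeiter geht."
tokens = nltk.word_tokenize(sentence, language="german")
# nltk.pos_tag uses an English tagger by default; its tags carry no gender information
tagged = nltk.pos_tag(tokens)
print(tagged[0:6])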

I haven't found any tools or scripts which accomplish this so far. Maybe there's also a better solution for my task.


Answer 1:


I don't believe NLTK can do that out of the box for German. However, there are freely available morphological taggers for German which can do that for you, for example RFTagger:

http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/

It gives output like this:

Das     PRO.Dem.Subst.-3.Nom.Sg.Neut 
ist     VFIN.Sein.3.Sg.Pres.Ind 
ein     ART.Indef.Nom.Sg.Masc 
Testsatz    N.Reg.Nom.Sg.Masc 
.   SYM.Pun.Sent 

However, it is not written in Python, so you would have to call it using subprocess. Another option would be to obtain a corpus with nouns tagged for German gender, such as the Tiger corpus:

http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html

and train NLTK to recognize the genders, but I would expect RFTagger is a quicker/more accurate solution.
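
As a rough illustration of the subprocess route, the sketch below parses tagger output in the format shown above and pulls out the gender of each noun. The function names (parse_tagged, noun_genders, run_tagger) are hypothetical, the "Fem" label is assumed by analogy with "Masc" and "Neut", and the command passed to run_tagger has to come from your own RFTagger installation; it is not taken from the RFTagger documentation.

```python
import subprocess

# Gender labels as they appear in the sample output above.
# "Fem" is an assumption, by analogy with "Masc" and "Neut".
GENDERS = {"Masc": "masculine", "Fem": "feminine", "Neut": "neuter"}


def parse_tagged(text):
    """Return (token, tag, gender) triples; gender is None if the tag has none."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:  # skip blank or malformed lines
            continue
        token, tag = parts
        fields = tag.split(".")
        gender = next((GENDERS[f] for f in fields if f in GENDERS), None)
        rows.append((token, tag, gender))
    return rows


def noun_genders(text):
    """Keep only tokens whose tag starts with the noun prefix 'N.'."""
    return [(tok, g) for tok, tag, g in parse_tagged(text) if tag.startswith("N.")]


def run_tagger(command, input_path):
    """Run an external tagger and return its stdout.

    `command` is a list supplied by the caller (hypothetical here); it should
    be whatever invocation your RFTagger installation documents.
    """
    result = subprocess.run(
        command + [input_path], capture_output=True, text=True, check=True
    )
    return result.stdout


sample = """Das     PRO.Dem.Subst.-3.Nom.Sg.Neut
ist     VFIN.Sein.3.Sg.Pres.Ind
ein     ART.Indef.Nom.Sg.Masc
Testsatz    N.Reg.Nom.Sg.Masc
.   SYM.Pun.Sent"""

print(noun_genders(sample))  # [('Testsatz', 'masculine')]
```

In practice the text passed to noun_genders would be the string returned by run_tagger rather than the hard-coded sample.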




Answer 2:


Pattern purports to predict German noun gender with ~75% accuracy:

>>> from pattern.de import gender, MALE, FEMALE, NEUTRAL
>>> print gender('Katze')

FEMALE

Unfortunately, at the time of writing it was only available for Python 2.x.



Source: https://stackoverflow.com/questions/42517201/get-gender-from-noun-using-nltk-with-german-corpora
