POS tagging in German

予麋鹿 2020-12-12 21:28

I am using NLTK to extract nouns from a text-string starting with the following command:

tagged_text = nltk.pos_tag(nltk.Text(nltk.word_tokenize(some_string)))

5 Answers
  •  执笔经年
    2020-12-12 22:16

    The Pattern library includes a function for parsing German sentences and the result includes the part-of-speech tags. The following is copied from their documentation:

    from pattern.de import parse, split
    s = parse('Die Katze liegt auf der Matte.')
    s = split(s)
    print(s.sentences[0])
    
    >>>   Sentence('Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O'
         'auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O')
    

    If you prefer the STTS tag set, you can set the optional parameter tagset="STTS".
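
    Since the goal in the question is to pull out nouns, here is a minimal sketch that works on the tagged string shown above rather than on Pattern itself. The helper name nouns_from_tagged_string and the NOUN_TAGS set are my own assumptions, not part of Pattern; the set covers the Penn Treebank noun tags plus the STTS tags NN and NE.

```python
# Assumed noun tags: Penn Treebank (NN, NNS, NNP, NNPS) and STTS (NN, NE).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "NE"}


def nouns_from_tagged_string(tagged):
    """Return the words whose POS field is a noun tag.

    `tagged` is a space-separated string of word/POS/chunk/PNP tokens,
    the layout of the Sentence(...) output above. Hypothetical helper.
    """
    nouns = []
    for token in tagged.split():
        # Split from the right so a word that contains "/" stays whole.
        fields = token.rsplit("/", 3)
        if len(fields) == 4 and fields[1] in NOUN_TAGS:
            nouns.append(fields[0])
    return nouns


tagged = ("Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O "
          "auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O")
print(nouns_from_tagged_string(tagged))  # ['Katze', 'Matte']
```

    If you work with the Sentence object directly, filtering on each word's tag is probably cleaner than re-parsing the string; the string version is used here only because it needs nothing installed.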

    Update: Another option is spaCy. There is a quick example in this blog article:

    import spacy
    
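    # 'de' is a shortcut name from older spaCy releases; spaCy 3.x expects
    # a full package name such as 'de_core_news_sm' instead.
    nlp = spacy.load('de')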
    doc = nlp(u'Ich bin ein Berliner.')
    
    # show universal pos tags
    print(' '.join('{word}/{tag}'.format(word=t.orth_, tag=t.pos_) for t in doc))
    # output: Ich/PRON bin/AUX ein/DET Berliner/NOUN ./PUNCT
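
    To get from that output to just the nouns, one option is to keep the tokens tagged NOUN (and PROPN, if proper nouns count). The sketch below works on the word/TAG string printed above so it runs without a model; nouns_from_universal is a hypothetical helper name. With a live doc the same filter would compare t.pos_ directly.

```python
def nouns_from_universal(tagged, keep=("NOUN", "PROPN")):
    """Return words from a 'word/TAG word/TAG ...' string whose tag is in `keep`."""
    words = []
    for token in tagged.split():
        # rpartition splits on the last "/", so "./PUNCT" gives (".", "/", "PUNCT").
        word, _, tag = token.rpartition("/")
        if tag in keep:
            words.append(word)
    return words


print(nouns_from_universal("Ich/PRON bin/AUX ein/DET Berliner/NOUN ./PUNCT"))
# ['Berliner']
```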
    
