Finding the POS of the root of a noun_chunk with spaCy

Submitted by 人盡茶涼 on 2020-06-27 06:06:29

Question


When using spaCy, you can easily loop over the noun phrases of a text as follows:

import spacy

S = 'This is an example sentence that should include several parts and also make clear that studying Natural language Processing is not difficult'
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)

[chunk.text for chunk in doc.noun_chunks]
# = ['an example sentence', 'several parts', 'Natural language Processing']

You can also get the "root" of the noun chunk:

[chunk.root.text for chunk in doc.noun_chunks]
# = ['sentence', 'parts', 'Processing']

How can I get the POS of each of those words (even if it looks like the root of a noun phrase is always a noun), and how can I get the lemma, the shape, and the singular form of that particular word?

Is that even possible?

thx.


Answer 1:


Each chunk.root is a Token, from which you can read various attributes, including lemma_ and pos_ (or tag_ if you prefer the Penn Treebank POS tags).

import spacy
S='This is an example sentence that should include several parts and also make ' \
  'clear that studying Natural language Processing is not difficult'
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)
for chunk in doc.noun_chunks:
    print('%-12s %-6s  %s' % (chunk.root.text, chunk.root.pos_, chunk.root.lemma_))

sentence     NOUN    sentence
parts        NOUN    part
Processing   NOUN    processing

By the way, in this sentence "Processing" is a noun, so its lemma is "processing", not "process", which would be the lemma of the verb form "processing". The lemma also serves as the singular form, as the output for "parts" shows.
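The question also asks about the shape of each root word. spaCy exposes this on the same Token as shape_, so chunk.root.shape_ can be printed alongside pos_ and lemma_. To make clear what that attribute encodes, here is a minimal plain-Python sketch of the idea as I understand it: letters become "x" or "X", digits become "d", and runs of the same shape character are cut off after four. The helper name word_shape and the exact truncation rule are assumptions for illustration, not spaCy's own code, so treat the result as an approximation of shape_.

```python
def word_shape(text: str) -> str:
    """Approximate the orthographic shape of a word (hypothetical helper)."""
    shape = []
    last = ""  # previous shape character
    run = 0    # how many times in a row `last` has repeated
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as-is
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        # Assumption: keep at most four identical shape characters in a row.
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)


# The three noun-chunk roots from the example sentence.
roots = ["sentence", "parts", "Processing"]
shapes = {word: word_shape(word) for word in roots}
for word, shape in shapes.items():
    print("%-12s %s" % (word, shape))
```

With this sketch, "sentence" and "parts" both map to "xxxx" and "Processing" maps to "Xxxxx", so the shape mainly tells you about capitalization and digits rather than word length.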



Source: https://stackoverflow.com/questions/62272958/finding-the-pos-of-the-root-of-a-noun-chunk-with-spacy
