Question:
Can I use spaCy in Python to find NPs with specific neighbors? I want noun phrases from my text that have a verb before and after them.
Answer 1:
- You can merge the noun phrases (so that they are not tokenized separately), then analyse the dependency parse and check the POS of the neighbouring tokens.
>>> import spacy
>>> nlp = spacy.load('en')
>>> sent = u'run python program run, to make this work'
>>> parsed = nlp(sent)
>>> list(parsed.noun_chunks)
[python program]
>>> for noun_phrase in list(parsed.noun_chunks):
...     noun_phrase.merge(noun_phrase.root.tag_, noun_phrase.root.lemma_, noun_phrase.root.ent_type_)
...
python program
>>> [(token.text, token.pos_) for token in parsed]
[(u'run', u'VERB'), (u'python program', u'NOUN'), (u'run', u'VERB'), (u',', u'PUNCT'), (u'to', u'PART'), (u'make', u'VERB'), (u'this', u'DET'), (u'work', u'NOUN')]
By analysing the POS of adjacent tokens, you can get your desired noun phrases.
- A better approach would be to analyse the dependency parse tree and look at the lefts and rights of the noun phrase, so that even if punctuation or another POS tag sits between the noun phrase and the verb, you can widen your search coverage (a sketch of the neighbour check follows below).
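Here is a minimal sketch of the neighbour check described above. It assumes spaCy v2+ with the en_core_web_sm model (the model name is an assumption; any English pipeline with a parser should work). Note that Span.merge, used in the session above, was removed in spaCy v3, so this version uses the Doc.retokenize context manager instead:

import spacy

nlp = spacy.load('en_core_web_sm')  # assumed model; swap in whichever English model you have
doc = nlp(u'run python program run, to make this work')

# Merge each noun chunk into a single token
# (replaces the removed Span.merge call shown above).
with doc.retokenize() as retokenizer:
    for chunk in doc.noun_chunks:
        retokenizer.merge(chunk)

# Keep nouns that have a verb immediately before and after them.
verb_flanked = [
    tok.text for tok in doc
    if tok.pos_ == 'NOUN'
    and tok.i > 0 and doc[tok.i - 1].pos_ == 'VERB'
    and tok.i < len(doc) - 1 and doc[tok.i + 1].pos_ == 'VERB'
]
print(verb_flanked)  # expected, per the session above: ['python program']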
Answer 2:
From https://spacy.io/usage/linguistic-features#dependency-parse
You can use noun chunks.

Noun chunks are "base noun phrases" – flat phrases that have a noun as their head. You can think of noun chunks as a noun plus the words describing the noun – for example, "the lavish green grass" or "the world's largest tech fund". To get the noun chunks in a document, simply iterate over Doc.noun_chunks.
In:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u"Autonomous cars shift insurance liability toward manufacturers")
for chunk in doc.noun_chunks:
    print(chunk.text)
Out:
Autonomous cars
insurance liability
manufacturers
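To narrow this to the original question (noun phrases with a verb on both sides), one possible extension is to check the tokens just outside each chunk's span via chunk.start and chunk.end, without merging anything first. The sentence and expected output below are taken from Answer 1; the model name is again an assumption:

import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(u'run python program run, to make this work')

for chunk in doc.noun_chunks:
    # Token immediately before the chunk, and the one right after it.
    before = doc[chunk.start - 1] if chunk.start > 0 else None
    after = doc[chunk.end] if chunk.end < len(doc) else None
    if before is not None and before.pos_ == 'VERB' \
            and after is not None and after.pos_ == 'VERB':
        print(chunk.text)  # expected: python program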
Source: https://stackoverflow.com/questions/44661200/spacy-to-extract-specific-noun-phrase