Patterns with multi-terms entries in the IN attribute

Posted by 我的梦境 on 2020-06-01 05:36:10

Question


I am extending a spaCy model using rules. While looking through the documentation, I noticed the IN attribute, which lets a token pattern match when the attribute's value is any member of a list. This is great; however, it only works on single tokens.

For example, this pattern: {"label":"EXAMPLE","pattern":[{"LOWER": {"IN": ["such as", "like", "for example"]}}]} will only match the single-token term "like", not the multi-word entries.

What is the best way to achieve the same result for multi-term entries?


Answer 1:


It depends on how complicated the intended patterns are, but the PhraseMatcher can handle cases like the one above by matching on the LOWER attribute:

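import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
# attr="LOWER" compares lowercased token text, so matching is case-insensitive
pmatcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrases = ["such as", "like", "for example"]
# Each phrase is tokenized into a Doc, so multi-word phrases match as token sequences
pmatcher.add("EXAMPLE", [nlp(x) for x in phrases])
# Matches are (match_id, start, end); "Such As" covers tokens 1 and 2 (end is exclusive)
assert pmatcher(nlp("Things Such As Books")) == [(15373972490796046842, 1, 3)]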
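The match tuples returned by the PhraseMatcher carry a hashed label and token offsets, which are not very readable on their own. As a minimal sketch (the names `doc` and `results` are illustrative and not part of the original answer), the label and matched text can be recovered like this:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")
pmatcher = PhraseMatcher(nlp.vocab, attr="LOWER")
pmatcher.add("EXAMPLE", [nlp(x) for x in ["such as", "like", "for example"]])

doc = nlp("Things Such As Books")
# match_id is the hash of the label string; the vocab's string store maps it back
results = [
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in pmatcher(doc)
]
print(results)
```

Looking the label up through `nlp.vocab.strings` avoids hard-coding the hash value in assertions or downstream code.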

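If token-level rules are preferred over the PhraseMatcher, one possible alternative is to expand each phrase into a sequence of single-token dicts, one per token, and register them all under the same label with the Matcher. This is only a sketch under the assumption that plain case-insensitive matching is enough; the example sentence and the names `patterns` and `spans` are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
phrases = ["such as", "like", "for example"]

# Tokenize each phrase with the pipeline's own tokenizer, then build one
# {"LOWER": ...} dict per token so "such as" becomes a two-token pattern
patterns = [
    [{"LOWER": token.lower_} for token in nlp.make_doc(phrase)]
    for phrase in phrases
]

matcher = Matcher(nlp.vocab)
matcher.add("EXAMPLE", patterns)

doc = nlp("I read things like novels and, for example, essays")
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)
```

Each inner list in `patterns` is an ordinary token pattern, so it should also be usable as the "pattern" value in a rule dict of the form shown in the question, with one rule per phrase.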

Source: https://stackoverflow.com/questions/61975312/patterns-with-multi-terms-entries-in-the-in-attribute
