spacy

Noun phrases with spacy

99封情书 submitted on 2019-11-27 00:45:25
Question: How can I extract noun phrases from text using spaCy? I am not referring to part-of-speech tags. In the documentation I cannot find anything about noun phrases or regular parse trees.

Answer 1: If you want base NPs, i.e. NPs without coordination, prepositional phrases or relative clauses, you can use the noun_chunks iterator on the Doc and Span objects:

>>> from spacy.en import English
>>> nlp = English()
>>> doc = nlp(u'The cat and the dog sleep in the basket near the door.')
>>> for np in doc.noun_chunks:
...     print(np.text)
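The answer above uses the spacy.en entry point from spaCy 1.x. As a minimal sketch, assuming a spaCy 2.x install with the en_core_web_sm model downloaded, the same noun_chunks iterator looks like this with the current loading API:

import spacy

# Load the small English model (assumes `python -m spacy download en_core_web_sm` was run).
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"The cat and the dog sleep in the basket near the door.")

# doc.noun_chunks yields base noun phrases as Span objects.
for np in doc.noun_chunks:
    print(np.text)

# Expected base NPs: "The cat", "the dog", "the basket", "the door"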

Why does spaCy not preserve intra-word-hyphens during tokenization like Stanford CoreNLP does?

送分小仙女 submitted on 2019-11-26 23:42:19
Question:

SpaCy Version: 2.0.11
Python Version: 3.6.5
OS: Ubuntu 16.04

My sentence samples: "Marketing-Representative- won't die in car accident." or "Out-of-box implementation"

Expected tokens:
["Marketing-Representative", "-", "wo", "n't", "die", "in", "car", "accident", "."]
["Out-of-box", "implementation"]

SpaCy tokens (default tokenizer):
["Marketing", "-", "Representative-", "wo", "n't", "die", "in", "car", "accident", "."]
["Out", "-", "of", "-", "box", "implementation"]

I tried creating a custom tokenizer …
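One commonly suggested fix for this (a sketch of the general approach, not the asker's exact attempt; it assumes a recent spaCy 2.x where these character classes are exported and the en_core_web_sm model is installed) is to rebuild the tokenizer's infix patterns without the rule that splits on hyphens between letters, following the pattern list in spaCy's tokenizer documentation:

import spacy
from spacy.lang.char_classes import ALPHA, ALPHA_LOWER, ALPHA_UPPER, CONCAT_QUOTES, LIST_ELLIPSES, LIST_ICONS
from spacy.util import compile_infix_regex

nlp = spacy.load("en_core_web_sm")

# The default infix rules, minus the rule that splits on hyphens between
# letters, so intra-word hyphens such as "Out-of-box" stay in one token.
infixes = (
    LIST_ELLIPSES
    + LIST_ICONS
    + [
        r"(?<=[0-9])[+\-\*^](?=[0-9-])",
        r"(?<=[{al}{q}])\.(?=[{au}{q}])".format(al=ALPHA_LOWER, au=ALPHA_UPPER, q=CONCAT_QUOTES),
        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
        # removed: r"(?<=[{a}])(?:{h})(?=[{a}])".format(a=ALPHA, h=HYPHENS),
        r"(?<=[{a}0-9])[:<>=/](?=[{a}])".format(a=ALPHA),
    ]
)
infix_re = compile_infix_regex(infixes)
nlp.tokenizer.infix_finditer = infix_re.finditer

print([t.text for t in nlp("Out-of-box implementation")])
# expected: ['Out-of-box', 'implementation']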

SpaCy OSError: Can't find model 'en'

人盡茶涼 submitted on 2019-11-26 20:23:43
Question: Even though I downloaded the model, spaCy cannot load it:

[jalal@goku entity-sentiment-analysis]$ which python
/scratch/sjn/anaconda/bin/python
[jalal@goku entity-sentiment-analysis]$ sudo python -m spacy download en
[sudo] password for jalal:
Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |██…
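A frequent cause of this error is that sudo python resolves to a different interpreter than the Anaconda one reported by which python, so the model is installed into another environment and no 'en' shortcut link exists for the Python actually being used. A minimal sketch of a workaround (assuming the en_core_web_sm package, downloaded without sudo using the same interpreter):

# Download with the same interpreter you run your code with, without sudo:
#     /scratch/sjn/anaconda/bin/python -m spacy download en_core_web_sm
import spacy

# Loading by the full package name avoids relying on the "en" shortcut link,
# which a sudo download often fails to create for the non-root environment.
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"The model loads once it is installed in the right environment.")
print([token.text for token in doc])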

How to get the dependency tree with spaCy?

柔情痞子 submitted on 2019-11-26 18:59:11
Question: I have been trying to find out how to get the dependency tree with spaCy, but I can't find anything on how to get the tree, only on how to navigate it.

Answer 1: In case someone wants to easily view the dependency tree produced by spaCy, one solution is to convert it to an nltk.tree.Tree and use the nltk.tree.Tree.pretty_print method. Here is an example:

import spacy
from nltk import Tree

en_nlp = spacy.load('en')
doc = en_nlp("The quick brown fox jumps over the lazy dog.")

def to_nltk_tree(node): …
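The quoted answer is cut off at the helper definition. A runnable completion of the approach it describes (a sketch; it assumes nltk is installed alongside spaCy and that an English model is linked under the 'en' shortcut) could look like this:

import spacy
from nltk import Tree

en_nlp = spacy.load('en')
doc = en_nlp("The quick brown fox jumps over the lazy dog.")

def to_nltk_tree(node):
    # Recursively wrap a spaCy token and its syntactic children in an nltk Tree.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

# Print an ASCII rendering of each sentence's dependency tree.
for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()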

spaCy word vectors

青春壹個敷衍的年華 submitted on 2019-11-26 12:30:01
spaCy can compare two objects and predict how similar they are. Predicting similarity is useful for building recommendation systems or flagging duplicates: for example, you can suggest user content similar to what a user is currently viewing, or flag a support ticket as a duplicate if it is very similar to an existing one.

Every Doc, Span and Token has a .similarity() method that lets you compare it with another object and determine the degree of similarity. Of course, similarity is always subjective: whether "dog" and "cat" are similar depends on how you look at it. spaCy's similarity model usually assumes a fairly general-purpose definition of similarity.

tokens = nlp(u'dog cat banana')
for token1 in tokens:
    for token2 in tokens:
        print(token1.similarity(token2))

In this case, the model's predictions are quite accurate: a dog and a cat are very similar, while a banana is not similar to either. Identical tokens are obviously 100% similar to each other (though not always exactly 1.0, because of vector math and floating-point imprecision).

Similarity is determined by comparing word vectors, or "word embeddings", multi-dimensional representations of a word's meaning. Word vectors can be generated with an algorithm such as word2vec.

Important note: To keep the comparison algorithms compact and fast, spaCy's small models (all packages whose names end in sm) do not ship with word vectors; these sm packages only include context-sensitive tensors. This means you can still use the similarity() method to compare documents, spans and tokens.
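Following the note above, here is a minimal sketch that uses a model shipping real word vectors (assumption: the en_core_web_md package has been downloaded; with an sm model the same code runs, but the scores are far less meaningful because they come from context-sensitive tensors rather than word vectors):

import spacy

# Medium and large English models include word vectors; small (sm) models do not.
nlp = spacy.load("en_core_web_md")
tokens = nlp(u"dog cat banana")

for token1 in tokens:
    for token2 in tokens:
        # similarity() compares the tokens' word vectors (cosine similarity).
        print(token1.text, token2.text, token1.similarity(token2))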