spacy

Noun phrases with spacy

99封情书 submitted on 2019-11-27 00:45:25
Question: How can I extract noun phrases from text using spaCy? I am not referring to part-of-speech tags. In the documentation I cannot find anything about noun phrases or regular parse trees.

Answer 1: If you want base NPs, i.e. NPs without coordination, prepositional phrases or relative clauses, you can use the noun_chunks iterator on the Doc and Span objects:

>>> from spacy.en import English
>>> nlp = English()
>>> doc = nlp(u'The cat and the dog sleep in the basket near the door.')
>>> for np in doc.noun_chunks:
...     print(np.text)
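The answer above uses the spacy.en entry point from spaCy 1.x. As a minimal sketch, assuming a spaCy 2.x install with the en_core_web_sm model downloaded, the same noun_chunks iterator looks like this with the current loading API:

import spacy

# Load the small English model (assumes `python -m spacy download en_core_web_sm` was run).
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"The cat and the dog sleep in the basket near the door.")

# doc.noun_chunks yields base noun phrases as Span objects.
for np in doc.noun_chunks:
    print(np.text)

# Expected base NPs: "The cat", "the dog", "the basket", "the door"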

Why does spaCy not preserve intra-word-hyphens during tokenization like Stanford CoreNLP does?

送分小仙女 submitted on 2019-11-26 23:42:19
Question:

SpaCy Version: 2.0.11
Python Version: 3.6.5
OS: Ubuntu 16.04

My sentence samples: "Marketing-Representative- won't die in car accident." or "Out-of-box implementation"

Expected tokens:
["Marketing-Representative", "-", "wo", "n't", "die", "in", "car", "accident", "."]
["Out-of-box", "implementation"]

SpaCy tokens (default tokenizer):
["Marketing", "-", "Representative-", "wo", "n't", "die", "in", "car", "accident", "."]
["Out", "-", "of", "-", "box", "implementation"]

I tried creating a custom tokenizer …
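One commonly suggested fix for this (a sketch of the general approach, not the asker's exact attempt; it assumes a recent spaCy 2.x where these character classes are exported and the en_core_web_sm model is installed) is to rebuild the tokenizer's infix patterns without the rule that splits on hyphens between letters, following the pattern list in spaCy's tokenizer documentation:

import spacy
from spacy.lang.char_classes import ALPHA, ALPHA_LOWER, ALPHA_UPPER, CONCAT_QUOTES, LIST_ELLIPSES, LIST_ICONS
from spacy.util import compile_infix_regex

nlp = spacy.load("en_core_web_sm")

# The default infix rules, minus the rule that splits on hyphens between
# letters, so intra-word hyphens such as "Out-of-box" stay in one token.
infixes = (
    LIST_ELLIPSES
    + LIST_ICONS
    + [
        r"(?<=[0-9])[+\-\*^](?=[0-9-])",
        r"(?<=[{al}{q}])\.(?=[{au}{q}])".format(al=ALPHA_LOWER, au=ALPHA_UPPER, q=CONCAT_QUOTES),
        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
        # removed: r"(?<=[{a}])(?:{h})(?=[{a}])".format(a=ALPHA, h=HYPHENS),
        r"(?<=[{a}0-9])[:<>=/](?=[{a}])".format(a=ALPHA),
    ]
)
infix_re = compile_infix_regex(infixes)
nlp.tokenizer.infix_finditer = infix_re.finditer

print([t.text for t in nlp("Out-of-box implementation")])
# expected: ['Out-of-box', 'implementation']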

SpaCy OSError: Can't find model 'en'

人盡茶涼 submitted on 2019-11-26 20:23:43
Question: Even though I downloaded the model, spaCy cannot load it:

[jalal@goku entity-sentiment-analysis]$ which python
/scratch/sjn/anaconda/bin/python
[jalal@goku entity-sentiment-analysis]$ sudo python -m spacy download en
[sudo] password for jalal:
Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |██…
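A frequent cause of this error is that sudo python resolves to a different interpreter than the Anaconda one reported by which python, so the model is installed into another environment and no 'en' shortcut link exists for the Python actually being used. A minimal sketch of a workaround (assuming the en_core_web_sm package, downloaded without sudo using the same interpreter):

# Download with the same interpreter you run your code with, without sudo:
#     /scratch/sjn/anaconda/bin/python -m spacy download en_core_web_sm
import spacy

# Loading by the full package name avoids relying on the "en" shortcut link,
# which a sudo download often fails to create for the non-root environment.
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"The model loads once it is installed in the right environment.")
print([token.text for token in doc])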

How to get the dependency tree with spaCy?

柔情痞子 submitted on 2019-11-26 18:59:11
Question: I have been trying to find out how to get the dependency tree with spaCy, but I can't find anything on how to get the tree, only on how to navigate it.

Answer 1: In case someone wants to easily view the dependency tree produced by spaCy, one solution is to convert it to an nltk.tree.Tree and use the nltk.tree.Tree.pretty_print method. Here is an example:

import spacy
from nltk import Tree

en_nlp = spacy.load('en')
doc = en_nlp("The quick brown fox jumps over the lazy dog.")

def to_nltk_tree(node): …
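The quoted answer is cut off at the helper definition. A runnable completion of the approach it describes (a sketch; it assumes nltk is installed alongside spaCy and that an English model is linked under the 'en' shortcut) could look like this:

import spacy
from nltk import Tree

en_nlp = spacy.load('en')
doc = en_nlp("The quick brown fox jumps over the lazy dog.")

def to_nltk_tree(node):
    # Recursively wrap a spaCy token and its syntactic children in an nltk Tree.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

# Print an ASCII rendering of each sentence's dependency tree.
for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()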

spaCy word vectors

青春壹個敷衍的年華 submitted on 2019-11-26 12:30:01
spaCy can compare two objects and predict how similar they are. Predicting similarity is useful for building recommendation systems or flagging duplicates: for example, you can suggest user content similar to what a user is currently viewing, or flag a support ticket as a duplicate if it is very similar to an existing one.

Every Doc, Span and Token has a .similarity() method that lets you compare it with another object and determine the degree of similarity. Of course, similarity is always subjective: whether "dog" and "cat" are similar depends on how you look at it. spaCy's similarity model usually assumes a fairly general-purpose definition of similarity.

tokens = nlp(u'dog cat banana')
for token1 in tokens:
    for token2 in tokens:
        print(token1.similarity(token2))

In this case, the model's predictions are quite accurate: a dog and a cat are very similar, while a banana is not similar to either. Identical tokens are obviously 100% similar to each other (though not always exactly 1.0, because of vector math and floating-point imprecision).

Similarity is determined by comparing word vectors, or "word embeddings", multi-dimensional representations of a word's meaning. Word vectors can be generated with an algorithm such as word2vec.

Important note: To keep the comparison algorithms compact and fast, spaCy's small models (all packages whose names end in sm) do not ship with word vectors; these sm packages only include context-sensitive tensors. This means you can still use the similarity() method to compare documents, spans and tokens.
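Following the note above, here is a minimal sketch that uses a model shipping real word vectors (assumption: the en_core_web_md package has been downloaded; with an sm model the same code runs, but the scores are far less meaningful because they come from context-sensitive tensors rather than word vectors):

import spacy

# Medium and large English models include word vectors; small (sm) models do not.
nlp = spacy.load("en_core_web_md")
tokens = nlp(u"dog cat banana")

for token1 in tokens:
    for token2 in tokens:
        # similarity() compares the tokens' word vectors (cosine similarity).
        print(token1.text, token2.text, token1.similarity(token2))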