spacy

Is there a way with spaCy's NER to calculate metrics per entity type?

余生长醉 submitted on 2019-12-03 06:56:38
Is there a way in spaCy's NER model to extract the metrics (precision, recall, F1 score) per entity type? Something that would look like this:

             precision  recall  f1-score  support
B-LOC            0.810   0.784     0.797     1084
I-LOC            0.690   0.637     0.662      325
B-MISC           0.731   0.569     0.640      339
I-MISC           0.699   0.589     0.639      557
B-ORG            0.807   0.832     0.820     1400
I-ORG            0.852   0.786     0.818     1104
B-PER            0.850   0.884     0.867      735
I-PER            0.893   0.943     0.917      634
avg / total      0.809   0.787     0.796     6178

Taken from: http://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/ Thank you!

Nice question. First, we should clarify that spaCy uses the…
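One way to get a report in exactly that shape is to reduce both the gold annotations and spaCy's predictions to one BIO tag per token and hand them to sklearn's classification_report. A minimal sketch, assuming you already have (text, gold_doc) pairs where gold_doc is a Doc carrying the reference entity spans and sharing the pipeline's tokenization (the examples list here is hypothetical):

from sklearn.metrics import classification_report
import spacy

nlp = spacy.load('en')

def bio_tags(doc):
    # one BIO tag per token, e.g. 'B-ORG', 'I-ORG' or 'O'
    return ['%s-%s' % (t.ent_iob_, t.ent_type_) if t.ent_type_ else 'O'
            for t in doc]

y_true, y_pred = [], []
for text, gold_doc in examples:  # examples is your own annotated data
    y_true.extend(bio_tags(gold_doc))
    y_pred.extend(bio_tags(nlp(text)))

# leave the 'O' class out so the report shows only entity types
labels = sorted(set(y_true) - {'O'})
print(classification_report(y_true, y_pred, labels=labels))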

Evaluation in a Spacy NER model

时光怂恿深爱的人放手 submitted on 2019-12-03 06:14:38
I am trying to evaluate a trained NER model created using the spacy lib. Normally for this kind of problem you can use the F1 score (the harmonic mean of precision and recall). I could not find an accuracy function for a trained NER model in the documentation. I am not sure if it's correct, but I am trying to do it the following way (example), using f1_score from sklearn:

from sklearn.metrics import f1_score
import spacy
from spacy.gold import GoldParse

nlp = spacy.load("en")           # load NER model
test_text = "my name is John"    # text to test accuracy
doc_to_test = nlp(test_text)     # transform the text to…
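For spaCy 2.x (current when this was asked), the library's own Scorer together with GoldParse reports precision, recall and F1 for the NER component directly, which avoids aligning token labels for sklearn by hand. A sketch, assuming your evaluation data is in spaCy's usual (text, entity offsets) training format:

import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(nlp, examples):
    # examples: [(text, [(start_char, end_char, label), ...]), ...]
    scorer = Scorer()
    for text, entity_offsets in examples:
        gold = GoldParse(nlp.make_doc(text), entities=entity_offsets)
        pred = nlp(text)
        scorer.score(pred, gold)
    return scorer.scores

nlp = spacy.load("en")
scores = evaluate(nlp, [("my name is John", [(11, 15, "PERSON")])])
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])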

Spacy custom tokenizer to keep hyphenated words as single tokens using infix regex

风流意气都作罢 submitted on 2019-12-03 03:28:30
I want to include hyphenated words, for example long-term, self-esteem, etc., as single tokens in Spacy. After looking at some similar posts on Stack Overflow, GitHub, its documentation and elsewhere, I also wrote a custom tokenizer as below:

import re
from spacy.tokenizer import Tokenizer

prefix_re = re.compile(r'''^[\[\("']''')
suffix_re = re.compile(r'''[\]\)"']$''')
infix_re = re.compile(r'''[.\,\?\:\;\...\‘\’\`\“\”\"\'~]''')

def custom_tokenizer(nlp):
    return Tokenizer(nlp.vocab, prefix_search=prefix_re.search, suffix_search=suffix_re.search, infix_finditer=infix_re.finditer, token_match=None…
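The usual fix is to leave the hyphen out of the infix pattern entirely while reusing the default prefix, suffix and token-match rules, so intra-word hyphens never become split points. A sketch against the spaCy 2.x Tokenizer API (the exact infix character class is illustrative):

import re
import spacy
from spacy.tokenizer import Tokenizer

nlp = spacy.load('en')

# infix pattern that deliberately omits the hyphen,
# so "long-term" and "self-esteem" survive as single tokens
infix_re = re.compile(r'''[.,?:;~]''')

def custom_tokenizer(nlp):
    return Tokenizer(nlp.vocab,
                     prefix_search=nlp.tokenizer.prefix_search,
                     suffix_search=nlp.tokenizer.suffix_search,
                     infix_finditer=infix_re.finditer,
                     token_match=nlp.tokenizer.token_match)

nlp.tokenizer = custom_tokenizer(nlp)
print([t.text for t in nlp("long-term gains in self-esteem")])
# ['long-term', 'gains', 'in', 'self-esteem']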

Using PhraseMatcher in SpaCy to find multiple match types

匆匆过客 submitted on 2019-12-03 03:11:41
The SpaCy documentation and samples show that the PhraseMatcher class is useful for matching sequences of tokens in documents. One must provide a vocabulary of sequences to be matched. In my application, my documents are collections of tokens and phrases, with entities of several different types. The data is only remotely natural language (the documents are more like sets of keywords in semi-random order). I am trying to find matches of multiple types. For example, given:

yellow boots for kids

how can I find the matches for colors (e.g. yellow), for product types (e.g. boots) and for the age (e.g. kids…
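One straightforward approach is a single PhraseMatcher holding one match ID per category; the ID returned with each match tells you which type fired. A sketch using the spaCy 2.x add() signature, with made-up term lists:

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load('en')
matcher = PhraseMatcher(nlp.vocab)

# one match ID per category; the vocabularies here are illustrative
for label, terms in [('COLOR', ['yellow', 'red']),
                     ('PRODUCT', ['boots', 'sandals']),
                     ('AGE_GROUP', ['kids', 'adults'])]:
    matcher.add(label, None, *[nlp(term) for term in terms])

doc = nlp('yellow boots for kids')
for match_id, start, end in matcher(doc):
    # resolve the hash back to the label string
    print(nlp.vocab.strings[match_id], '->', doc[start:end].text)
# COLOR -> yellow
# PRODUCT -> boots
# AGE_GROUP -> kids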

How to use the spacy lemmatizer to get a word into its basic form

痴心易碎 submitted on 2019-12-03 02:56:52
I am new to spacy and I want to use its lemmatizer function, but I don't know how to use it: I want to pass in a string of words and get back the same string with each word in its basic form. Examples: 'words' => 'word', 'did' => 'do'. Thank you.

damio: The previous answer is convoluted and can't be edited, so here's a more conventional one.

# make sure you downloaded the English model with "python -m spacy download en"
import spacy
nlp = spacy.load('en')
doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")
for token in doc:
    print(token, token.lemma, token.lemma_)

Output:

Apples 6617 apples…
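To answer the question as literally asked, string in and base forms out, you can join the token lemmas back together. A small sketch (note that in spaCy 2.x lemmas are lower-cased and pronouns come back as -PRON-):

import spacy
nlp = spacy.load('en')

def lemmatize(text):
    # rebuild the string from each token's lemma
    return ' '.join(token.lemma_ for token in nlp(text))

print(lemmatize('words did'))  # expected: 'word do'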

Add/remove stop words with spacy

Anonymous (unverified) submitted on 2019-12-03 02:49:01
Question: What is the best way to add/remove stop words with spacy? I am using the token.is_stop function and would like to make some custom changes to the set. I was looking at the documentation but could not find anything regarding stop words. Thanks!

Answer 1: You can edit them before processing your text like this (see this post):

>>> import spacy
>>> nlp = spacy.load("en")
>>> nlp.vocab["the"].is_stop = False
>>> nlp.vocab["definitelynotastopword"].is_stop = True
>>> sentence = nlp("the word is definitelynotastopword")
>>> sentence[0].is_stop
False
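In spaCy 2.x you can also edit the language's stop-word set itself rather than flipping flags word by word. A sketch (note that the is_stop flag on already-created lexemes is not updated automatically, so it is set here as well):

import spacy
nlp = spacy.load('en')

# add and remove entries in the shared stop-word set
nlp.Defaults.stop_words.add('btw')
nlp.Defaults.stop_words.discard('the')

# keep the lexeme flags in sync with the edited set
nlp.vocab['btw'].is_stop = True
nlp.vocab['the'].is_stop = False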

ImportError: No module named 'spacy.en'

Anonymous (unverified) submitted on 2019-12-03 02:15:02
Question: I'm working on a codebase that uses Spacy. I installed spacy using:

sudo pip3 install spacy

and then:

sudo python3 -m spacy download en

At the end of this last command, I got a message:

Linking successful
/home/rayabhik/.local/lib/python3.5/site-packages/en_core_web_sm --> /home/rayabhik/.local/lib/python3.5/site-packages/spacy/data/en
You can now load the model via spacy.load('en')

Now, when I try running my code, on the line:

from spacy.en import English

it gives me the following error:

ImportError: No module named 'spacy.en'

I've looked on…
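The spacy.en module only existed in pre-2.0 releases; with a 2.x install you load the linked model by name instead. A sketch of the two modern equivalents:

import spacy

# statistical pipeline, via the link created by "python -m spacy download en"
nlp = spacy.load('en')

# or a bare English tokenizer class with no statistical model attached
from spacy.lang.en import English
nlp_blank = English()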

Import error with spacy: “No module named en”

Anonymous (unverified) submitted on 2019-12-03 01:12:01
Question: I'm having trouble using the Python spaCy library. It seems to be installed correctly, but at from spacy.en import English I get the following import error:

Traceback (most recent call last):
  File "spacy.py", line 1, in <module>
    from spacy.en import English
  File "/home/user/CmdData/spacy.py", line 1, in <module>
    from spacy.en import English
ImportError: No module named en

I'm not very familiar with Python, but that's the standard import I saw online, and the library is installed:

$ pip list | grep spacy
spacy (0.99)

EDIT: I tested renaming the…
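The traceback itself points at the likely culprit: the user's script is named spacy.py, so Python imports that file instead of the installed library. A quick sketch to confirm what is actually being imported:

import spacy

# a healthy install prints a site-packages path; a path like
# /home/user/CmdData/spacy.py means a local file is shadowing the
# library, so rename it and remove any stale spacy.pyc next to it
print(spacy.__file__)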

Failed building wheel for spacy

Anonymous (unverified) submitted on 2019-12-03 01:10:02
Question: I'm trying to install spacy by running pip install spacy for Python version 3.6.1, but I keep getting errors like the ones below. How do I get rid of this issue? Previously I was getting a "cl.exe not found" error; after that, I added the Visual Studio path where cl.exe exists to the environment variables.

Failed building wheel for spacy
Running setup.py clean for spacy
Running setup.py bdist_wheel for murmurhash ... error
Complete output from command c:\users\sh00428701\appdata\local\programs\python\python36\python.exe -u -c "import setuptools, tokenize;_…

What do spaCy's part-of-speech and dependency tags mean?

做~自己de王妃 submitted on 2019-12-03 01:04:39
Question: spaCy tags up each of the Tokens in a Document with a part of speech (in two different formats, one stored in the pos and pos_ properties of the Token and the other stored in the tag and tag_ properties) and a syntactic dependency to its .head token (stored in the dep and dep_ properties). Some of these tags are self-explanatory, even to somebody like me without a linguistics background:

>>> import spacy
>>> en_nlp = spacy.load('en')
>>> document = en_nlp("I shot a man in Reno just to watch…
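For the tags that are not self-explanatory, spaCy has a built-in lookup: spacy.explain() returns a short human-readable description for POS, tag and dependency labels (the outputs in the comments are approximate):

import spacy

print(spacy.explain('VBZ'))    # e.g. 'verb, 3rd person singular present'
print(spacy.explain('nsubj'))  # e.g. 'nominal subject'
print(spacy.explain('ADP'))    # e.g. 'adposition'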