spacy

How can I modify the language model before applying patterns?

亡梦爱人 submitted on 2021-01-07 02:49:59
Question: I have this code:

```python
from spacy.matcher import Matcher, PhraseMatcher
import spacy

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab, validate=True)
patterns = [
    [{'POS': 'QUALIF'}, {'POS': 'CCONJ'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}],
]
matcher.add("process_1", None, *patterns)
texts = ["it is a beautiful and big apple"]
for text in texts:
    doc = nlp(text)
    matches = matcher(doc)
    for _, start, end in matches:
        print(doc[start:end].text)
```

So, I want to
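One likely issue with the pattern above: `'QUALIF'` is not one of the Universal Dependencies POS labels spaCy's English models emit, so a token pattern using it can never match. A minimal working sketch, using `spacy.blank` and a hand-tagged `Doc` so it runs without downloading `en_core_web_sm` (the POS values below are assumptions a real pipeline would assign itself; the `matcher.add` call uses the spaCy 3 signature):

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")  # bare vocab; no model download needed for this sketch
matcher = Matcher(nlp.vocab, validate=True)

# 'ADJ' replaces the invalid 'QUALIF'; spaCy 3's add() takes the
# patterns directly, without the old on_match positional argument
patterns = [[{"POS": "ADJ"}, {"POS": "CCONJ"}, {"POS": "ADJ"}, {"POS": "NOUN"}]]
matcher.add("process_1", patterns)

# hand-assigned POS tags stand in for what en_core_web_sm would predict
words = ["it", "is", "a", "beautiful", "and", "big", "apple"]
pos = ["PRON", "AUX", "DET", "ADJ", "CCONJ", "ADJ", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

for _, start, end in matcher(doc):
    print(doc[start:end].text)  # beautiful and big apple
```

With a downloaded model, `doc = nlp(text)` would replace the hand-built `Doc` and the matcher loop stays the same.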

Is it possible to find uncertainties of spaCy POS tags?

眉间皱痕 submitted on 2021-01-05 09:01:16
Question: I am trying to build a non-English spell checker that relies on classification of sentences by spaCy, which allows my algorithm to then use the POS tags and the grammatical dependencies of the individual tokens to determine incorrect spelling (in my case, more specifically: incorrect splits in Dutch compound words). However, spaCy appears to classify sentences incorrectly if they contain grammatical errors, for example classifying a noun as a verb, even though the classified word doesn't even
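spaCy does not expose per-tag probabilities through a stable public API: the tagger's classification head scores every tag and the pipeline keeps only the argmax. Conceptually, though, a per-token confidence is just a softmax over those scores, which is the quantity a spell checker would want to threshold. A toy, model-free illustration (the tag names and logit values below are made up):

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# made-up scores a tagger head might emit for one ambiguous token
tags = ["NOUN", "VERB", "ADJ"]
logits = [2.0, 1.8, 0.1]
probs = softmax(logits)

best_tag, best_prob = max(zip(tags, probs), key=lambda tp: tp[1])
print(best_tag, round(best_prob, 3))
```

A top probability barely above 0.5, as here, is exactly the kind of signal the spell checker could treat as "spaCy is unsure about this token".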

Entity labels repeated when replacing entities with their labels using spaCy

南楼画角 submitted on 2021-01-01 09:26:08
Question: Code:

```python
import spacy

nlp = spacy.load("en_core_web_md")

# read txt file, each string on its own line
with open("./try.txt", "r") as f:
    texts = f.read().splitlines()

# substitute entities with their TAGS
docs = nlp.pipe(texts)
out = []
for doc in docs:
    out_ = ""
    for tok in doc:
        text = tok.text
        if tok.ent_type_:
            text = tok.ent_type_
        out_ += text + tok.whitespace_
    out.append(out_)

# write to file
with open("./out_try.txt", "w") as f:
    f.write("\n".join(out))
```

Contents of input file: Georgia recently
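The repetition happens because the loop above replaces entities token by token, so a multi-token entity contributes its label once per token. Writing the label only at the token that *begins* an entity (`ent_iob_ == "B"`) and skipping the inside tokens fixes it. A sketch with a hand-annotated `Doc` so it runs without `en_core_web_md` (the words and entity spans below are illustrative stand-ins for the model's NER output):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
# hand-annotated stand-in for en_core_web_md's NER output
words = ["New", "York", "is", "big", "."]
spaces = [True, True, True, False, False]
ents = ["B-GPE", "I-GPE", "O", "O", "O"]  # IOB scheme, as the model would set
doc = Doc(nlp.vocab, words=words, spaces=spaces, ents=ents)

pieces = []
for tok in doc:
    if tok.ent_iob_ == "B":
        # write the label once for the whole entity, keeping the
        # whitespace that followed the entity's final token
        ent = next(e for e in doc.ents if e.start == tok.i)
        pieces.append(ent.label_ + doc[ent.end - 1].whitespace_)
    elif tok.ent_iob_ != "I":            # skip the inside-entity tokens
        pieces.append(tok.text + tok.whitespace_)

out = "".join(pieces)
print(out)  # GPE is big.
```

The same loop body drops into the question's file-reading pipeline unchanged.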

What are the supported Date and Time Formats in Spacy 2.0

安稳与你 submitted on 2021-01-01 06:57:25
Question: I am using the following models in my application:

- en_core_web_sm
- xx_ent_wiki_sm

I wanted to know the supported date and time formats that the default spaCy model can extract. Python version used: 3.6. spaCy version used: 2.0.x

Answer 1: The English models were trained on the OntoNotes 5 corpus, which supports the more extensive label scheme including DATE and TIME. The xx_ent_wiki_sm model was trained on a Wikipedia corpus with a more limited label scheme and only recognises PER, LOC, ORG and MISC
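There is no fixed list of surface formats to enumerate: DATE and TIME are statistical NER labels, so the model tags whatever it recognises in context. Filtering `doc.ents` by label is the usual way to collect them. A sketch with hand-annotated entities so it runs without downloading `en_core_web_sm` (the annotations stand in for what the model's NER would produce):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
# hand-annotated stand-in for en_core_web_sm's NER output (OntoNotes 5
# label scheme, which includes DATE and TIME)
words = ["See", "you", "next", "Friday", "at", "3", "pm"]
ents = ["O", "O", "B-DATE", "I-DATE", "O", "B-TIME", "I-TIME"]
doc = Doc(nlp.vocab, words=words, ents=ents)

dates_and_times = [(ent.text, ent.label_) for ent in doc.ents
                   if ent.label_ in ("DATE", "TIME")]
print(dates_and_times)  # [('next Friday', 'DATE'), ('3 pm', 'TIME')]
```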

Change beam_width in spaCy NER

自古美人都是妖i submitted on 2021-01-01 04:59:10
Question: I would like to change the nlp.entity.cfg beam_width from its default of 1 to 3. I tried nlp.entity.cfg.update({beam_width : 3}), but it looks like the nlp object is broken after this change. (If I call nlp(str), it gives me a dict instead of a spacy.tokens.doc.Doc, as it usually does when beam_width is 1.) I want to change it because the NER probabilities will be more accurate in my case (it's my own model that I trained). I computed the probabilities with code found in spaCy's GitHub issues with nlp

spacy 2.2.3 FileNotFoundError: [Errno 2] No such file or directory: 'thinc\\neural\\_custom_kernels.cu' in pyinstaller

时光总嘲笑我的痴心妄想 submitted on 2020-12-30 12:07:36
Question: I was trying to create an executable file using PyInstaller. I got the below error while executing it.

```
File "test_env2_live\main.py", line 2, in <module>
File "C:\Users\rajesh.das\AppData\Local\Continuum\anaconda3\envs\test_env2\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 627, in exec_module
  exec(bytecode, module.__dict__)
File "test_env2_live\controller\main.py", line 2, in <module>
File "C:\Users\rajesh.das\AppData\Local\Continuum\anaconda3\envs\test_env2\lib\site
```
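The error typically means PyInstaller bundled thinc's Python modules but not its data files: thinc loads its .cu CUDA kernel sources by file path at import time, and PyInstaller's static analysis does not see them. A common workaround, sketched here against a hypothetical main.py spec (not a canonical fix for every PyInstaller version), is to collect thinc's data files explicitly in the .spec file:

```python
# fragment of a hypothetical PyInstaller .spec file
from PyInstaller.utils.hooks import collect_data_files

# bundle thinc's non-.py files, including thinc/neural/_custom_kernels.cu,
# which PyInstaller's static analysis does not pick up
datas = collect_data_files("thinc")

a = Analysis(
    ["main.py"],
    datas=datas,
    # ... remaining Analysis options unchanged ...
)
```

Rebuilding with `pyinstaller main.spec` should then ship the .cu files alongside the frozen app.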

Extract Main- and Subclauses from German Sentence with SpaCy

房东的猫 submitted on 2020-12-12 06:28:18
Question: In German, how can I extract the main clauses and subclauses (aka "subordinate clauses", "dependent clauses") from a sentence with spaCy? I know how to use spaCy's tokenizer, part-of-speech tagging and dependency parser, but I cannot figure out how to represent the grammatical rules of German using the information spaCy can extract.

Answer 1: The problem can be divided into two tasks: 1. splitting the sentence into its constituent clauses, and 2. identifying which of the clauses is a main clause and which
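A sketch of the first task: in a dependency parse, a verb that heads a complementizer (TIGER label "cp", e.g. "weil") roots a subordinate clause, and its subtree is that clause; the tokens left over form the main clause. The `Doc` below is hand-annotated with a plausible de_core_news_sm-style parse so the example runs without downloading the German model (the head indices and dependency labels are assumptions a real pipeline would predict itself):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("de")
# "Ich gehe heim, weil es regnet." - "I am going home because it is raining."
words = ["Ich", "gehe", "heim", ",", "weil", "es", "regnet", "."]
heads = [1, 1, 1, 1, 6, 6, 1, 1]         # absolute index of each token's head
deps = ["sb", "ROOT", "mo", "punct", "cp", "sb", "mo", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# a verb with a complementizer child roots a subordinate clause
sub_roots = [tok for tok in doc if any(c.dep_ == "cp" for c in tok.children)]
sub_clauses = [[t.text for t in root.subtree] for root in sub_roots]

# the main clause is everything outside the subordinate subtrees
sub_ids = {t.i for root in sub_roots for t in root.subtree}
main_clause = [t.text for t in doc if t.i not in sub_ids]

print(main_clause)   # ['Ich', 'gehe', 'heim', ',', '.']
print(sub_clauses)   # [['weil', 'es', 'regnet']]
```

Relative clauses and verb-first conditionals need extra rules, but the subtree-per-clause-root idea carries over.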

How to import text from CoNNL format with named entities into spaCy, infer entities with my model and write them to the same dataset (with Python)?

浪尽此生 submitted on 2020-12-06 16:23:00
Question: I have a dataset in CoNLL NER format, which is basically a TSV file with two fields. The first field contains tokens from some text, one token per line (each punctuation symbol is also considered a token there), and the second field contains named-entity tags for the tokens in BIO format. I would like to load this dataset into spaCy, infer new named-entity tags for the text with my model, and write these tags into the same TSV file as a new third column. All I know is that I can infer named
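Whatever produces the inferred tags, writing them back as a third column is plain TSV handling: keep the blank sentence-separator lines intact and append one field per token row. A minimal pure-Python sketch; `predict` is a made-up stand-in for the per-token prediction your spaCy model would supply:

```python
def add_prediction_column(tsv_text, predict):
    """Append a third, predicted-tag column to a two-column CoNLL-style
    TSV, preserving blank lines that separate sentences."""
    out = []
    for line in tsv_text.splitlines():
        if not line.strip():              # sentence boundary: keep as-is
            out.append(line)
            continue
        token, gold = line.split("\t")
        out.append("\t".join([token, gold, predict(token)]))
    return "\n".join(out)

# toy stand-in for a spaCy model's per-token entity prediction
predict = lambda tok: "B-LOC" if tok == "Georgia" else "O"

conll = "Georgia\tB-LOC\nrecently\tO\n\nIt\tO\nrained\tO"
print(add_prediction_column(conll, predict))
```

In practice the predictions come from running each sentence's token list through the model (e.g. building a `Doc` from the pre-tokenized words and reading `token.ent_iob_`/`token.ent_type_`), not from looking at tokens in isolation as the toy `predict` does.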