spacy

How can I modify the language model before applying patterns?

亡梦爱人 submitted on 2021-01-07 02:49:59
Question: I have this code:

```python
from spacy.matcher import Matcher, PhraseMatcher
import spacy

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab, validate=True)
patterns = [
    [{'POS': 'QUALIF'}, {'POS': 'CCONJ'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}],
]
matcher.add("process_1", None, *patterns)
texts = ["it is a beautiful and big apple"]
for text in texts:
    doc = nlp(text)
    matches = matcher(doc)
    for _, start, end in matches:
        print(doc[start:end].text)
```

So, I want to
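One likely issue with the pattern above: `'QUALIF'` is not one of the Universal Dependencies POS labels spaCy's English models emit, so a token pattern using it can never match. A minimal working sketch, using `spacy.blank` and a hand-tagged `Doc` so it runs without downloading `en_core_web_sm` (the POS values below are assumptions a real pipeline would assign itself; the `matcher.add` call uses the spaCy 3 signature):

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

nlp = spacy.blank("en")  # bare vocab; no model download needed for this sketch
matcher = Matcher(nlp.vocab, validate=True)

# 'ADJ' replaces the invalid 'QUALIF'; spaCy 3's add() takes the
# patterns directly, without the old on_match positional argument
patterns = [[{"POS": "ADJ"}, {"POS": "CCONJ"}, {"POS": "ADJ"}, {"POS": "NOUN"}]]
matcher.add("process_1", patterns)

# hand-assigned POS tags stand in for what en_core_web_sm would predict
words = ["it", "is", "a", "beautiful", "and", "big", "apple"]
pos = ["PRON", "AUX", "DET", "ADJ", "CCONJ", "ADJ", "NOUN"]
doc = Doc(nlp.vocab, words=words, pos=pos)

for _, start, end in matcher(doc):
    print(doc[start:end].text)  # beautiful and big apple
```

With a downloaded model, `doc = nlp(text)` would replace the hand-built `Doc` and the matcher loop stays the same.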

Is it possible to find uncertainties of spaCy POS tags?

眉间皱痕 submitted on 2021-01-05 09:01:16
Question: I am trying to build a non-English spell checker that relies on classification of sentences by spaCy, which allows my algorithm to then use the POS tags and the grammatical dependencies of the individual tokens to determine incorrect spelling (in my case, more specifically: incorrect splits in Dutch compound words). However, spaCy appears to classify sentences incorrectly if they contain grammatical errors, for example classifying a noun as a verb, even though the classified word doesn't even
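spaCy does not expose per-tag probabilities through a stable public API: the tagger's classification head scores every tag and the pipeline keeps only the argmax. Conceptually, though, a per-token confidence is just a softmax over those scores, which is the quantity a spell checker would want to threshold. A toy, model-free illustration (the tag names and logit values below are made up):

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# made-up scores a tagger head might emit for one ambiguous token
tags = ["NOUN", "VERB", "ADJ"]
logits = [2.0, 1.8, 0.1]
probs = softmax(logits)

best_tag, best_prob = max(zip(tags, probs), key=lambda tp: tp[1])
print(best_tag, round(best_prob, 3))
```

A top probability barely above 0.5, as here, is exactly the kind of signal the spell checker could treat as "spaCy is unsure about this token".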

Entity labels repeated when replacing entities with their labels using spaCy

南楼画角 submitted on 2021-01-01 09:26:08
Question: Code:

```python
import spacy

nlp = spacy.load("en_core_web_md")

# read txt file, each string on its own line
with open("./try.txt", "r") as f:
    texts = f.read().splitlines()

# substitute entities with their TAGS
docs = nlp.pipe(texts)
out = []
for doc in docs:
    out_ = ""
    for tok in doc:
        text = tok.text
        if tok.ent_type_:
            text = tok.ent_type_
        out_ += text + tok.whitespace_
    out.append(out_)

# write to file
with open("./out_try.txt", "w") as f:
    f.write("\n".join(out))
```

Contents of input file: Georgia recently
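The repetition happens because the loop above replaces entities token by token, so a multi-token entity contributes its label once per token. Writing the label only at the token that *begins* an entity (`ent_iob_ == "B"`) and skipping the inside tokens fixes it. A sketch with a hand-annotated `Doc` so it runs without `en_core_web_md` (the words and entity spans below are illustrative stand-ins for the model's NER output):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
# hand-annotated stand-in for en_core_web_md's NER output
words = ["New", "York", "is", "big", "."]
spaces = [True, True, True, False, False]
ents = ["B-GPE", "I-GPE", "O", "O", "O"]  # IOB scheme, as the model would set
doc = Doc(nlp.vocab, words=words, spaces=spaces, ents=ents)

pieces = []
for tok in doc:
    if tok.ent_iob_ == "B":
        # write the label once for the whole entity, keeping the
        # whitespace that followed the entity's final token
        ent = next(e for e in doc.ents if e.start == tok.i)
        pieces.append(ent.label_ + doc[ent.end - 1].whitespace_)
    elif tok.ent_iob_ != "I":            # skip the inside-entity tokens
        pieces.append(tok.text + tok.whitespace_)

out = "".join(pieces)
print(out)  # GPE is big.
```

The same loop body drops into the question's file-reading pipeline unchanged.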

What are the supported Date and Time Formats in Spacy 2.0

安稳与你 submitted on 2021-01-01 06:57:25
Question: I am using the following models in my application:

- en_core_web_sm
- xx_ent_wiki_sm

I wanted to know the supported date and time formats that the default spaCy model can extract. Python version used: 3.6. spaCy version used: 2.0.x

Answer 1: The English models were trained on the OntoNotes 5 corpus, which supports the more extensive label scheme including DATE and TIME. The xx_ent_wiki_sm model was trained on a Wikipedia corpus with a more limited label scheme and only recognises PER, LOC, ORG and MISC
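There is no fixed list of surface formats to enumerate: DATE and TIME are statistical NER labels, so the model tags whatever it recognises in context. Filtering `doc.ents` by label is the usual way to collect them. A sketch with hand-annotated entities so it runs without downloading `en_core_web_sm` (the annotations stand in for what the model's NER would produce):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
# hand-annotated stand-in for en_core_web_sm's NER output (OntoNotes 5
# label scheme, which includes DATE and TIME)
words = ["See", "you", "next", "Friday", "at", "3", "pm"]
ents = ["O", "O", "B-DATE", "I-DATE", "O", "B-TIME", "I-TIME"]
doc = Doc(nlp.vocab, words=words, ents=ents)

dates_and_times = [(ent.text, ent.label_) for ent in doc.ents
                   if ent.label_ in ("DATE", "TIME")]
print(dates_and_times)  # [('next Friday', 'DATE'), ('3 pm', 'TIME')]
```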

Change beam_width in spaCy NER

自古美人都是妖i submitted on 2021-01-01 04:59:10
Question: I would like to change the nlp.entity.cfg beam_width from its default of 1 to 3. I tried nlp.entity.cfg.update({beam_width : 3}), but it looks like the nlp object is broken after this change. (If I call nlp(str), it gives me a dict instead of a spacy.tokens.doc.Doc, as it usually does when beam_width is 1.) I want to change it because the NER probabilities will be more accurate in my case (it's my own model that I trained). I computed the probabilities with code found in spaCy's GitHub issues with nlp

spacy 2.2.3 FileNotFoundError: [Errno 2] No such file or directory: 'thinc\\neural\\_custom_kernels.cu' in pyinstaller

时光总嘲笑我的痴心妄想 submitted on 2020-12-30 12:07:36
Question: I was trying to create an executable file using PyInstaller. I got the below error while executing it.

```
File "test_env2_live\main.py", line 2, in <module>
File "C:\Users\rajesh.das\AppData\Local\Continuum\anaconda3\envs\test_env2\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 627, in exec_module
  exec(bytecode, module.__dict__)
File "test_env2_live\controller\main.py", line 2, in <module>
File "C:\Users\rajesh.das\AppData\Local\Continuum\anaconda3\envs\test_env2\lib\site
```
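The error typically means PyInstaller bundled thinc's Python modules but not its data files: thinc loads its .cu CUDA kernel sources by file path at import time, and PyInstaller's static analysis does not see them. A common workaround, sketched here against a hypothetical main.py spec (not a canonical fix for every PyInstaller version), is to collect thinc's data files explicitly in the .spec file:

```python
# fragment of a hypothetical PyInstaller .spec file
from PyInstaller.utils.hooks import collect_data_files

# bundle thinc's non-.py files, including thinc/neural/_custom_kernels.cu,
# which PyInstaller's static analysis does not pick up
datas = collect_data_files("thinc")

a = Analysis(
    ["main.py"],
    datas=datas,
    # ... remaining Analysis options unchanged ...
)
```

Rebuilding with `pyinstaller main.spec` should then ship the .cu files alongside the frozen app.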

Extract Main- and Subclauses from German Sentence with SpaCy

房东的猫 submitted on 2020-12-12 06:28:18
Question: In German, how can I extract the main clauses and subclauses (aka "subordinate clauses", "dependent clauses") from a sentence with spaCy? I know how to use spaCy's tokenizer, part-of-speech tagging and dependency parser, but I cannot figure out how to represent the grammatical rules of German using the information spaCy can extract.

Answer 1: The problem can be divided into two tasks: 1. splitting the sentence into its constituent clauses, and 2. identifying which of the clauses is a main clause and which
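A sketch of the first task: in a dependency parse, a verb that heads a complementizer (TIGER label "cp", e.g. "weil") roots a subordinate clause, and its subtree is that clause; the tokens left over form the main clause. The `Doc` below is hand-annotated with a plausible de_core_news_sm-style parse so the example runs without downloading the German model (the head indices and dependency labels are assumptions a real pipeline would predict itself):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("de")
# "Ich gehe heim, weil es regnet." - "I am going home because it is raining."
words = ["Ich", "gehe", "heim", ",", "weil", "es", "regnet", "."]
heads = [1, 1, 1, 1, 6, 6, 1, 1]         # absolute index of each token's head
deps = ["sb", "ROOT", "mo", "punct", "cp", "sb", "mo", "punct"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# a verb with a complementizer child roots a subordinate clause
sub_roots = [tok for tok in doc if any(c.dep_ == "cp" for c in tok.children)]
sub_clauses = [[t.text for t in root.subtree] for root in sub_roots]

# the main clause is everything outside the subordinate subtrees
sub_ids = {t.i for root in sub_roots for t in root.subtree}
main_clause = [t.text for t in doc if t.i not in sub_ids]

print(main_clause)   # ['Ich', 'gehe', 'heim', ',', '.']
print(sub_clauses)   # [['weil', 'es', 'regnet']]
```

Relative clauses and verb-first conditionals need extra rules, but the subtree-per-clause-root idea carries over.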

How to import text from CoNNL format with named entities into spaCy, infer entities with my model and write them to the same dataset (with Python)?

浪尽此生 submitted on 2020-12-06 16:23:00
Question: I have a dataset in CoNLL NER format, which is basically a TSV file with two fields. The first field contains tokens from some text, one token per line (each punctuation symbol is also considered a token there), and the second field contains named-entity tags for the tokens in BIO format. I would like to load this dataset into spaCy, infer new named-entity tags for the text with my model, and write these tags into the same TSV file as a new third column. All I know is that I can infer named
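Whatever produces the inferred tags, writing them back as a third column is plain TSV handling: keep the blank sentence-separator lines intact and append one field per token row. A minimal pure-Python sketch; `predict` is a made-up stand-in for the per-token prediction your spaCy model would supply:

```python
def add_prediction_column(tsv_text, predict):
    """Append a third, predicted-tag column to a two-column CoNLL-style
    TSV, preserving blank lines that separate sentences."""
    out = []
    for line in tsv_text.splitlines():
        if not line.strip():              # sentence boundary: keep as-is
            out.append(line)
            continue
        token, gold = line.split("\t")
        out.append("\t".join([token, gold, predict(token)]))
    return "\n".join(out)

# toy stand-in for a spaCy model's per-token entity prediction
predict = lambda tok: "B-LOC" if tok == "Georgia" else "O"

conll = "Georgia\tB-LOC\nrecently\tO\n\nIt\tO\nrained\tO"
print(add_prediction_column(conll, predict))
```

In practice the predictions come from running each sentence's token list through the model (e.g. building a `Doc` from the pre-tokenized words and reading `token.ent_iob_`/`token.ent_type_`), not from looking at tokens in isolation as the toy `predict` does.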