spacy | 易学教程

spaCy similarity method doesn't work correctly

Question: I always get a lot of help from Stack Overflow; thank you. I am doing simple natural language processing with spaCy, and I am filtering out words by measuring the similarity between them. I wrote the following simple code, taken from the spaCy documentation, but the result does not match what the documentation shows.

    import spacy

    nlp = spacy.load('en_core_web_lg')
    tokens = nlp('dog cat banana')
    for token1 in tokens:
        for token2 in tokens:
            sim = token1.similarity(token2)
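
For context, spaCy's default similarity() is the cosine similarity of the two objects' vectors, so the numbers depend entirely on the vectors the loaded model provides. The sketch below reproduces the nested loop from the question in plain Python. The three-dimensional vectors are invented for illustration and are not taken from any spaCy model; `cosine_similarity` and `toy_vectors` are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; returning 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy "word vectors", invented for illustration only.
toy_vectors = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

# Same pairwise loop as in the question, over the toy vectors.
for word1, vec1 in toy_vectors.items():
    for word2, vec2 in toy_vectors.items():
        print(word1, word2, round(cosine_similarity(vec1, vec2), 3))
```

With vectors like these, "dog" and "cat" score higher with each other than either does with "banana". If every pair comes out the same in a real run, checking whether the loaded model actually ships word vectors is a reasonable first step.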

Matcher is returning some duplicate entries

Question: I want the output to be ["good customer service", "great ambience"], but I am getting ["good customer", "good customer service", "great ambience"]. The pattern also matches "good customer", but that phrase doesn't make any sense on its own. How can I remove these kinds of duplicates?

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("good customer service and great ambience")
    matcher = Matcher(nlp.vocab)
    # Create a pattern matching two tokens: adjective followed by
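
One common way to drop the shorter duplicates is to keep only the longest match among overlapping ones. The sketch below shows that idea in plain Python on (start, end) token offsets. The `matches` list is a hypothetical stand-in for the matcher output described in the question, and `keep_longest` is an illustrative helper, not a spaCy function.

```python
def keep_longest(spans):
    """Keep the longest spans, dropping any span that overlaps a kept one.

    Each span is a (start, end) pair of token indices, end exclusive.
    """
    # Longest first; ties go to the span that starts earlier.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    seen = set()  # token indices already covered by a kept span
    for start, end in ordered:
        if not any(i in seen for i in range(start, end)):
            kept.append((start, end))
            seen.update(range(start, end))
    return sorted(kept)


tokens = "good customer service and great ambience".split()
# Hypothetical matcher output as token offsets:
# "good customer", "good customer service", "great ambience"
matches = [(0, 2), (0, 3), (4, 6)]
phrases = [" ".join(tokens[s:e]) for s, e in keep_longest(matches)]
print(phrases)  # ['good customer service', 'great ambience']
```

Recent spaCy releases ship `spacy.util.filter_spans`, which applies the same longest-first rule to `Span` objects, so a hand-written helper like this may be unnecessary if your version provides it.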

Can't install spaCy on WinPython: "ModuleNotFoundError: No module named 'semver'"

Question: I want to use a portable Python interpreter, so I installed WinPython, and I plan to deploy my application to other machines someday. My application needs the NLP module spaCy. I tried to install spaCy on WinPython (pip install -U spacy), but the installation fails. While pip installs the dependencies, the module "semver" apparently cannot be installed:

    Collecting semver (from sputnik<0.10.0,>=0.9.2->spacy)
    Using cached semver-2.7.6.tar.gz
    Complete output from command

Need an approach for building a custom NER to extract the keywords below from any payslip format

Question: I am trying to build a generic extractor for the following fields from any format of payslip:

    Name
    His PostCode
    Pay Date
    Net Pay

The challenge is the variety of formats that may arrive. I want to apply NER (spaCy) to learn these fields under the following entities:

    Name - PERSON
    His PostCode
    Pay Date - DATE
    Net Pay - MONEY

So far I have been unsuccessful. I even tried to build a custom EntityMatcher for the postcode and date, without success. I am looking for any guidelines and an approach that would put me on the right path in

Load up previously saved NER models in SpaCy v1.1.2

Question: Whenever I try to load a previously saved model for spaCy NER, I get a core dump.

    if os.path.isfile(model_path):
        ner.model.load(model_path)
    for itn in range(5):
        random.shuffle(TRAIN_DATA)
        for raw_text, entity_offsets in TRAIN_DATA:
            doc = nlp.make_doc(raw_text)
            gold = GoldParse(doc, entities=entity_offsets)
            ner.update(doc, gold)  # <- Core dump occurs here

Dump report:

    7fb1b7459000-7fb1b7499000 rw-p 00000000 00:00 0
    [1] 23967 abort (core dumped)

Am I doing or loading it wrong?

Spacy: get position of word with entity tag

Question: I'm trying to get the position of a word and its entity tag by iterating over a sentence, as per the spaCy docs:

    import spacy

    nlp = spacy.load('en')
    doc = nlp(u'London is a big city in the United Kingdom.')
    for ent in doc.ents:
        print(ent.label_, ent.text)
        # GPE London
        # GPE United Kingdom

I've tried to get the position of the word with ent.i and ent.idx; however, neither of these works, and both give the following error:

    AttributeError: 'spacy.tokens.span.Span' object has no attribute 'i'

Answer 1:

spacy module install in conda

Question: After installing spaCy with conda on a Windows 7 machine, I ran the following code:

    import spacy
    nlp = spacy.load('en')

The error I received is the following:

    Warning: no model found for 'en'
    Only loading the 'en' tokenizer.

After some searching, I ran the following command on the command line (cmd):

    python -m spacy download en

The error I receive is:

    Traceback (most recent call last):
      File "C:\Users\vranjan2\AppData\Local\Continuum\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
        "__main__",

Multi-Threaded NLP with Spacy pipe

Question: I'm trying to apply the spaCy NLP (natural language processing) pipeline to a big text file, such as a Wikipedia dump. Here is my code, based on an example from spaCy's documentation:

    from spacy.en import English

    input = open("big_file.txt")
    big_text = input.read()
    input.close()
    nlp = English()
    out = nlp.pipe([unicode(big_text, errors='ignore')], n_threads=-1)
    doc = out.next()

spaCy applies all NLP operations, such as POS tagging and lemmatizing, all at once. It is like a pipeline for NLP that takes care of
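
Since nlp.pipe consumes an iterable of texts, one option worth considering is to feed it many smaller texts instead of a single huge string, so that batching has something to work with. The sketch below is a plain-Python generator that groups lines into chunks. `iter_texts` and `max_chars` are hypothetical names, and the assumption that the file can safely be split at line boundaries may not hold for every corpus.

```python
def iter_texts(lines, max_chars=10000):
    """Group consecutive lines into chunks of roughly max_chars characters."""
    chunk, size = [], 0
    for line in lines:
        # Emit the current chunk before it would grow past the limit.
        if chunk and size + len(line) > max_chars:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "".join(chunk)


# Small demonstration with in-memory lines instead of a real file.
sample_lines = ["aa\n", "bb\n", "cc\n"]
for text in iter_texts(sample_lines, max_chars=6):
    print(repr(text))

# Assumed usage with a file and an already loaded pipeline `nlp`:
# with open("big_file.txt", encoding="utf-8", errors="ignore") as f:
#     for doc in nlp.pipe(iter_texts(f)):
#         ...
```

Because the generator reads lazily, the whole file never has to sit in memory at once, which also sidesteps the single input.read() call in the original code.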

how to write spacy matcher of POS regex

Question: spaCy has two features I'd like to combine: part-of-speech (POS) tagging and rule-based matching. How can I combine them in a neat way? For example, say the input is a single sentence and I'd like to verify that it meets some POS ordering condition, such as the verb coming after the noun (something like a noun**verb regex). The result should be true or false. Is that doable, or is the matcher limited to specific patterns like the one in the example? Can rule-based matching have POS rules? If not, here is my current plan: gather
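
A minimal sketch of the "noun before verb" check, using a regular expression over a string of POS tags. The tag sequences here are hard-coded and hypothetical; with spaCy they would come from something like `[token.pos_ for token in doc]`. `noun_before_verb` is an illustrative helper name.

```python
import re


def noun_before_verb(pos_tags):
    """Return True if some NOUN appears before some VERB in the tag sequence."""
    # Join the tags with spaces so word boundaries separate them cleanly.
    tag_string = " ".join(pos_tags)
    return re.search(r"\bNOUN\b.*\bVERB\b", tag_string) is not None


# Hypothetical tag sequences for two sentences.
print(noun_before_verb(["DET", "NOUN", "ADV", "VERB"]))  # True
print(noun_before_verb(["VERB", "DET", "NOUN"]))         # False
```

spaCy's Matcher patterns can also refer to POS tags directly, for example a `{"POS": "NOUN"}` token followed by a wildcard `{"OP": "*"}` token and a `{"POS": "VERB"}` token, which may make the string detour unnecessary.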