spacy | 易学教程

Installing Spacy for NLP with Python 3 & Windows gives error when installing from source

Question: I'm following spaCy's directions for installing from source on Windows with Python 3 (pip and conda both gave me errors that I still haven't resolved; installing directly from source seems to get the closest to actually working). However, when I get to step 3 and enter export PYTHONPATH = pwd on the command line (with the quotes around pwd as the instructions require; they just break the formatting here), I get this error message: export is not recognized as an internal or external command,

Regular expression SpaCy

Question: I am creating a spaCy regular-expression matcher to match numbers and extract them into a pandas DataFrame. The problem: pandas picks up each number but overwrites the previous value instead of appending. How can I solve this? (Original code credit: yarongon)
    from __future__ import unicode_literals
    import spacy
    import re
    import pandas as pd
    from datetime import date
    nlp = spacy.load('en_core_web_sm', disable=['parser', 'tagger', 'ner'])
    doc = nlp("This is a sample number: 11. This is second sample number: 1145.")
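One common cause of the overwriting described in this question is assigning every match to the same DataFrame cell inside the loop. The following is a minimal sketch, assuming the goal is one row per matched number: it collects the matches in a list first, so the frame can be built once at the end. It uses only the standard `re` module. The helper name `extract_numbers` and the `number`/`start`/`end` keys are hypothetical and do not come from the original code.

```python
import re

# Sample text taken from the question.
text = "This is a sample number: 11. This is second sample number: 1145."


def extract_numbers(text):
    """Return one dict per run of digits found in the text."""
    rows = []
    for match in re.finditer(r"\d+", text):
        # Append a new row for each match instead of reassigning one value.
        rows.append({
            "number": int(match.group()),
            "start": match.start(),
            "end": match.end(),
        })
    return rows


rows = extract_numbers(text)
```

The resulting list of dicts could then be passed to `pandas.DataFrame(rows)`, which creates one row per match.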

Spacy on AppEngine standard

Question: I'm trying to use spaCy on the new AppEngine Standard Python 3.7 runtime. When I try to deploy I get: ERROR: (gcloud.app.deploy) Cannot upload file [/my/project/path/venv/lib/python3.7/site-packages/spacy/lang/tr/lemmatizer.py], which has size [41523943] (greater than maximum allowed size of [33554432]). Please delete the file or add to the skip_files entry in your application .yaml file and try again. A few oddities: The docs seem to indicate that I don't need to upload the virtual

Training and evaluating spaCy model by sentences or paragraphs

Question: Observation:
Paragraph: I love apple. I eat one banana a day
Sentences: I love apple. / I eat one banana a day
There are two sentences in this paragraph: "I love apple" and "I eat one banana a day". If I pass the whole paragraph to spaCy, it recognizes only one entity, for example "apple", but if I pass the sentences one by one, spaCy recognizes two entities, "apple" and "banana". (This is just an example to show my point; the actual recognition result could be different.)

ERROR: Complete output from command : Error installing spaCy using pip

Question: I have searched everywhere and tried many ways to install spaCy on my system using pip. I tried updating pip and setuptools, running cmd with administrator privileges, installing the required modules separately, and creating a virtual environment for the installation, but I still run into problems. This is the full output: ERROR: Complete output from command 'c:\program files (x86)\python36-32\python.exe' 'c:\program files (x86)\python36-32\lib\site-packages\pip' install -

spacy rule-matcher extract value from matched sentence

Question: I have custom rule matching in spaCy, and I am able to match some sentences in a document. I would now like to extract some numbers from the matched sentences. However, the matched sentences do not always have the same shape and form. What is the best way to do this?
    # case 1:
    texts = ["the surface is 31 sq", "the surface is sq 31", "the surface is square meters 31", "the surface is 31 square meters", "the surface is about 31,2 square", "the surface is 31 kilograms"]
    pattern = [ {
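Once a sentence has been matched, one way to pull the number out regardless of word order is a regular expression over the sentence text. The sketch below is a rough illustration and not the asker's pattern, which is cut off above. It assumes that only sentences mentioning "sq" or "square" are of interest and that a comma acts as the decimal separator. The names `UNIT`, `NUMBER`, and `extract_surface` are hypothetical.

```python
import re

# Sample sentences from the question.
texts = [
    "the surface is 31 sq",
    "the surface is sq 31",
    "the surface is square meters 31",
    "the surface is 31 square meters",
    "the surface is about 31,2 square",
    "the surface is 31 kilograms",
]

# Assumption: a sentence counts only if it mentions "sq" or "square".
UNIT = re.compile(r"\b(?:sq|square)\b")
# A number with an optional comma as decimal separator, e.g. "31,2".
NUMBER = re.compile(r"\d+(?:,\d+)?")


def extract_surface(text):
    """Return the first number as a float, or None if no area unit is present."""
    if not UNIT.search(text):
        return None
    match = NUMBER.search(text)
    if match is None:
        return None
    # Convert the comma decimal separator before parsing.
    return float(match.group().replace(",", "."))


values = [extract_surface(t) for t in texts]
```

Because the unit check and the number search are independent, the number can appear before or after the unit. The sentence about kilograms yields `None` under the stated assumption.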

Document similarity in Spacy vs Word2Vec

Question: I have a niche corpus of ~12k docs, and I want to find near-duplicate documents with similar meanings across it; think of articles about the same event covered by different news organisations. I have tried gensim's Word2Vec, which gives me terrible similarity scores (<0.3) even when the test document is within the corpus, and I have tried spaCy, which gives me >5k documents with similarity > 0.9. I inspected the documents spaCy ranked as most similar, and the results were mostly useless. This is the relevant code. tfidf =

How to fix a python spaCy error: “undefined symbol: PySlice_AdjustIndices”?

Question: Running the official example from here, I get an error:
Traceback (most recent call last):
  File "/home/vv/PythProj/esi-code/webapp/sp_token.py", line 2, in <module>
    from spacy.en import English
  File "/home/vv/anaconda3/lib/python3.6/site-packages/spacy/__init__.py", line 4, in <module>
    from . import util
  File "/home/vv/anaconda3/lib/python3.6/site-packages/spacy/util.py", line 5, in <module>
    import regex as re
  File "/home/vv/anaconda3/lib/python3.6/site-packages/regex.py", line 394, in <module>

spaCy sentence segmentation failing on quotes

Question: I am parsing some news data with spaCy and am noticing a consistent sentence segmentation failure wherever there is a quote. Has anyone else solved this issue? Here is a reproducible example; note sentence 4 in the output below. spaCy fails to split at the start of the quote, and this is consistent across the other news articles I'm working with. Thanks a lot. Example: Raw data: u'body': u'\n LONDON Nov 4 Britons hurt by lower incomes and rising food prices after the financial crisis

How is SpaCy's similarity computed?

Question: Beginner NLP question here: how does the .similarity method work? Wow, spaCy is great! Its tfidf model could be easier to preprocess, but w2v with only one line of code (token.vector)?! Awesome! In his 10-line tutorial on spaCy, andrazhribernik shows us the .similarity method, which can be run on tokens, sents, word chunks, and docs. After nlp = spacy.load('en') and doc = nlp(raw_text) we can do .similarity queries between tokens and chunks. However, what is being calculated behind the scenes
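By default, spaCy documents `.similarity` as the cosine similarity between vectors, where the vector of a doc or span is the average of its token vectors. The sketch below reproduces that calculation in plain Python as an illustration, not as spaCy's actual implementation. The helper names `average` and `cosine` are hypothetical, and the two-dimensional vectors are made-up toy values standing in for real word vectors.

```python
import math


def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy "token vectors" for two documents (assumed values, not real embeddings).
doc_a = average([[1.0, 0.0], [0.0, 1.0]])
doc_b = average([[2.0, 2.0]])
score = cosine(doc_a, doc_b)
```

Cosine similarity depends only on direction, not length, so the two toy documents above score close to 1 even though their averaged vectors differ in magnitude. Averaging many token vectors tends to push long documents toward similar directions, which may be one reason document-level scores cluster near the top of the range.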