spacy

Is it possible to install SpaCy on a Raspberry Pi 4 running Raspbian Buster?

蹲街弑〆低调 submitted on 2020-02-25 03:39:28

Question: I have been stuck on installing SpaCy the entire day.

    sudo pip install -U spacy
    Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
    Collecting spacy
      Using cached https://files.pythonhosted...
    Installing build dependencies ... done
    Complete output from command python setup.py egg_info:
    Failed building wheel for blis
    ERROR: Failed to build one or more wheels
    Traceback (most recent call last):
      File "/tmp/pip-build-env-e4fo917j/lib/python3.7/site-packages/setuptools
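
The root cause is that blis has no prebuilt wheel for this ARM platform, so pip falls back to compiling it from source and fails. A commonly suggested workaround (a sketch, not verified on every Pi image) is to install the build prerequisites and force blis to compile with its architecture-generic kernels via the BLIS_ARCH environment variable:

    # Install the compiler toolchain and Python headers first.
    sudo apt-get update
    sudo apt-get install -y build-essential python3-dev

    # Force a from-source blis build with generic (non-x86) kernels.
    # "sudo -E" keeps BLIS_ARCH visible inside the root environment.
    BLIS_ARCH="generic" sudo -E pip3 install -U spacy --no-binary blis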

read corpus of text files in spacy

荒凉一梦 submitted on 2020-02-24 08:45:09

Question: All the examples that I see for using spacy just read in a single text file (that is small in size). How does one load a corpus of text files into spacy? I can do this with textacy by pickling all the text in the corpus:

    docs = textacy.io.spacy.read_spacy_docs('E:/spacy/DICKENS/dick.pkl', lang='en')
    for doc in docs:
        print(doc)

But I am not clear on how to use this generator object (docs) for further analysis. Also, I would rather use spacy, not textacy. spacy also fails to read in a single
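
spaCy itself does not ship a corpus loader; the usual pattern is to stream the file contents through nlp.pipe, which lazily yields ordinary Doc objects you can analyse directly instead of just printing. A minimal sketch (the directory path and the .txt glob are assumptions for illustration):

    import pathlib
    import spacy

    nlp = spacy.load('en')

    def iter_texts(corpus_dir):
        # Yield the raw text of every .txt file in the corpus directory.
        for path in sorted(pathlib.Path(corpus_dir).glob('*.txt')):
            yield path.read_text(encoding='utf-8')

    # nlp.pipe streams the texts and yields processed Doc objects lazily,
    # so the whole corpus never has to sit in memory at once.
    for doc in nlp.pipe(iter_texts('E:/spacy/DICKENS'), batch_size=50):
        print(len(doc), [ent.text for ent in doc.ents][:5])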

How to speed up spaCy lemmatization?

瘦欲@ submitted on 2020-02-24 05:10:46

Question: I'm using spaCy (version 2.0.11) for lemmatization in the first step of my NLP pipeline, but unfortunately it's taking a very long time. It is clearly the slowest part of my processing pipeline and I want to know if there are improvements I could be making. I am using a pipeline as:

    nlp.pipe(docs_generator, batch_size=200, n_threads=6, disable=['ner'])

on an 8-core machine, and I have verified that the machine is using all the cores. On a corpus of about 3 million short texts totaling almost
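
If only lemmas are needed, the parser is usually the dominant cost. In spaCy 2.x the English lemmatizer relies on POS tags from the tagger, so a common approach (a sketch, not a guaranteed fix for this corpus) is to disable the parser as well as ner and keep only the tagger; note also that in many spaCy 2.0.x releases the n_threads argument had little or no effect:

    import spacy

    # Keep the tagger (needed for POS-based lemmas); drop parser and ner.
    nlp = spacy.load('en', disable=['parser', 'ner'])

    def lemmatize_all(texts):
        # Stream texts through the reduced pipeline and yield lemma lists.
        for doc in nlp.pipe(texts, batch_size=1000):
            yield [token.lemma_ for token in doc]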

Spacy to extract specific noun phrase

邮差的信 submitted on 2020-02-20 07:44:32

Question: Can I use spacy in Python to find NPs with specific neighbors? I want noun phrases from my text that have a verb before and after them.

Answer 1: You can merge the noun phrases (so that they do not get tokenized separately), analyse the dependency parse tree, and check the POS of the neighbouring tokens.

    >>> import spacy
    >>> nlp = spacy.load('en')
    >>> sent = u'run python program run, to make this work'
    >>> parsed = nlp(sent)
    >>> list(parsed.noun_chunks)
    [python program]
    >>> for noun_phrase in list(parsed
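
A fuller sketch of that idea, using the retokenizer API added in spaCy 2.1 (the sample sentence is illustrative; on spaCy 2.0 you would call span.merge() instead):

    import spacy

    nlp = spacy.load('en')
    doc = nlp(u'run python program run, to make this work')

    # Merge each noun chunk into a single token so a whole phrase
    # can be compared against its immediate neighbours.
    with doc.retokenize() as retokenizer:
        for chunk in list(doc.noun_chunks):
            retokenizer.merge(chunk)

    # Keep merged phrases whose immediate neighbours are both verbs.
    for i, token in enumerate(doc):
        if token.pos_ not in ('NOUN', 'PROPN'):
            continue
        prev_is_verb = i > 0 and doc[i - 1].pos_ == 'VERB'
        next_is_verb = i + 1 < len(doc) and doc[i + 1].pos_ == 'VERB'
        if prev_is_verb and next_is_verb:
            print(token.text)  # e.g. "python program" between the two "run"s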

Spacy, Strange similarity between two sentences

大兔子大兔子 submitted on 2020-02-20 07:22:27

Question: I have downloaded the en_core_web_lg model and am trying to find the similarity between two sentences:

    nlp = spacy.load('en_core_web_lg')
    search_doc = nlp("This was very strange argument between american and british person")
    main_doc = nlp("He was from Japan, but a true English gentleman in my eyes, and another one of the reasons as to why I liked going to school.")
    print(main_doc.similarity(search_doc))

This returns a very strange value: 0.9066019751888448. These two sentences should not be 90% similar
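
Doc.similarity here is the cosine similarity of the two documents' averaged word vectors, and averaging over many common words (was, and, the, to...) pulls almost any pair of ordinary sentences close together. One hedged way to get a more discriminative score (a sketch, not an official spaCy recipe) is to compare only the content words:

    import spacy

    nlp = spacy.load('en_core_web_lg')

    def content_doc(text):
        # Drop stop words and punctuation, keep tokens that have vectors,
        # and re-run the pipeline on what remains.
        doc = nlp(text)
        kept = [t.text for t in doc
                if not t.is_stop and not t.is_punct and t.has_vector]
        return nlp(' '.join(kept))

    a = content_doc("This was very strange argument between american and british person")
    b = content_doc("He was from Japan, but a true English gentleman in my eyes, and "
                    "another one of the reasons as to why I liked going to school.")
    print(a.similarity(b))  # usually noticeably lower than the raw 0.90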

Converting Spacy Training Data format to Spacy CLI Format (for blank NER)

别来无恙 submitted on 2020-02-10 19:59:41

Question: This is the classic training format:

    TRAIN_DATA = [
        ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
        ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
    ]

I used to train with code, but as I understand it, training works better with the CLI train method. However, my data is in the format above. I have found code snippets for this type of conversion, but every one of them performs spacy.load('en') rather than going with blank, which made me think: are they training
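
For the conversion itself, a blank pipeline is enough, because only the tokenizer is needed to align the character offsets to tokens. A sketch using docs_to_json (available from spaCy 2.1; the output file name is an assumption):

    import json
    import spacy
    from spacy.gold import docs_to_json  # spaCy 2.x location

    TRAIN_DATA = [
        ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
        ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
    ]

    # spacy.blank('en') gives tokenization only -- no pretrained weights
    # are involved, so nothing pretrained leaks into the converted data.
    nlp = spacy.blank('en')

    docs = []
    for text, annot in TRAIN_DATA:
        doc = nlp(text)
        # char_span returns None if the offsets do not line up with
        # token boundaries; real data should check for that.
        doc.ents = [doc.char_span(start, end, label=label)
                    for start, end, label in annot['entities']]
        docs.append(doc)

    # The CLI's JSON training format is a list of "document" dicts.
    with open('train.json', 'w', encoding='utf-8') as f:
        json.dump([docs_to_json(docs)], f)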

Package spacy model

≯℡__Kan透↙ submitted on 2020-01-28 11:25:25

Question: I want to include the spacy model de_core_news_sm in a Python package. Here is my project: https://github.com/michaelhochleitner/package_de_core_news_sm . I package and install the project with the following commands:

    python setup.py sdist bdist_wheel
    pip install dist/example-pkg-mh-0.0.1.tar.gz

I want to import the module example_pkg.import-model.py :

    $ python
    >>> import example_pkg.import_model
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/mh
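
The traceback is cut off, but the usual sticking point is that a spaCy model is itself a pip package, so it can be declared as a dependency rather than bundled into your own package. A setup.py sketch using a PEP 508 direct reference (needs a reasonably recent pip; the 2.2.5 model version is an assumption and should be pinned to match your spaCy version; note that PyPI rejects direct references, so this suits locally-built or privately-hosted packages):

    # setup.py
    from setuptools import setup, find_packages

    MODEL_URL = (
        "https://github.com/explosion/spacy-models/releases/download/"
        "de_core_news_sm-2.2.5/de_core_news_sm-2.2.5.tar.gz"
    )

    setup(
        name="example-pkg-mh",
        version="0.0.1",
        packages=find_packages(),
        install_requires=[
            "spacy>=2.2",
            # PEP 508 direct reference: pip installs the model like
            # any other dependency.
            "de_core_news_sm @ " + MODEL_URL,
        ],
    )

Once installed this way, the model loads with spacy.load('de_core_news_sm'), or directly via import de_core_news_sm; nlp = de_core_news_sm.load().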