spaCy

Is there a simple way to tell spaCy to ignore stop words when using the .similarity method?

烈酒焚心 submitted on 2019-12-11 01:15:35
Question: So right now I have a really simple program that takes a sentence, finds the sentence in a given book that is most semantically similar, and prints out that sentence along with the next few sentences.

    import spacy
    nlp = spacy.load('en_core_web_lg')

    # load Alice in Wonderland
    from gutenberg.acquire import load_etext
    from gutenberg.cleanup import strip_headers
    text = strip_headers(load_etext(11)).strip()

    alice = nlp(text)
    sentences = list(alice.sents)
    mysent = nlp(unicode("example sentence, …
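The question body and its answer are cut off above, so here is a minimal sketch of one common workaround, not necessarily the accepted answer: build new Docs from only the non-stop, alphabetic tokens and compare those. The helper name `similarity_no_stops` is invented for illustration.

```python
import spacy

nlp = spacy.load('en_core_web_lg')  # vectors are needed for meaningful .similarity

def similarity_no_stops(doc1, doc2):
    """Compare two Docs after dropping stop words and punctuation."""
    clean1 = nlp(' '.join(t.text for t in doc1 if not t.is_stop and t.is_alpha))
    clean2 = nlp(' '.join(t.text for t in doc2 if not t.is_stop and t.is_alpha))
    return clean1.similarity(clean2)

query = nlp(u"an example sentence about a rabbit")
candidate = nlp(u"The White Rabbit hurried past, muttering to itself.")
print(similarity_no_stops(query, candidate))
```

Re-running `nlp` on the joined text re-tokenizes each sentence, which is wasteful across a whole book; caching the cleaned sentence Docs once is the obvious optimization.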

Use spaCy entities in Rasa-NLU training data

佐手、 submitted on 2019-12-11 01:09:20
Question: I'm trying to create a simple program with Rasa which extracts a (French) street address from a text input. Following the advice in the Rasa-NLU docs (http://rasa-nlu.readthedocs.io/en/latest/entities.html), I want to use spaCy to do the address detection. I saw (https://spacy.io/usage/training) that the corresponding spaCy prebuilt entity would be LOC. However, I don't understand how to create a training dataset with this entity. Here is an excerpt from my current JSON training dataset:

    { "text …
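The JSON excerpt is cut off above. For what it's worth, entities that spaCy's pretrained NER already knows (such as LOC) generally do not need hand-annotated examples; they are picked up by including the spaCy entity extractor in the Rasa NLU pipeline. A hedged sketch of legacy-format training data, assembled as a Python dict (the intent name and example text are invented):

```python
# Hypothetical excerpt of legacy Rasa NLU training data, built in Python.
# With spaCy's pretrained NER in the pipeline (e.g. "ner_spacy"), LOC entities
# are extracted by spaCy itself, so they are not annotated here by hand.
training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            {
                "text": "J'habite au 12 rue de la Paix",
                "intent": "give_address",  # invented intent name
                "entities": [],            # LOC comes from spaCy, not annotations
            }
        ]
    }
}
```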

How to create training data for Rasa NLU programmatically (Node.js)

牧云@^-^@ submitted on 2019-12-10 12:19:01
Question: How can I create training data for Rasa NLU through a program? I am developing an application using the MEAN stack, and this application prepares the data that needs to be trained with Rasa NLU. But I don't know how to pass this info from my Node.js server to Rasa NLU. Are there any supported APIs to achieve this?

Answer 1: Rasa has a highly functional API, as documented here. To answer the specific question, you can pass training data to the Rasa NLU API via the commands below. If your training data is …
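The answer's command listing is cut off. As a hedged sketch of the idea, here is the HTTP call in Python (the question asks about Node.js, where an HTTP client such as axios would make the same request); the port, query parameter, and `/train` endpoint follow the legacy Rasa NLU HTTP API and should be checked against your Rasa version:

```python
import json
import requests

# Training data prepared by your application (Rasa NLU JSON format).
training_data = {"rasa_nlu_data": {"common_examples": []}}

# Legacy Rasa NLU exposed a /train endpoint on its server (default port 5000).
resp = requests.post(
    "http://localhost:5000/train?project=my_project",  # project name is illustrative
    data=json.dumps(training_data),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, resp.text)
```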

Can a token be removed from a spaCy document during pipeline processing?

主宰稳场 submitted on 2019-12-08 14:31:40
Question: I am using spaCy (a great Python NLP library) to process a number of very large documents; however, my corpus has a number of common words that I would like to eliminate in the document processing pipeline. Is there a way to remove a token from the document within a pipeline component?

Answer 1: spaCy's tokenization is non-destructive, so it always represents the original input text and never adds or deletes anything. This is kind of a core principle of the Doc object: you should always be able …
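The answer is cut off above, but its point is that a Doc never loses tokens. The usual workaround is to construct a new Doc from the tokens you want to keep; a minimal sketch as a spaCy v2 pipeline component (v3 registers components by string name rather than passing a callable):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.load('en_core_web_sm')

def remove_stopwords(doc):
    """Rebuild the Doc without stop words. Note this deliberately breaks the
    'Doc always represents the original text' principle described above."""
    words = [t.text for t in doc if not t.is_stop]
    return Doc(doc.vocab, words=words)

# Run it first so the tagger and parser see the filtered text (spaCy v2 API).
nlp.add_pipe(remove_stopwords, first=True)

doc = nlp(u"This is a sentence with some very common words in it")
print([t.text for t in doc])
```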

Wish to extract compound noun-adjective pairs from a sentence. So, basically I want something like:

点点圈 submitted on 2019-12-08 08:03:31
Question:

For the adjective: "The company's customer service was terrible." → {customer service, terrible}
For the verb: "They kept increasing my phone bill" → {phone bill, increasing}

This is a branch question from this posting. However, I'm trying to find the adjectives and verbs corresponding to multi-token phrases/compound nouns such as "customer service" using spaCy. I'm not sure how to do this with spaCy, NLTK, or any other prepackaged natural language processing software, and I'd appreciate any help!

Answer 1: For …
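The answer is cut off above. One way to approximate this with spaCy alone is to walk the dependency parse from each multi-token noun chunk: an `acomp` child of the chunk's head covers the adjective case, and a governing verb covers the verb case. A hedged sketch (dependency labels vary by model, so treat this as a starting point rather than the posted answer):

```python
import spacy

nlp = spacy.load('en_core_web_sm')

def compound_pairs(text):
    """Pair multi-token noun chunks with a describing adjective or verb."""
    doc = nlp(text)
    pairs = []
    for chunk in doc.noun_chunks:
        if len(chunk) < 2:  # only compound/multi-token noun phrases
            continue
        head = chunk.root.head
        # adjectival complement: "customer service was terrible" -> terrible
        acomps = [c.text for c in head.children if c.dep_ == 'acomp']
        if acomps:
            pairs.append((chunk.text, acomps[0]))
        elif head.pos_ == 'VERB':
            # governing verb: "kept increasing my phone bill" -> increasing
            pairs.append((chunk.text, head.text))
    return pairs

print(compound_pairs(u"The company's customer service was terrible."))
print(compound_pairs(u"They kept increasing my phone bill."))
```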

Conda-forge spaCy install fails - Error: WinError 87 - the parameter is incorrect

喜你入骨 submitted on 2019-12-08 06:45:24
Question: I'm trying to install spaCy in a conda environment (Anaconda 2019.03, the latest release) on Windows 10 using the command "conda install -c conda-forge spacy" recommended on the spaCy website. However, I'm receiving OS Error 22 / WinError 87 (the parameter is incorrect) when conda tries to install the dependencies. I've also tried installing spaCy using pip, but I run into similar difficulties: the same WinError 87, "the parameter is incorrect". The full output from the conda attempt is: …
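The full error output is cut off above, so no definitive fix can be given here. A commonly suggested, hedged workaround for this class of Windows/conda failures is to update conda itself and, failing that, install spaCy with pip inside a fresh environment:

```bash
conda update -n base conda
conda create -n spacy-env python=3.7
conda activate spacy-env
pip install -U spacy
python -m spacy download en_core_web_sm
```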

Implementing custom POS Tagger in Spacy over existing english model : NLP - Python

帅比萌擦擦* submitted on 2019-12-08 03:00:04
Question: I am trying to retrain the existing POS tagger in spaCy to produce the proper tags for certain misclassified words, using the code below. But it gives me this error:

    Warning: Unnamed vectors -- this won't allow multiple vectors models to be loaded. (Shape: (0, 0))

The code:

    from spacy.vocab import Vocab
    from spacy.tokens import Doc
    from spacy.gold import GoldParse

    nlp = spacy.load('en_core_web_sm')
    optimizer = nlp.begin_training()
    vocab = Vocab(tag_map={})
    doc = Doc(vocab, words=[word for word in [ …
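The question's code is cut off. The "Unnamed vectors" warning is commonly worked around by giving the model's vectors a name before training begins; below is a hedged spaCy v2 sketch of retraining the tagger on a toy example (the sentence and tags are invented, and the tags must already exist in the model's tag map):

```python
import spacy

nlp = spacy.load('en_core_web_sm')

# Naming the vectors is the usual workaround for the "Unnamed vectors" warning.
nlp.vocab.vectors.name = 'en_core_web_sm_vectors'
optimizer = nlp.begin_training()

# Toy example: gold PTB tags for each token, reusing the model's tag map.
TRAIN_DATA = [(u"I like eggs", {'tags': ['PRP', 'VBP', 'NNS']})]

for i in range(10):
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer)
```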

ValueError with spacy.load('en_core_web_sm')

二次信任 submitted on 2019-12-08 01:52:43
Question: I'm getting "ValueError: could not broadcast input array from shape (96) into shape (128)" for spacy.load('en_core_web_sm'). I manually downloaded and installed the model, as I'm working on a work computer with download restrictions. I followed the instructions to download and copy from this link: https://github.com/explosion/spaCy/issues/3113 Copy the folder Python35\lib\site-packages\en_core_web_sm, create a folder named en in Python35\Lib\site-packages\spacy\data, paste the copied …
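The quoted instructions are cut off. This particular shape mismatch (96 vs 128) typically indicates that the copied model package was built for a different spaCy version than the one installed, so checking compatibility is a reasonable first step; a minimal sketch:

```python
import spacy

# A (96) vs (128) broadcast error usually means the manually copied model
# was built against a different spaCy version than the installed one.
print(spacy.__version__)

# In spaCy v2 the model/version compatibility check can also be run
# from the command line:
#   python -m spacy validate
```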

spaCy model training data: WikiNER

戏子无情 submitted on 2019-12-07 22:56:35
Question: For the model xx_ent_wiki_sm in version 2.0 of spaCy, there is a mention of the "WikiNER" dataset, which leads to the article 'Learning multilingual named entity recognition from Wikipedia'. Is there any resource for downloading this dataset to retrain that model? Or a script for processing a Wikipedia dump?

Answer 1: The data server from Joel's (and my) former research group seems to be offline: http://downloads.schwa.org/wikiner I found a mirror of the wp3 files here, which are the ones I'm using in …
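The answer's mirror link is cut off. Assuming the wp3 files use the aij-wikiner layout (one sentence per line, space-separated tokens, each token formatted as word|POS|IOB-NER), a hedged parser sketch:

```python
def read_wikiner(path):
    """Parse an aij-wikiner wp3 file into (words, ner_tags) sentence pairs.
    Assumed layout: one sentence per line, space-separated tokens,
    each token formatted as word|POS|IOB-NER-tag."""
    sentences = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            fields = [tok.split('|') for tok in line.split(' ')]
            words = [t[0] for t in fields]
            ner = [t[2] for t in fields]
            sentences.append((words, ner))
    return sentences
```

From here the IOB tags would still need converting to spaCy's training format (character offsets or BILUO) before retraining xx_ent_wiki_sm.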

How to add custom slangs into spaCy's norm_exceptions.py module?

陌路散爱 submitted on 2019-12-07 21:24:33
Question: spaCy's documentation has some information on adding new slang here. However, I'd like to know: (1) when should I call the following function?

    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS, BASE_NORMS)

The typical usage of spaCy, according to the introduction guide here, is as follows:

    import spacy
    nlp = spacy.load('en')
    # Should I call the function add_lookups(...) here?
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 …
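The snippet is cut off, but the placement question can be answered in outline: the add_lookups call is not meant to go between spacy.load and nlp(); in spaCy v2 it belongs in a Language subclass's Defaults, before the pipeline is created. At runtime, v2 also lets you override norms on individual lexemes, which is often simpler; a hedged sketch (the slang entries are invented, and whether norm_ is writable depends on your spaCy version):

```python
import spacy

nlp = spacy.load('en')

# Runtime alternative (spaCy v2): set NORM directly on the vocabulary's lexemes
# instead of rebuilding lex_attr_getters with add_lookups before loading.
nlp.vocab[u'cos'].norm_ = u'because'     # invented slang entry
nlp.vocab[u'gonna'].norm_ = u'going to'  # invented slang entry

doc = nlp(u'cos I was gonna go')
print([t.norm_ for t in doc])
```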