spacy

Is it possible to keep spaCy in memory to reduce the load time? [closed]

若如初见. Submitted on 2019-12-30 03:27:13
Question: I want to use spaCy for NLP in an online service. Each time a user makes a request, I call the script my_script.py, which starts with:

    from spacy.en import English
    nlp = English()

The problem I'm having is that those two lines take over 10 seconds. Is it possible to keep English() in RAM or …
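The usual answer is to load the model once in a long-running process and reuse it across requests. A minimal sketch, assuming a current spaCy where spacy.load('en') replaces spacy.en.English (the Flask app and route below are illustrative, not from the original question):

    # Load spaCy once at process startup; every request reuses the same
    # nlp object instead of paying the ~10 s load on each call.
    from flask import Flask, jsonify, request
    import spacy

    app = Flask(__name__)
    nlp = spacy.load('en')  # done once, when the server starts

    @app.route('/tokenize')
    def tokenize():
        doc = nlp(request.args.get('text', ''))
        return jsonify(tokens=[token.text for token in doc])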

SpaCy: how to load Google News word2vec vectors?

落花浮王杯 Submitted on 2019-12-30 00:07:04
Question: I've tried several methods of loading the Google News word2vec vectors (https://code.google.com/archive/p/word2vec/):

    en_nlp = spacy.load('en', vector=False)
    en_nlp.vocab.load_vectors_from_bin_loc('GoogleNews-vectors-negative300.bin')

The above gives:

    MemoryError: Error assigning 18446744072820359357 bytes

I've also tried with the .gz-packed vectors, and with loading and saving them to a new format with gensim:

    from gensim.models.word2vec import Word2Vec
    model = Word2Vec.load_word2vec_format(…
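One route that avoids the binary loader entirely is to convert the vectors with gensim and add them to the vocab one at a time. A sketch, assuming gensim 3.x and spaCy 2.x (the file names come from the question; everything else is illustrative):

    # Convert the .bin vectors to text with gensim, then push them into
    # spaCy's vocab. Needs several GB of RAM for the full Google News set.
    import numpy
    import spacy
    from gensim.models import KeyedVectors

    model = KeyedVectors.load_word2vec_format(
        'GoogleNews-vectors-negative300.bin', binary=True)
    model.save_word2vec_format('GoogleNews-vectors-negative300.txt', binary=False)

    nlp = spacy.load('en')
    with open('GoogleNews-vectors-negative300.txt') as f:
        f.readline()  # skip the "<count> <dims>" header line
        for line in f:
            parts = line.rstrip().split(' ')
            vector = numpy.asarray(parts[1:], dtype='float32')
            nlp.vocab.set_vector(parts[0], vector)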

Python: cannot install module spaCy

谁都会走 Submitted on 2019-12-29 08:38:07
Question: I'm new to Python and I ran into a problem I can't solve. I would like to install and use the package spaCy in Python, so I opened cmd and ran:

    pip install spacy

While installing the dependencies I get an error message:

    ----------------------------------------
    Command ""c:\users\xxx\appdata\local\programs\python\python37\python.exe" -u -c "import setuptools, tokenize; __file__='C:\Users\xxx\AppData\Local\Temp\pip-install-6vcdnb_4\numpy\setup.py';f=getattr(tokenize, 'open', open)(__file__)…
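The traceback shows pip trying to build numpy from source, which commonly fails on Windows. Not from the original post, but the usual first steps are to upgrade the packaging tools and install a prebuilt numpy wheel before retrying spaCy:

    pip install --upgrade pip setuptools wheel
    pip install numpy
    pip install spacy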

Natural Language Classification Task (1)

空扰寡人 Submitted on 2019-12-26 07:17:39
Contents: natural language classification task; dataset; model; preparing the data (dataset preview, inspecting a sample, train/val split, building the vocabulary, most common words in the training set, viewing the vocabulary, viewing the labels, creating iterators); Word Averaging model (architecture, configuration, parameter count, parameter initialization, optimizer and loss function, prediction accuracy, training, evaluation, loading the saved model, testing on sentences); RNN model (not run) (architecture, configuration, word vectors from pretrained GloVe embeddings, adding PAD and UNK tokens, parameter count, training, loading the saved model, testing on sentences); CNN model with a single kernel (architecture, configuration, GloVe word vectors, PAD and UNK tokens, parameter count, optimizer and loss function, training, loading the saved model, testing on sentences); CNN with parallel kernels (architecture, configuration, GloVe word vectors, PAD and UNK tokens, parameter count, training); Word Averaging with a mask (data preparation, vocabulary, iterators, model architecture and configuration, parameter count, model initialization, optimizer and loss function, prediction accuracy, training, evaluation). Natural language classification task: sentiment classification with PyTorch models and TorchText (detecting whether the sentiment of a piece of text is positive…
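The outline's Word Averaging model is the simplest of the three architectures: embed each token and average the embeddings into a single vector per sentence. A minimal sketch of that idea in PyTorch (names and hyperparameters are illustrative; the post's actual code is not reproduced here):

    # Word Averaging classifier: embed tokens, average over the sentence,
    # then a single linear layer produces the class score.
    import torch.nn as nn

    class WordAvgModel(nn.Module):
        def __init__(self, vocab_size, embed_dim, output_dim, pad_idx):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
            self.fc = nn.Linear(embed_dim, output_dim)

        def forward(self, text):
            # text: [seq_len, batch] token indices (TorchText's default layout)
            embedded = self.embed(text)    # [seq_len, batch, embed_dim]
            pooled = embedded.mean(dim=0)  # average over words -> [batch, embed_dim]
            return self.fc(pooled)         # [batch, output_dim]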

spaCy to CoNLL format without using spaCy's sentence splitter

ⅰ亾dé卋堺 Submitted on 2019-12-24 20:53:49
Question: This post shows how to get the dependencies of a block of text in CoNLL format with spaCy's taggers. This is the solution posted:

    import spacy
    nlp_en = spacy.load('en')
    doc = nlp_en(u'Bob bought the pizza to Alice')
    for sent in doc.sents:
        for i, word in enumerate(sent):
            if word.head == word:
                head_idx = 0
            else:
                head_idx = word.head.i - sent[0].i + 1
            print("%d\t%s\t%s\t%s\t%s\t%s\t%s" % (
                i + 1,           # there's a word.i attr that's position in *doc*
                word,
                word.lemma_,
                word.tag_,       # fine-grained tag
                word.ent_type_,  # last three fields reconstructed; excerpt was cut off here
                head_idx,
                word.dep_))
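The question's goal is to skip doc.sents entirely. A sketch of the common workaround (an assumption, not the accepted answer): run the pipeline on each pre-split sentence as its own document, so spaCy never re-segments the text:

    # Each pre-split sentence becomes its own Doc, so spaCy's sentence
    # splitter is never consulted; indices are relative to the sentence.
    import spacy

    nlp_en = spacy.load('en')
    sentences = ['Bob bought the pizza.', 'Alice ate it.']  # pre-split input
    for sent_text in sentences:
        doc = nlp_en(sent_text)
        for i, word in enumerate(doc):
            head_idx = 0 if word.head == word else word.head.i + 1
            print("%d\t%s\t%s\t%s\t%d\t%s" % (
                i + 1, word, word.lemma_, word.tag_, head_idx, word.dep_))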

Using nlp.pipe() with pre-segmented and pre-tokenized text with spaCy

倾然丶 夕夏残阳落幕 Submitted on 2019-12-24 19:39:01
Question: I am trying to tag and parse text that has already been split into sentences and has already been tokenized. For example:

    sents = [['I', 'like', 'cookies', '.'], ['Do', 'you', '?']]

The fastest approach to processing batches of text is .pipe(). However, it is not clear to me how I can use that with pre-tokenized and pre-segmented text. Performance is key here. I tried the following, but that threw an error:

    docs = [nlp.tokenizer.tokens_from_list(sentence) for sentence in sents]
    nlp.tagger…
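A sketch of one way to do this in spaCy 2.x: build Doc objects straight from the token lists with the Doc constructor, then apply only the pipeline components you need (the model name below is illustrative):

    # Construct Docs from pre-tokenized input, then run the tagger and
    # parser on them directly, skipping spaCy's tokenizer entirely.
    import spacy
    from spacy.tokens import Doc

    nlp = spacy.load('en_core_web_sm')
    sents = [['I', 'like', 'cookies', '.'], ['Do', 'you', '?']]

    tagger = nlp.get_pipe('tagger')
    parser = nlp.get_pipe('parser')
    docs = [Doc(nlp.vocab, words=words) for words in sents]
    for doc in docs:
        tagger(doc)
        parser(doc)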

How to use Hindi Model in RASA NLU?

倖福魔咒の Submitted on 2019-12-24 10:29:31
Question: I have built my model for the Hindi language using fastText with the spaCy backend. I followed this tutorial to build my model using fastText. This URL. I have also linked my model with spaCy using the following command:

    python -m spacy link nl_model hi

The model is linked successfully, as you can check in the image below. Now I can't find any help for using the Hindi language: what kind of config files do I need to use, where do I import the Hindi model, and how do I proceed? I also have a question about how our data…
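For old-style RASA NLU (0.x), the config file mainly needs the language code and a spaCy-based pipeline. A sketch of what config.yml might look like here (an assumption based on RASA NLU 0.x conventions; "hi" is the link name created by the spacy link command above):

    # config.yml (illustrative)
    language: "hi"
    pipeline: "spacy_sklearn"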

Calculate all the metrics of a custom Named Entity Recognition (NER) model using spaCy and ner.manual

北城以北 Submitted on 2019-12-24 07:39:01
Question: I have made a spaCy (2.1.8) model which works with labels like date, time, coordinate, stars… Now I want to see all the metrics related to each entity using spaCy, something like this:

            precision  recall  f1-score  support
    B-LOC       0.810   0.784     0.797     1084
    I-LOC       0.690   0.637     0.662      325
    B-MISC      0.731   0.569     0.640      339
    I-MISC      0.699   0.589     0.639      557
    B-ORG       0.807   0.832     0.820     1400
    I-ORG       0.852   0.786     0.818     1104
    B-PER       0.850   0.884     0.867      735
    I-PER       0.893   0.943     0.917      634

I have noticed that I can use Scorer for that:
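The usual evaluation loop with Scorer looks like this. A sketch assuming the spaCy 2.1 API; the examples list of (text, entity_offsets) pairs is illustrative:

    # Score a trained NER model; per-entity precision/recall/F1 appear
    # under scorer.scores['ents_per_type'] in spaCy 2.1+.
    from spacy.gold import GoldParse
    from spacy.scorer import Scorer

    def evaluate(ner_model, examples):
        scorer = Scorer()
        for text, entity_offsets in examples:
            gold = GoldParse(ner_model.make_doc(text), entities=entity_offsets)
            scorer.score(ner_model(text), gold)
        return scorer.scores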

Spacy EN Model issue

瘦欲@ Submitted on 2019-12-24 07:39:01
Question: I need to know the difference between spaCy's en and en_core_web_sm models. I am trying to do NER with spaCy (for organization names). Please find below the script I am using:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = "But Google is starting from behind. The company made a late push \
    into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
    Alexa software, which runs on its Echo and Dot devices, have clear leads in consumer adoption."
    doc = nlp(text)
    for ent in doc.ents:
        print(ent.text, ent.label_)  # loop body not in the excerpt; a typical completion
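In spaCy 2.x, en is just a shortcut link that python -m spacy download en typically points at en_core_web_sm, so the two usually load the same package. A quick way to check what the shortcut resolves to (a sketch; the printed fields come from the loaded model's meta.json):

    # Inspect which installed package the 'en' shortcut actually loads.
    import spacy

    nlp = spacy.load('en')
    print(nlp.meta['lang'], nlp.meta['name'], nlp.meta['version'])
    # e.g. -> en core_web_sm 2.1.0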