spacy

Python NLP Text Tokenization based on custom regex

五迷三道 submitted on 2020-05-09 16:00:03
Question: I am processing a large amount of text for custom Named Entity Recognition (NER) using spaCy. For text preprocessing I am using NLTK for tokenization, etc. I am able to handle one of my custom entities, which is based on simple strings. But the other custom entity is a combination of a number and certain text (for example, 20 BBLs). The word_tokenize method from nltk.tokenize splits 20 and 'BBLs' into two separate tokens. What I want is to treat them (the number and the 'BBLs'…
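One way to keep a quantity like "20 BBLs" together is to tokenize with a regular expression whose first alternative matches a number followed by a known unit, ahead of the generic word and punctuation alternatives. The sketch below is a minimal illustration using only Python's standard `re` module. It is not NLTK's or spaCy's tokenizer. The unit list and the function name `tokenize_with_units` are hypothetical and should be adapted to the corpus.

```python
import re

# Hypothetical unit list; extend it with the units that occur in your text.
UNITS = ["BBLs", "BBL", "bbls", "bbl"]

# Alternatives are tried left to right at each position, so the
# "number + unit" branch wins over the plain-word branch.
TOKEN_RE = re.compile(
    r"\d+(?:[.,]\d+)?\s+(?:" + "|".join(map(re.escape, UNITS)) + r")\b"  # e.g. "20 BBLs"
    r"|\w+"        # ordinary words and bare numbers
    r"|[^\w\s]"    # any single punctuation character
)

def tokenize_with_units(text):
    """Return tokens, keeping '<number> <unit>' together as a single token."""
    return TOKEN_RE.findall(text)

print(tokenize_with_units("The well produced 20 BBLs of oil, then 3.5 bbl more."))
# → ['The', 'well', 'produced', '20 BBLs', 'of', 'oil', ',', 'then', '3.5 bbl', 'more', '.']
```

A number that is not followed by a listed unit falls through to the `\w+` branch, so "20 apples" still yields two tokens.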

SpaCy Parenthesis tokenization: pairs of (LRB, RRB) not tokenized correctly

早过忘川 submitted on 2020-04-29 19:24:22
Question: When an RRB (closing parenthesis) is not separated by a space from the word that follows it, it is recognized as part of that word.
In [34]: nlp("Indonesia (CNN)AirAsia ")
Out[34]: Indonesia (CNN)AirAsia
In [35]: d=nlp("Indonesia (CNN)AirAsia ")
In [36]: [(t.text, t.lemma_, t.pos_, t.tag_) for t in d]
Out[36]:
[('Indonesia', 'Indonesia', 'PROPN', 'NNP'),
 ('(', '(', 'PUNCT', '-LRB-'),
 ('CNN)AirAsia', 'CNN)AirAsia', 'PROPN', 'NNP')]
In [39]: d=nlp("(CNN)Police")
In [40]: [(t.text, t.lemma_, t.pos_, t.tag_) for t in d]
…
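spaCy's tokenizer splits a whitespace-delimited chunk at the matches returned by its `infix_finditer` callback. One workaround for "(CNN)AirAsia" is therefore to make parentheses count as infixes. Exact rule sets vary by spaCy version, so the sketch below only illustrates the idea in plain Python. The helper `split_infixes` is hypothetical and much simpler than spaCy's real algorithm, which also handles prefixes, suffixes, and special cases. It splits each chunk around every infix match and keeps the match as its own token.

```python
import re

# Simplified illustration of infix splitting; not spaCy's full algorithm.
# Any parenthesis inside a chunk is treated as an infix to split on.
INFIX_RE = re.compile(r"[()]")

def split_infixes(text, infix_re=INFIX_RE):
    """Split on whitespace, then split each chunk around every infix match."""
    tokens = []
    for chunk in text.split():
        start = 0
        for match in infix_re.finditer(chunk):
            if match.start() > start:
                tokens.append(chunk[start:match.start()])  # text before the infix
            tokens.append(match.group())                    # the infix itself
            start = match.end()
        if start < len(chunk):
            tokens.append(chunk[start:])                    # trailing text
    return tokens

print(split_infixes("Indonesia (CNN)AirAsia "))
# → ['Indonesia', '(', 'CNN', ')', 'AirAsia']
```

In spaCy, the analogous change is usually made by compiling an extended infix list with `spacy.util.compile_infix_regex` and assigning its `.finditer` to `nlp.tokenizer.infix_finditer`. Check the tokenizer documentation for your installed version.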

Build ML models in a few lines of code: low-code machine learning Python library officially open-sourced

痴心易碎 submitted on 2020-04-18 12:22:19
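The PyCaret library supports training and deploying supervised and unsupervised machine learning models in a "low-code" environment, making machine learning experiments more efficient. Want to run machine learning experiments more efficiently and spend more of your effort on business problems instead of writing code? A low-code platform may be a good option. 机器之心 (Synced) recently came across PyCaret, an open-source low-code machine learning library for Python that supports training and deploying supervised and unsupervised models in a "low-code" environment.
GitHub: https://github.com/pycaret/pycaret
User guide: https://www.pycaret.org/guide
Notebook tutorials: https://www.pycaret.org/tutorial
PyCaret lets data scientists run end-to-end experiments quickly and efficiently. Compared with other open-source machine learning libraries, it can perform complex machine learning tasks in only a few lines of code. The library is aimed at experienced data scientists, citizen data scientists who prefer low-code solutions, and newcomers with little or no programming background. PyCaret works in several notebook environments, including Jupyter Notebook, Azure Notebooks, and Google Colab. In essence, PyCaret is a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, Microsoft LightGBM…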

Spacy — ImportError: preshed.maps does not export expected C function map_clear

一世执手 submitted on 2020-04-17 20:28:38
Question: I am trying to import spacy, without success:
>>> import spacy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Alienware\Anaconda3\envs\tf2\lib\site-packages\spacy\__init__.py", line 12, in <module>
    from . import pipeline
  File "C:\Users\Alienware\Anaconda3\envs\tf2\lib\site-packages\spacy\pipeline\__init__.py", line 4, in <module>
    from .pipes import Tagger, DependencyParser, EntityRecognizer, EntityLinker
  File "pipes.pyx", line 24, in init spacy.pipeline.pipes
…
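An error such as "does not export expected C function" most likely means that spaCy's compiled modules were built against a different version of `preshed` than the one installed. A reasonable first diagnostic step is to compare the installed versions of spaCy and its compiled dependencies. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). The package list is an assumption based on spaCy 2.x's compiled dependencies and should be checked against your spaCy release's requirements. `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Assumption: packages whose compiled (Cython) interfaces spaCy 2.x builds on.
# Adjust this list to match your spaCy version's requirements.
PACKAGES = ["spacy", "preshed", "cymem", "murmurhash", "thinc", "blis"]

def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name:12} {version or 'not installed'}")
```

If the versions do not match what the installed spaCy expects, the usual remedy is to reinstall spaCy and its dependencies together in the same environment with a single package manager, rather than mixing conda and pip.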

SpaCy — intra-word hyphens. How to treat them one word?

天涯浪子 submitted on 2020-04-11 06:31:23
Question: The following code was provided as an answer to the question:

import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import compile_prefix_regex, compile_infix_regex, compile_suffix_regex
import re

nlp = spacy.load('en')
infixes = nlp.Defaults.prefixes + (r"[./]", r"[-]~", r"(.'.)")
infix_re = spacy.util.compile_infix_regex(infixes)

def custom_tokenizer(nlp):
    return Tokenizer(nlp.vocab, infix_finditer=infix_re.finditer)

nlp.tokenizer = custom_tokenizer(nlp)
s1 = "Marketing…
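The quoted answer appears to build its own infix list instead of using spaCy's defaults, so the default rule that splits hyphens between letters no longer applies. The underlying goal is to keep a hyphen that sits between word characters inside the word while still treating a free-standing hyphen as punctuation. The sketch below shows that goal with plain `re`, outside spaCy. The pattern and the function name `tokenize_keep_hyphens` are hypothetical.

```python
import re

# Hypothetical pattern: a token may contain internal hyphens between word
# characters ("data-driven"); a hyphen at a word edge or standing alone
# is emitted as a separate punctuation token.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*|[^\w\s]")

def tokenize_keep_hyphens(text):
    """Tokenize, keeping intra-word hyphenated compounds as single tokens."""
    return TOKEN_RE.findall(text)

print(tokenize_keep_hyphens("A state-of-the-art, data-driven plan - finally."))
# → ['A', 'state-of-the-art', ',', 'data-driven', 'plan', '-', 'finally', '.']
```

Because `(?:-\w+)*` requires a word character after every hyphen, a trailing hyphen such as "Marketing-" is split off as its own token.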