spacy

Sentence Segmentation using Spacy

拥有回忆 submitted on 2020-01-13 05:17:06
Question: I am new to spaCy and NLP, and I am facing the issue below while doing sentence segmentation with spaCy. The text I am trying to tokenise into sentences contains numbered lists (with a space between the numbering and the actual text), like below. import spacy nlp = spacy.load('en_core_web_sm') text = "This is first sentence.\nNext is numbered list.\n1. Hello World!\n2. Hello World2!\n3. Hello World!" text_sentences = nlp(text) for sentence in text_sentences.sents: print(sentence.text) Output (1.,2.,3. are
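One spaCy-independent workaround for text like this is to split on newlines before calling nlp(), so each numbered item stays its own unit. A minimal sketch (the variable names and the list-numbering regex are illustrative assumptions, not from the question):

```python
import re

text = ("This is first sentence.\nNext is numbered list.\n"
        "1. Hello World!\n2. Hello World2!\n3. Hello World!")

# Split on newlines first so each numbered item stays a separate unit;
# each line could then be passed to nlp() individually.
lines = [line for line in text.split("\n") if line.strip()]

# Strip the leading "1. ", "2. ", ... numbering if only the sentence
# text is wanted (the regex assumes this exact list format).
sentences = [re.sub(r"^\d+\.\s+", "", line) for line in lines]
```

Splitting first avoids relying on the parser's sentence-boundary decisions around list numbering.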

How to install models/download packages on Google Colab?

丶灬走出姿态 submitted on 2020-01-13 05:13:59
Question: I am using the text-analytics library spaCy. I've installed spaCy on a Google Colab notebook without any issue, but to use it I need to download the "en" model. Generally, that command should look like this: python -m spacy download en I tried a few ways but I am not able to get it to install on the notebook. Looking for help. Cheers Answer 1: If you have a Python interpreter but not a terminal, you could try: import spacy.cli spacy.cli.download("en_core_web_sm") More manual alternatives can be found
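For completeness, the shell form also works inside a Colab cell; the leading "!" runs the line as a shell command (if the model is still not found afterwards, restarting the runtime is usually needed):

```shell
!python -m spacy download en_core_web_sm
```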

Spacy - Tokenize quoted string

僤鯓⒐⒋嵵緔 submitted on 2020-01-12 20:53:40
Question: I am using spaCy 2.0 with a quoted string as input. Example string: "The quoted text 'AA XX' should be tokenized", and I expect to extract [The, quoted, text, 'AA XX', should, be, tokenized]. However, I get some strange results while experimenting: noun chunks and ents lose one of the quotes. import spacy nlp = spacy.load('en') s = "The quoted text 'AA XX' should be tokenized" doc = nlp(s) print([t for t in doc]) print([t for t in doc.noun_chunks]) print([t for t in doc.ents]) Result [The,
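In spaCy the usual fix is to locate the quoted span by character offsets and merge it into a single token (via doc.char_span() and doc.retokenize()). The offset-finding step can be sketched without spaCy; the regex assumes simple, non-nested single quotes:

```python
import re

s = "The quoted text 'AA XX' should be tokenized"

# Locate quoted spans by character offsets; in spaCy these offsets
# would be mapped to a Span with doc.char_span(start, end) and merged
# inside a "with doc.retokenize() as retok:" block.
spans = [(m.start(), m.end()) for m in re.finditer(r"'[^']*'", s)]
quoted = [s[a:b] for a, b in spans]
```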

How does spacy use word embeddings for Named Entity Recognition (NER)?

我只是一个虾纸丫 submitted on 2020-01-11 18:54:27
Question: I'm trying to train an NER model using spaCy to identify locations, (person) names, and organisations. I'm trying to understand how spaCy recognises entities in text, and I've not been able to find an answer. From this issue on GitHub and this example, it appears that spaCy uses a number of features present in the text, such as POS tags, prefixes, suffixes, and other character- and word-based features, to train an Averaged Perceptron. However, nowhere in the code does it appear that
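The character- and word-based features mentioned above can be sketched as a plain feature-extraction function. This is an illustration of the general idea, not spaCy's internal code; the feature names are made up:

```python
def token_features(word):
    # Prefix/suffix and shape features of the kind described above;
    # a real NER featurizer also uses context words and POS tags.
    return {
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
    }
```

A perceptron-style model would learn a weight per (feature, entity-label) pair from dictionaries like this.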

How to improve accuracy of Rasa NLU while using Spacy as pipeline?

风格不统一 submitted on 2020-01-06 05:00:07
Question: The spaCy documentation mentions that it uses vector similarity in featurization, and hence in classification. For example, if we test a sentence that is not in the training data but has the same meaning, it should be classified into the same intent as the training sentences. But that's not happening. Let's say the training data is like this: ## intent: delete_event - delete event - delete all events - delete all events of friday - delete ... Now if I test remove event then it is
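When "remove event" is not matched to delete_event, a first diagnostic is to check how similar the two phrases' vectors actually are; cosine similarity over word vectors is the usual measure. A minimal sketch with toy vectors (standing in for e.g. nlp("delete event").vector, not Rasa's internals):

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for the spaCy document vectors of
# "delete event" and "remove event".
delete_vec = [1.0, 0.2, 0.0]
remove_vec = [0.9, 0.3, 0.1]
similarity = cosine(delete_vec, remove_vec)
```

If the real similarity is low, the spaCy model's vectors (e.g. a small model without true word vectors) are a likely culprit before tuning the classifier.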

How to resolve Misaligned Entity Annotation error in RASA NLU

被刻印的时光 ゝ submitted on 2020-01-05 02:31:08
Question: I am trying to import a LUIS schema model into RASA and train it using the spacy + scikit pipeline. I am using RASA NLU v0.10.4. But when I try to load the LUIS model schema, the ner_crf component throws a Misaligned Entity Annotation warning, although I have tagged the entities correctly in the LUIS model schema. Here is my config file: { "project": "SynonymsExample", "path": "C:\\Users\\xyz\\Desktop\\RASA\\models", "response_log": "C:\\Users\\xyz\\Desktop\\RASA\\logs",
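The warning usually means an entity's character offsets do not line up with token boundaries after tokenization. A simple whitespace-boundary check approximates what ner_crf verifies (a sketch for diagnosing annotations, not Rasa's actual code):

```python
def is_aligned(text, start, end):
    # An entity span is aligned if both ends fall on token edges;
    # whitespace boundaries approximate the tokenizer's behaviour.
    starts_ok = start == 0 or text[start - 1].isspace()
    ends_ok = end == len(text) or text[end].isspace()
    return starts_ok and ends_ok

# "flight" occupies characters 7..13, so this span is aligned;
# shifting the start by one character would misalign it.
aligned = is_aligned("book a flight", 7, 13)
```

Off-by-one offsets introduced during the LUIS-to-RASA conversion are a common source of this warning.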

subject object identification in python

泄露秘密 submitted on 2020-01-02 23:20:31
Question: I want to identify the subjects and objects of a set of sentences. My actual task is to identify cause and effect from a set of review data. I am using the spaCy package to chunk and parse the data, but I am not actually reaching my goal. Is there any way to do so? E.g.: I thought it was the complete set out: subject object I complete set Answer 1: In the simplest way, the dependencies are accessed by token.dep_. Having imported spacy: import spacy nlp = spacy.load('en') parsed_text = nlp(u"I thought it was the
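Building on token.dep_, subjects and objects can be collected by filtering on dependency labels. Here the parse is simulated with (text, dep) pairs so the sketch runs without a model; the labels are spaCy-style (nsubj, dobj, attr, ...) and the exact parse for this sentence is an assumption:

```python
# Simulated output of [(t.text, t.dep_) for t in nlp("I thought it was the complete set")]
parsed = [("I", "nsubj"), ("thought", "ROOT"), ("it", "nsubj"),
          ("was", "ccomp"), ("the", "det"), ("complete", "amod"),
          ("set", "attr")]

# Subjects carry a subject label; object-like roles vary by verb,
# so several labels are checked.
subjects = [text for text, dep in parsed if dep == "nsubj"]
objects = [text for text, dep in parsed if dep in ("dobj", "attr", "pobj")]
```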

Older versions of spaCy throw a "KeyError: 'package'" error when trying to install a model

南楼画角 submitted on 2020-01-02 08:05:29
Question: I use spaCy 1.6.0 on Ubuntu 14.04.4 LTS x64 with Python 3.5. To install the English model of spaCy, I tried to run python3.5 -m spacy.en.download, which gives me the error message: ubun@ner-3:~/NeuroNER-master/src$ python3.5 -m spacy.en.download Downloading parsing model Traceback (most recent call last): File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.5/runpy.py", line 85, in _run_code exec(code, run_globals) File "/usr/local/lib/python3.5/dist-packages/spacy
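If staying on spaCy 1.6.0 is not a hard requirement, one hedged workaround is to move to a newer spaCy, where models are installed as ordinary pip packages rather than through the old spacy.en.download mechanism:

```shell
pip install -U spacy
python -m spacy download en_core_web_sm
```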

spaCy - Tokenization of Hyphenated words

随声附和 submitted on 2019-12-31 03:43:28
Question: Good day SO, I am trying to post-process hyphenated words that are tokenized into separate tokens when they should be a single token. For example: Sentence: "up-scaled" Tokens: ['up', '-', 'scaled'] Expected: ['up-scaled'] For now, my solution is to use the matcher: matcher = Matcher(nlp.vocab) pattern = [{'IS_ALPHA': True, 'IS_SPACE': False}, {'ORTH': '-'}, {'IS_ALPHA': True, 'IS_SPACE': False}] matcher.add('HYPHENATED', None, pattern) def quote_merger(doc): # this will be
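The same merge can be sketched without the Matcher as a post-processing pass over the token texts (a spaCy-free illustration of the idea; in spaCy itself the matched span would be merged with doc.retokenize() so token attributes stay consistent):

```python
def merge_hyphenated(tokens):
    # Rejoin alpha, "-", alpha triples that the tokenizer split apart.
    out = []
    i = 0
    while i < len(tokens):
        if (i + 2 < len(tokens) and tokens[i + 1] == "-"
                and tokens[i].isalpha() and tokens[i + 2].isalpha()):
            out.append(tokens[i] + "-" + tokens[i + 2])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out

merged = merge_hyphenated(["up", "-", "scaled"])
```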