nlp

What to do when Seq2Seq network repeats words over and over in output?

不羁的心 submitted on 2020-05-25 08:37:05
Question: I've been working on a project for a while with very little data; I know the results would improve considerably if we could put together a much larger dataset. That aside, my issue at the moment is that when I feed in a sentence, the output looks like this: contactid contactid contactid contactid A single word is focused on and repeated over and over again. What can I do to overcome this hurdle? Things I've tried: Double-checked that I was appending start/stop tokens and make sure

Rewriting sentences while retaining semantic meaning

£可爱£侵袭症+ submitted on 2020-05-24 08:49:27
Question: Is it possible to use WordNet to rewrite a sentence so that its semantic meaning stays the same (or mostly the same)? Let's say I have this sentence: Obama met with Putin last week. Is it possible to use WordNet to rephrase the sentence into alternatives like: Obama and Putin met the previous week. Obama and Putin met each other a week ago. If changing the sentence structure is not possible, can WordNet be used to replace only the relevant synonyms? For example: Obama met

How to match dependency patterns with spaCy?

心不动则不痛 submitted on 2020-05-15 09:57:11
Question: Is there a way to use spaCy's rule-based pattern matcher (or a similar library) on dependency sequences, such as the list of tokens returned by token.ancestors? For example, I have pluralized a noun and now need to check its dependent verbs to fix any errors in verb agreement. So one pattern (of many) would be to match an 'auxpass' verb belonging to a parent verb that is a relative clause of the noun. Answer 1: I kind of hesitate to recommend something that doesn't have any documentation yet,

Trouble installing the spaCy English model in Python 2.7? And upgrading Python to 3.5?

孤人 submitted on 2020-05-15 08:22:21
Question: I am trying to install the spaCy English model on my Mac after installing the library. Right now my machine has Python 2.7. I installed spaCy in the venv and then ran "python -m spacy.en.download" to install the model, as instructed on the website. When I do that, I get the following response: $ python -m spacy.en.download Traceback (most recent call last): File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 162, in _run_module

How to reconstruct text entities with Hugging Face's transformers pipelines without IOB tags?

旧时模样 submitted on 2020-05-15 05:13:10
Question: I've been looking to use Hugging Face's Pipelines for NER (named entity recognition). However, the pipeline returns entity labels in inside-outside-beginning (IOB) format but without the IOB labels, so I'm not able to map its output back to my original text. Moreover, the outputs are split into sub-word pieces in BERT tokenization format (the default model is BERT-large). For example: from transformers import pipeline nlp_bert_lg = pipeline('ner') print(nlp_bert_lg('Hugging Face is a French
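
The sub-word pieces mentioned above can be merged back into whole words with plain string handling. The following is a minimal sketch, assuming the pipeline returns a list of dicts with `word` and `entity` keys and that continuation pieces carry the WordPiece `##` prefix. The sample list is hand-written for illustration and is not real pipeline output; `merge_wordpieces` and `group_entities` are hypothetical helper names, not part of transformers.

```python
def merge_wordpieces(tokens):
    """Merge WordPiece continuation pieces ('##...') into whole words.

    `tokens` is assumed to be a list of dicts with 'word' and 'entity' keys.
    The merged word keeps the entity label of its first piece.
    """
    merged = []
    for tok in tokens:
        word, entity = tok["word"], tok["entity"]
        if word.startswith("##") and merged:
            # Continuation piece: glue it onto the previous word without '##'.
            merged[-1]["word"] += word[2:]
        else:
            merged.append({"word": word, "entity": entity})
    return merged


def group_entities(words):
    """Join neighbouring words that share an entity label into one span.

    Simplification: neighbours in the list are treated as neighbours in the
    text, which is not guaranteed if unlabelled tokens were dropped between them.
    """
    spans = []
    for w in words:
        if spans and spans[-1]["entity"] == w["entity"]:
            spans[-1]["text"] += " " + w["word"]
        else:
            spans.append({"text": w["word"], "entity": w["entity"]})
    return spans


# Hand-written sample in the assumed output shape (hypothetical values).
sample = [
    {"word": "Hu", "entity": "I-ORG"},
    {"word": "##gging", "entity": "I-ORG"},
    {"word": "Face", "entity": "I-ORG"},
    {"word": "French", "entity": "I-MISC"},
]
words = merge_wordpieces(sample)
spans = group_entities(words)
print(spans)
```

Mapping spans back to exact character positions in the original text would need offsets, which this sketch does not assume are available.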

How to reconstruct text entities with Hugging Face's transformers pipelines without IOB tags?

若如初见. submitted on 2020-05-15 05:13:07
Question: I've been looking to use Hugging Face's Pipelines for NER (named entity recognition). However, the pipeline returns entity labels in inside-outside-beginning (IOB) format but without the IOB labels, so I'm not able to map its output back to my original text. Moreover, the outputs are split into sub-word pieces in BERT tokenization format (the default model is BERT-large). For example: from transformers import pipeline nlp_bert_lg = pipeline('ner') print(nlp_bert_lg('Hugging Face is a French

wordnet lemmatizer in NLTK is not working for adverbs [duplicate]

牧云@^-^@ submitted on 2020-05-13 14:42:06
Question: This question already has answers here: Getting adjective from an adverb in nltk or other NLP library (2 answers) Closed 5 years ago. from nltk.stem import WordNetLemmatizer x = WordNetLemmatizer() x.lemmatize("angrily", pos='r') Out[41]: 'angrily' Here is the reference documentation for POS tags in NLTK WordNet: http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html I may be missing something basic. Please let me know. Answer 1: Try: >>> from nltk.corpus import wordnet as wn >>> wn.synset(

How does TfidfVectorizer compute scores on test data

自闭症网瘾萝莉.ら submitted on 2020-05-13 05:36:06
Question: In scikit-learn, TfidfVectorizer allows us to fit on training data and later use the same vectorizer to transform our test data. The output of the transformation of the training data is a matrix holding a tf-idf score for each word in each document. However, how does the fitted vectorizer compute the score for new inputs? I have guessed that either: The score of a word in a new document is computed by some aggregation of the scores of the same word over documents in the
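
A pure-Python sketch of the fit/transform split may make the question concrete. It assumes scikit-learn's documented defaults (smoothed IDF `ln((1 + n) / (1 + df)) + 1`, raw term counts, L2 row normalization) and simplifies tokenization to whitespace splitting. `fit_idf` and `transform` are hypothetical helper names, not scikit-learn functions. Under these assumptions, IDF weights are learned once from the training documents and reused unchanged for a new document, whose own term counts supply the TF part; words unseen during fitting are dropped.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Learn one IDF weight per term from the training documents only."""
    n = len(train_docs)
    df = Counter()
    for doc in train_docs:
        # Document frequency: count each term once per document.
        df.update(set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def transform(doc, idf):
    """Score a (possibly new) document using the IDF learned at fit time."""
    # Terms outside the fitted vocabulary are ignored.
    counts = Counter(t for t in doc.lower().split() if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    # L2-normalize the row; an all-unknown document yields an empty vector.
    return {t: w / norm for t, w in weights.items()} if norm else {}


train = ["the cat sat", "the dog barked", "the cat purred"]
idf = fit_idf(train)
test_vec = transform("the cat meowed", idf)
print(test_vec)
```

In this toy run "meowed" contributes nothing because it was never seen during fitting, and "cat" outweighs "the" because it appears in fewer training documents.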

Negation handling in NLP

我只是一个虾纸丫 submitted on 2020-05-10 03:26:50
Question: I'm currently working on a project where I want to extract emotion from text. Because I'm using conceptnet5 (a semantic network), I can't simply prefix words in a sentence that contains a negation word, as those prefixed words would not show up in conceptnet5's API. Here's an example: The movie wasn't that good. Hence, I figured that I could use WordNet's lemma functionality to replace adjectives in sentences that contain negation words like (not, ...). In the previous example, the
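
The idea of swapping a negated adjective for its antonym can be sketched without any external resource. The snippet below is a minimal illustration under loud assumptions: the `ANTONYMS` table is a tiny hand-written stand-in for a WordNet antonym lookup, the negation list and the three-token look-ahead window are arbitrary choices, and `tokenize` and `rewrite_negation` are hypothetical helper names.

```python
NEGATIONS = {"not", "n't", "never"}
# Hypothetical stand-in for a WordNet antonym lookup.
ANTONYMS = {"good": "bad", "bad": "good", "happy": "unhappy"}


def tokenize(sentence):
    """Whitespace tokenizer that splits a trailing "n't" into its own token."""
    tokens = []
    for raw in sentence.rstrip(".").split():
        if raw.lower().endswith("n't") and len(raw) > 3:
            # "wasn't" -> "was", "n't" (irregular forms like "can't" are not restored).
            tokens.extend([raw[:-3], "n't"])
        else:
            tokens.append(raw)
    return tokens


def rewrite_negation(sentence, window=3):
    """Drop a negation word and replace the next known adjective with its antonym."""
    tokens = tokenize(sentence)
    out = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in NEGATIONS:
            out.append(tok)
            continue
        # Look ahead a few tokens for a word that has a known antonym.
        lookahead = range(i + 1, min(i + 1 + window, len(tokens)))
        target = next((j for j in lookahead if tokens[j].lower() in ANTONYMS), None)
        if target is not None:
            # Swap in the antonym; the negation word itself is dropped.
            tokens[target] = ANTONYMS[tokens[target].lower()]
        elif tok == "n't" and out:
            out[-1] += tok  # nothing to swap: re-attach the contraction
        else:
            out.append(tok)
    return " ".join(out)


print(rewrite_negation("The movie wasn't that good."))
```

The sketch drops the final period and ignores degree modifiers such as "that", so the rewritten sentence can be stronger than the original; a real system would need to account for that before looking the words up in conceptnet5.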

Negation handling in NLP

杀马特。学长 韩版系。学妹 submitted on 2020-05-10 03:26:20
Question: I'm currently working on a project where I want to extract emotion from text. Because I'm using conceptnet5 (a semantic network), I can't simply prefix words in a sentence that contains a negation word, as those prefixed words would not show up in conceptnet5's API. Here's an example: The movie wasn't that good. Hence, I figured that I could use WordNet's lemma functionality to replace adjectives in sentences that contain negation words like (not, ...). In the previous example, the