nlp

How to invert the lemmatization process given a lemma and a token?

Submitted by 人走茶凉 on 2020-08-06 12:45:45
Question: Generally, in natural language processing, we want to get the lemma of a token. For example, we can map 'eaten' to 'eat' using WordNet lemmatization. Are there any tools in Python that can invert a lemma to a given form? For example, we map 'go' to 'gone' given the target form 'eaten'. PS: Someone mentioned that we would have to store such mappings (see "How to un-stem a word in Python?").

Answer 1: Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization").
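As a concrete illustration of realization, here is a minimal sketch using the third-party lemminflect package (the choice of library is an assumption; the answer names the task, not a tool). Given a lemma and a Penn Treebank tag describing the target form, getInflection produces the inflected surface form:

    # pip install lemminflect   (assumed dependency, not mentioned in the question)
    from lemminflect import getInflection

    # 'eaten' is the past participle (tag VBN) of 'eat';
    # requesting the same tag for the lemma 'go' yields 'gone'
    print(getInflection('go', tag='VBN'))   # ('gone',)
    print(getInflection('eat', tag='VBN'))  # ('eaten',)

This sidesteps storing explicit mappings: the target form 'eaten' only needs to be reduced to its tag (VBN) once, and that tag can then be applied to any lemma.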

In a spell checker, how to get the words that are 3 edits away (Norvig)?

Submitted by 邮差的信 on 2020-08-06 06:10:29
Question: I have been trying to use a spell corrector on my database to correct the addresses in one table, for which I have used http://norvig.com/spell-correct.html as a reference. Using the Address_mast table as a collection of strings, I'm trying to correct the addresses and update the corrected strings in "customer_master".

Address_mast

    ID  Address
    1   sonal plaza,harley road,sw-309012
    2   rose apartment,kell road, juniper, la-293889
    3   plot 16, queen's tower, subbden - 399081
    4   cognizant plaza, abs road, ziggar
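Norvig's article only defines candidates up to two edits away (edits1 and edits2); a distance-3 generator follows the same pattern. Below is a sketch that reuses edits1 from the article and extends it one level. The candidate set grows combinatorially with each level, so filtering against the known vocabulary (Norvig's WORDS counter) as early as possible is essential:

    letters = 'abcdefghijklmnopqrstuvwxyz'

    def edits1(word):
        # all strings one edit away: deletes, transposes, replaces, inserts
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

    def edits2(word):
        # two edits away: one more edit on every distance-1 candidate
        return (e2 for e1 in edits1(word) for e2 in edits1(e1))

    def edits3(word):
        # three edits away: one more edit on every distance-2 candidate
        return (e3 for e2 in edits2(word) for e3 in edits1(e2))

    def known3(word, WORDS):
        # keep only candidates that appear in the vocabulary counter WORDS
        return {w for w in edits3(word) if w in WORDS}

For the address-correction use case, a word-frequency counter built from the Address_mast column itself could plausibly serve as WORDS.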

How can I create and fit vocab.bpe file (GPT and GPT2 OpenAI models) with my own corpus text?

Submitted by 无人久伴 on 2020-08-05 05:23:31
Question: This question is for those who are familiar with the GPT or GPT-2 OpenAI models, in particular with the encoding task (Byte-Pair Encoding). This is my problem: I would like to know how I could create my own vocab.bpe file. I have a Spanish corpus text that I would like to use to fit my own BPE encoder. I have succeeded in creating the encoder.json with the python-bpe library, but I have no idea how to obtain the vocab.bpe file. I have reviewed the code in gpt-2/src/encoder.py but, I have not
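One way to produce both files in one step — a swapped-in alternative to python-bpe, offered as a suggestion rather than the asker's original approach — is the Hugging Face tokenizers library, whose ByteLevelBPETokenizer implements the same byte-level BPE scheme as GPT-2. Its merges.txt output has the same format as GPT-2's vocab.bpe, and vocab.json corresponds to encoder.json. The corpus filename and output directory below are placeholders:

    # pip install tokenizers
    from tokenizers import ByteLevelBPETokenizer

    tokenizer = ByteLevelBPETokenizer()
    # 'corpus_es.txt' is a placeholder for the Spanish corpus file
    tokenizer.train(files=['corpus_es.txt'], vocab_size=50257, min_frequency=2)

    # writes vocab.json (= encoder.json) and merges.txt (= vocab.bpe)
    tokenizer.save_model('output_dir')

vocab_size=50257 mirrors GPT-2's vocabulary size; a smaller corpus may warrant a smaller value.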

Combining bag of words and other features in one model using sklearn and pandas

Submitted by 爱⌒轻易说出口 on 2020-07-31 06:45:37
Question: I am trying to model the score that a post receives based on both the text of the post and other features (time of day, length of post, etc.). I am wondering how best to combine these different types of features into one model. Right now, I have something like the following (stolen from here and here):

    import pandas as pd
    ...

    def features(p):
        terms = vectorizer(p[0])
        d = {'feature_1': p[1], 'feature_2': p[2]}
        for t in terms:
            d[t] = d.get(t, 0) + 1
        return d

    posts = pd.read_csv('path/to/csv')
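Rather than merging features by hand in a dict, scikit-learn's ColumnTransformer can apply a bag-of-words vectorizer to the text column and pass the numeric columns through untouched. A sketch follows; the column names 'text' and 'score' are assumptions (feature_1 and feature_2 are taken from the snippet above), and Ridge is just one possible regressor:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import Pipeline

    # assumed columns: text, feature_1, feature_2, score
    posts = pd.read_csv('path/to/csv')

    pre = ColumnTransformer([
        # CountVectorizer expects a 1-D column of strings, so the text
        # column is given as a single name, not a list
        ('bow', CountVectorizer(), 'text'),
        ('num', 'passthrough', ['feature_1', 'feature_2']),
    ])

    model = Pipeline([('features', pre), ('reg', Ridge())])
    model.fit(posts, posts['score'])
    predictions = model.predict(posts)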

How to add an attention layer to a Bi-LSTM

Submitted by 我的未来我决定 on 2020-07-24 18:16:23
Question: I am developing a Bi-LSTM model and want to add an attention layer to it, but I am not sure how to add it. My current code for the model is:

    model = Sequential()
    model.add(Embedding(max_words, 1152, input_length=max_len, weights=[embeddings]))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(32)))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
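Attention needs one hidden vector per timestep, so the LSTM must return sequences, and the model is easier to express with the Keras functional API than with Sequential. Below is a sketch of a simple soft-attention pooling layer over the Bi-LSTM outputs; max_words, max_len, and embeddings are taken from the code above, and the attention scheme (a single tanh-scored Dense layer softmaxed over time) is one common choice, not the only one:

    import tensorflow as tf
    from tensorflow.keras import Model
    from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                         Dense, Dropout, Softmax, Lambda)

    inp = Input(shape=(max_len,))
    x = Embedding(max_words, 1152, weights=[embeddings])(inp)
    # return_sequences=True keeps one 64-dim vector per timestep
    h = Bidirectional(LSTM(32, return_sequences=True))(x)   # (batch, max_len, 64)
    h = Dropout(0.5)(h)

    # score each timestep, normalise the scores over time, and take the
    # weighted sum of the hidden states as the context vector
    scores = Dense(1, activation='tanh')(h)                 # (batch, max_len, 1)
    weights = Softmax(axis=1)(scores)
    context = Lambda(lambda a: tf.reduce_sum(a[0] * a[1], axis=1))([h, weights])

    out = Dense(1, activation='sigmoid')(context)
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])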

Why is the number of stems from the NLTK stemmer different from the expected output?

Submitted by 穿精又带淫゛_ on 2020-07-23 06:42:03
Question: I have to perform stemming on a text. The tasks are as follows:

1. Tokenize all the words given in tc. A word should contain alphabets, numbers, or underscores. Store the tokenized list of words in tw.
2. Convert all the words to lowercase. Store the result in the variable tw.
3. Remove all the stop words from the unique set of tw. Store the result in the variable fw.
4. Stem each word present in fw with PorterStemmer, and store the result in the list psw.

Below is my code:

    import re
    import
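Since the posted code is cut off, here is a sketch of the four steps with NLTK (the content of tc is unknown, so a placeholder string stands in). It also shows the likely cause of the count mismatch in the title: step 3 operates on the unique set of tw, so duplicate words are collapsed before stemming and the length of psw can be smaller than expected:

    import re
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    tc = "The foxes were jumping over the lazy dogs"  # placeholder input text

    # 1. \w+ matches runs of letters, digits, and underscores
    tw = re.findall(r'\w+', tc)

    # 2. lowercase every token, stored back into tw
    tw = [w.lower() for w in tw]

    # 3. stop-word removal over the *unique set* of tw; set() de-duplicates,
    #    which changes the number of words that reach the stemmer
    stop = set(stopwords.words('english'))
    fw = [w for w in set(tw) if w not in stop]

    # 4. stem each remaining word
    ps = PorterStemmer()
    psw = [ps.stem(w) for w in fw]
    print(psw)

Note that iterating over a set gives no guaranteed order, so the order of psw can vary between runs even when its contents match the expected stems.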
