nlp

How to invert the lemmatization process given a lemma and a token?

Submitted by 人走茶凉 on 2020-08-06 12:45:45
Question: Generally, in natural language processing, we want to get the lemma of a token. For example, we can map 'eaten' to 'eat' using WordNet lemmatization. Are there any tools in Python that can invert a lemma to a given form? For example, we map 'go' to 'gone' given the target form 'eaten'. PS: Someone mentioned that we would have to store such mappings (see "How to un-stem a word in Python?").

Answer 1: Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization").
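As a concrete illustration of realization, here is a minimal sketch using the third-party lemminflect package (the choice of library is an assumption; the answer names the task, not a tool). Given a lemma and a Penn Treebank tag describing the target form, getInflection produces the inflected surface form:

    # pip install lemminflect   (assumed dependency, not mentioned in the question)
    from lemminflect import getInflection

    # 'eaten' is the past participle (tag VBN) of 'eat';
    # requesting the same tag for the lemma 'go' yields 'gone'
    print(getInflection('go', tag='VBN'))   # ('gone',)
    print(getInflection('eat', tag='VBN'))  # ('eaten',)

This sidesteps storing explicit mappings: the target form 'eaten' only needs to be reduced to its tag (VBN) once, and that tag can then be applied to any lemma.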

In a spell checker, how to get the words that are 3 edits away (Norvig)?

Submitted by 邮差的信 on 2020-08-06 06:10:29
Question: I have been trying to use a spell corrector on my database to correct the addresses in one table, for which I have used http://norvig.com/spell-correct.html as a reference. Using the Address_mast table as a collection of strings, I'm trying to correct the addresses and update the corrected strings in "customer_master".

Address_mast

    ID  Address
    1   sonal plaza,harley road,sw-309012
    2   rose apartment,kell road, juniper, la-293889
    3   plot 16, queen's tower, subbden - 399081
    4   cognizant plaza, abs road, ziggar
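Norvig's article only defines candidates up to two edits away (edits1 and edits2); a distance-3 generator follows the same pattern. Below is a sketch that reuses edits1 from the article and extends it one level. The candidate set grows combinatorially with each level, so filtering against the known vocabulary (Norvig's WORDS counter) as early as possible is essential:

    letters = 'abcdefghijklmnopqrstuvwxyz'

    def edits1(word):
        # all strings one edit away: deletes, transposes, replaces, inserts
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

    def edits2(word):
        # two edits away: one more edit on every distance-1 candidate
        return (e2 for e1 in edits1(word) for e2 in edits1(e1))

    def edits3(word):
        # three edits away: one more edit on every distance-2 candidate
        return (e3 for e2 in edits2(word) for e3 in edits1(e2))

    def known3(word, WORDS):
        # keep only candidates that appear in the vocabulary counter WORDS
        return {w for w in edits3(word) if w in WORDS}

For the address-correction use case, a word-frequency counter built from the Address_mast column itself could plausibly serve as WORDS.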

How can I create and fit vocab.bpe file (GPT and GPT2 OpenAI models) with my own corpus text?

Submitted by 无人久伴 on 2020-08-05 05:23:31
Question: This question is for those who are familiar with the GPT or GPT-2 OpenAI models, in particular with the encoding task (Byte-Pair Encoding). This is my problem: I would like to know how I could create my own vocab.bpe file. I have a Spanish corpus text that I would like to use to fit my own BPE encoder. I have succeeded in creating the encoder.json with the python-bpe library, but I have no idea how to obtain the vocab.bpe file. I have reviewed the code in gpt-2/src/encoder.py but, I have not
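One way to produce both files in one step — a swapped-in alternative to python-bpe, offered as a suggestion rather than the asker's original approach — is the Hugging Face tokenizers library, whose ByteLevelBPETokenizer implements the same byte-level BPE scheme as GPT-2. Its merges.txt output has the same format as GPT-2's vocab.bpe, and vocab.json corresponds to encoder.json. The corpus filename and output directory below are placeholders:

    # pip install tokenizers
    from tokenizers import ByteLevelBPETokenizer

    tokenizer = ByteLevelBPETokenizer()
    # 'corpus_es.txt' is a placeholder for the Spanish corpus file
    tokenizer.train(files=['corpus_es.txt'], vocab_size=50257, min_frequency=2)

    # writes vocab.json (= encoder.json) and merges.txt (= vocab.bpe)
    tokenizer.save_model('output_dir')

vocab_size=50257 mirrors GPT-2's vocabulary size; a smaller corpus may warrant a smaller value.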

Combining bag of words and other features in one model using sklearn and pandas

Submitted by 爱⌒轻易说出口 on 2020-07-31 06:45:37
Question: I am trying to model the score that a post receives based on both the text of the post and other features (time of day, length of post, etc.). I am wondering how best to combine these different types of features into one model. Right now, I have something like the following (stolen from here and here):

    import pandas as pd
    ...

    def features(p):
        terms = vectorizer(p[0])
        d = {'feature_1': p[1], 'feature_2': p[2]}
        for t in terms:
            d[t] = d.get(t, 0) + 1
        return d

    posts = pd.read_csv('path/to/csv')
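Rather than merging features by hand in a dict, scikit-learn's ColumnTransformer can apply a bag-of-words vectorizer to the text column and pass the numeric columns through untouched. A sketch follows; the column names 'text' and 'score' are assumptions (feature_1 and feature_2 are taken from the snippet above), and Ridge is just one possible regressor:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import Pipeline

    # assumed columns: text, feature_1, feature_2, score
    posts = pd.read_csv('path/to/csv')

    pre = ColumnTransformer([
        # CountVectorizer expects a 1-D column of strings, so the text
        # column is given as a single name, not a list
        ('bow', CountVectorizer(), 'text'),
        ('num', 'passthrough', ['feature_1', 'feature_2']),
    ])

    model = Pipeline([('features', pre), ('reg', Ridge())])
    model.fit(posts, posts['score'])
    predictions = model.predict(posts)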

How to add an attention layer to a Bi-LSTM

Submitted by 我的未来我决定 on 2020-07-24 18:16:23
Question: I am developing a Bi-LSTM model and want to add an attention layer to it, but I am not sure how to add it. My current code for the model is:

    model = Sequential()
    model.add(Embedding(max_words, 1152, input_length=max_len, weights=[embeddings]))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Bidirectional(LSTM(32)))
    model.add(BatchNormalization())
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
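Attention needs one hidden vector per timestep, so the LSTM must return sequences, and the model is easier to express with the Keras functional API than with Sequential. Below is a sketch of a simple soft-attention pooling layer over the Bi-LSTM outputs; max_words, max_len, and embeddings are taken from the code above, and the attention scheme (a single tanh-scored Dense layer softmaxed over time) is one common choice, not the only one:

    import tensorflow as tf
    from tensorflow.keras import Model
    from tensorflow.keras.layers import (Input, Embedding, Bidirectional, LSTM,
                                         Dense, Dropout, Softmax, Lambda)

    inp = Input(shape=(max_len,))
    x = Embedding(max_words, 1152, weights=[embeddings])(inp)
    # return_sequences=True keeps one 64-dim vector per timestep
    h = Bidirectional(LSTM(32, return_sequences=True))(x)   # (batch, max_len, 64)
    h = Dropout(0.5)(h)

    # score each timestep, normalise the scores over time, and take the
    # weighted sum of the hidden states as the context vector
    scores = Dense(1, activation='tanh')(h)                 # (batch, max_len, 1)
    weights = Softmax(axis=1)(scores)
    context = Lambda(lambda a: tf.reduce_sum(a[0] * a[1], axis=1))([h, weights])

    out = Dense(1, activation='sigmoid')(context)
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])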

Why is the number of stems from the NLTK stemmer different from the expected output?

Submitted by 穿精又带淫゛_ on 2020-07-23 06:42:03
Question: I have to perform stemming on a text. The tasks are as follows:

1. Tokenize all the words given in tc. A word should contain alphabets, numbers, or underscores. Store the tokenized list of words in tw.
2. Convert all the words to lowercase. Store the result in the variable tw.
3. Remove all the stop words from the unique set of tw. Store the result in the variable fw.
4. Stem each word present in fw with PorterStemmer, and store the result in the list psw.

Below is my code:

    import re
    import
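Since the posted code is cut off, here is a sketch of the four steps with NLTK (the content of tc is unknown, so a placeholder string stands in). It also shows the likely cause of the count mismatch in the title: step 3 operates on the unique set of tw, so duplicate words are collapsed before stemming and the length of psw can be smaller than expected:

    import re
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    tc = "The foxes were jumping over the lazy dogs"  # placeholder input text

    # 1. \w+ matches runs of letters, digits, and underscores
    tw = re.findall(r'\w+', tc)

    # 2. lowercase every token, stored back into tw
    tw = [w.lower() for w in tw]

    # 3. stop-word removal over the *unique set* of tw; set() de-duplicates,
    #    which changes the number of words that reach the stemmer
    stop = set(stopwords.words('english'))
    fw = [w for w in set(tw) if w not in stop]

    # 4. stem each remaining word
    ps = PorterStemmer()
    psw = [ps.stem(w) for w in fw]
    print(psw)

Note that iterating over a set gives no guaranteed order, so the order of psw can vary between runs even when its contents match the expected stems.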
