nlp

How to transform multiple features in a Pipeline using FeatureUnion?

Submitted by 南楼画角 on 2020-06-16 06:15:55
Question: I have a pandas data frame that contains information about messages sent by users. For my model, I'm interested in predicting the missing recipients of a message, i.e. given recipients A, B, C of a message, I want to predict who else should have been part of the recipients. I'm doing multi-label classification using OneVsRestClassifier and LinearSVC. For features, I want to use the recipients of the message, the subject, and the body. Since recipients is a list of users, I want to transform that column using …
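
A minimal sketch of one way to combine these features in a FeatureUnion, assuming the recipients, subject and body column names from the question; the toy data frame and the "missing" label column are made up for illustration. Each branch selects one column, the recipient lists are turned into indicator features by a CountVectorizer whose analyzer passes the lists straight through, and the result feeds OneVsRestClassifier(LinearSVC()):

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy stand-in for the message data frame (column names are assumptions).
df = pd.DataFrame({
    "recipients": [["A", "B"], ["B", "C"], ["A", "C"]],
    "subject": ["budget review", "lunch plans", "budget follow-up"],
    "body": ["please see attached", "who is free today?", "numbers look off"],
    "missing": [["C"], ["A"], ["B"]],   # hypothetical multi-label target
})

def pick(col):
    # selects a single DataFrame column for the next step in the branch
    return FunctionTransformer(lambda frame: frame[col], validate=False)

features = FeatureUnion([
    ("recipients", Pipeline([
        ("pick", pick("recipients")),
        # each recipient list is already a list of tokens, so the analyzer
        # just returns it; binary=True gives one-hot style indicators
        ("onehot", CountVectorizer(analyzer=lambda recips: recips, binary=True)),
    ])),
    ("subject", Pipeline([("pick", pick("subject")), ("tfidf", TfidfVectorizer())])),
    ("body", Pipeline([("pick", pick("body")), ("tfidf", TfidfVectorizer())])),
])

model = Pipeline([("features", features), ("clf", OneVsRestClassifier(LinearSVC()))])

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(df["missing"])   # binary indicator matrix for multi-label y
model.fit(df, Y)
print(mlb.inverse_transform(model.predict(df)))

ColumnTransformer from sklearn.compose is an alternative that selects DataFrame columns directly, without the FunctionTransformer selectors.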

Inverse Document Frequency Formula

Submitted by 谁说胖子不能爱 on 2020-06-15 07:25:38
Question: I'm having trouble manually calculating tf-idf values: Python scikit-learn keeps giving different values than I'd expect. I keep reading that idf(term) = log(# of docs / # of docs containing the term). If so, won't you get a divide-by-zero error when no document contains the term? To avoid that, I read that you use log(# of docs / (# of docs containing the term + 1)). But then if the term is in every document, you get log(n / (n + 1)), which is negative, and that doesn't really make sense to me. What am …
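
For reference, scikit-learn's TfidfVectorizer does not use the plain log(N / df) formula: with the default smooth_idf=True it computes idf(t) = ln((1 + N) / (1 + df(t))) + 1, which avoids the division by zero and keeps a term that occurs in every document at weight 1 instead of going negative. A small sanity check against the fitted idf_ values (the toy documents are made up):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog sat", "the cat ran"]

vec = TfidfVectorizer(smooth_idf=True)   # smooth_idf=True is the default
vec.fit(docs)

n = len(docs)
for term, col in vec.vocabulary_.items():
    df_t = sum(term in d.split() for d in docs)    # document frequency of the term
    manual = np.log((1 + n) / (1 + df_t)) + 1      # smoothed idf, natural log
    print(term, round(manual, 4), round(vec.idf_[col], 4))   # the two values match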

Computing TF-IDF on the whole dataset or only on training data?

Submitted by 人走茶凉 on 2020-06-13 18:45:45
Question: In chapter seven of the book "TensorFlow Machine Learning Cookbook", the author, while pre-processing the data, uses scikit-learn's fit_transform function to get the tf-idf features of the text for training. The author passes all of the text data to the function before splitting it into train and test sets. Is that the right thing to do, or should the data be split first, with fit_transform applied to the training set and transform to the test set? Answer 1: I have not read the book and I am not sure whether this is actually a mistake in the book …
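
For context, the leakage-free pattern is to fit the vectorizer on the training split only and reuse it on the test split; a minimal sketch with made-up data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

texts = ["spam spam spam", "hello old friend", "cheap pills here", "meeting at noon"]
labels = [1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0)

tfidf = TfidfVectorizer()
X_train_vec = tfidf.fit_transform(X_train)   # learn vocabulary and idf on train only
X_test_vec = tfidf.transform(X_test)         # reuse the fitted vocabulary on test

Fitting on the full dataset lets vocabulary and document frequencies from the test set leak into the features, which can make the evaluation look better than it would be on genuinely unseen text.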

Difference between max length of word ngrams and size of context window

Submitted by 若如初见. on 2020-06-13 08:47:45
Question: In the description of the fastText Python library (https://github.com/facebookresearch/fastText/tree/master/python), the arguments for training a supervised model include, among others: ws: size of the context window; wordNgrams: max length of word ngram. If I understand correctly, both of them are responsible for taking the surrounding words of a word into account, but what is the clear difference between them? Answer 1: First, we use the train_unsupervised API to …
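
As a rough illustration of where each argument applies (the file names are placeholders, and labeled.txt is assumed to be in fastText's __label__ format): ws sets the context window of the unsupervised skipgram/cbow objective, while wordNgrams adds word n-gram features to the supervised classifier:

import fasttext

# ws: how many surrounding words the skipgram/cbow objective looks at
emb = fasttext.train_unsupervised("corpus.txt", model="skipgram", ws=5)

# wordNgrams: up to which length word n-grams (here unigrams and bigrams)
# are added as features when training the classifier
clf = fasttext.train_supervised("labeled.txt", wordNgrams=2)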

eli5: show_weights() with two labels

Submitted by 这一生的挚爱 on 2020-06-13 06:00:31
Question: I'm trying eli5 in order to understand the contribution of terms to the prediction of certain classes. You can run this script:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups

#categories = ['alt.atheism', 'soc.religion.christian']
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']
np.random.seed(1)
train …
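
A minimal sketch of how show_weights is usually called, fitting the vectorizer and classifier separately and handing the vectorizer over via vec= (max_iter and top are arbitrary choices here):

import eli5
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics']
np.random.seed(1)
train = fetch_20newsgroups(subset='train', categories=categories)

vec = CountVectorizer()
X = vec.fit_transform(train.data)
clf = LogisticRegression(max_iter=1000).fit(X, train.target)

# Renders an HTML table in a notebook: one weight column per class.
eli5.show_weights(clf, vec=vec, target_names=train.target_names, top=15)

With exactly two categories, scikit-learn's LogisticRegression stores a single coefficient vector, so eli5 shows a single weight column: positive weights push toward the second class and negative weights toward the first.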

BERT sentence embedding by summing last 4 layers

Submitted by 末鹿安然 on 2020-06-11 07:55:27
Question: I used Chris McCormick's tutorial on BERT with pytorch-pretrained-bert to get a sentence embedding as follows:

tokenized_text = tokenizer.tokenize(marked_text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
segments_ids = [1] * len(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()
with torch.no_grad():
    encoded_layers, _ = model(tokens_tensor, …
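
Staying with the same pytorch-pretrained-bert API, one way to finish the computation is to stack the last four encoder layers, sum them per token, and then mean-pool over tokens; mean-pooling is just one common pooling choice, and the example sentence is made up:

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
marked_text = "[CLS] the quick brown fox jumps over the lazy dog [SEP]"

tokenized_text = tokenizer.tokenize(marked_text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
segments_ids = [1] * len(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

model = BertModel.from_pretrained('bert-base-uncased')
model.eval()
with torch.no_grad():
    # encoded_layers: list of 12 tensors, each of shape [batch, seq_len, 768]
    encoded_layers, _ = model(tokens_tensor, segments_tensors)

# Sum the last four encoder layers token-wise, then average over tokens
# to obtain a single fixed-size sentence vector.
token_vecs = torch.stack(encoded_layers[-4:]).sum(dim=0)   # [batch, seq_len, 768]
sentence_embedding = token_vecs.mean(dim=1).squeeze(0)     # [768]
print(sentence_embedding.shape)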

Function call stack: keras_scratch_graph Error

Submitted by 筅森魡賤 on 2020-06-10 10:44:38
Question: I am reimplementing a text-to-speech project. I am running into a "Function call stack: keras_scratch_graph" error in the decoder part. The network architecture is from the Deep Voice 3 paper. I am using Keras from TF 2.0 on Google Colab. Below is the code for the decoder Keras model.

y1 = tf.ones(shape=(16, 203, 320))

def Decoder(name="decoder"):
    # Decoder Prenet
    din = tf.concat((tf.zeros_like(y1[:, :1, -hp.mel:]), y1[:, :-1, -hp.mel:]), 1)
    keys = K.Input(shape=(180, 256), batch_size=16, name="keys") …
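
For reference, the prenet's teacher-forcing shift from the snippet can be exercised on its own under eager TF 2.x; hp.mel is not defined in the excerpt, so 80 mel bins is assumed below:

import tensorflow as tf

n_mel = 80                              # assumed value for hp.mel
y1 = tf.ones(shape=(16, 203, 320))      # batch of 16, 203 decoder steps, 320-dim frames

# Prepend a zero frame and drop the last one, so that decoder step t only
# ever sees mel frames from steps earlier than t (teacher forcing).
din = tf.concat(
    (tf.zeros_like(y1[:, :1, -n_mel:]), y1[:, :-1, -n_mel:]),
    axis=1,
)
print(din.shape)   # (16, 203, 80)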

How to get probability of prediction per entity from Spacy NER model?

Submitted by て烟熏妆下的殇ゞ on 2020-06-10 07:14:11
Question: I used this official example code to train an NER model from scratch using my own training samples. When I predict on new text with this model, I want to get the probability of the prediction for each entity.

# test the saved model
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
for text, _ in TRAIN_DATA:
    doc = nlp2(text)
    print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
    print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])

I am unable to find a method in …
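
spaCy's standard greedy doc.ents output does not carry probabilities; the workaround usually pointed to (for spaCy 2.x only, and the beam_parse / get_beam_parses calls below are assumptions based on that recipe, not part of spaCy 3's API) is to re-run the entity recognizer with beam search and sum the scores of the analyses in which each entity appears. The text and model path are placeholders:

from collections import defaultdict
import spacy

output_dir = "path/to/saved/model"      # the same directory the model was saved to
nlp2 = spacy.load(output_dir)
ner = nlp2.get_pipe("ner")

text = "John Smith flew to Paris last week."

# Run the pipeline without NER so the doc is not already committed to one
# analysis, then re-run the recognizer with beam search to keep alternatives.
with nlp2.disable_pipes("ner"):
    doc = nlp2(text)
beams = ner.beam_parse([doc], beam_width=16, beam_density=0.0001)

entity_scores = defaultdict(float)
for beam in beams:
    for score, ents in ner.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score

for (start, end, label), score in entity_scores.items():
    print(doc[start:end].text, label, round(score, 3))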