language-model

Keras LSTM predicting next item, taking whole sequences or a sliding window. Will a sliding window need a stateful LSTM?

半城伤御伤魂 submitted on 2020-12-13 03:27:07
Question: I have a sequence prediction problem in which, given the last n items in a sequence, I need to predict the next item. I have more than 2 million sequences, each with a different number of timesteps (sequence length): some are just 5 and some are 50/60/100/200, up to 500.

seq_inputs = [
    ["AA1", "BB3", "CC4", …, "DD5"],                                # length/timesteps 5
    ["FF1", "DD3", "FF6", "KK8", "AA5", "CC8", …, "AA2"],           # length/timesteps 50
    ["AA2", "CC8", "CC11", "DD3", "FF6", "AA1", "BB3", …, "DD11"],  # length/timesteps 200
    ..
    ..
]  # there are
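The excerpt breaks off, but a common way to frame this setup is a sliding window over each sequence fed to a stateless Keras LSTM. Below is a minimal sketch of that idea, assuming the items have already been integer-encoded; the window length n, vocabulary size, and layer sizes are illustrative placeholders, not values from the question.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

def make_windows(seq_ids, n):
    """Split one integer-encoded sequence into (n-item window, next item) pairs."""
    X, y = [], []
    for i in range(len(seq_ids) - n):
        X.append(seq_ids[i:i + n])
        y.append(seq_ids[i + n])
    return np.array(X), np.array(y)

vocab_size = 1000   # assumed number of distinct items after integer encoding
n = 10              # assumed window length

model = Sequential([
    Embedding(vocab_size, 64),                   # item ID -> dense vector
    LSTM(128),                                   # stateless: each window is independent
    Dense(vocab_size, activation="softmax"),     # probability distribution over the next item
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

As a rule of thumb, a stateless LSTM is enough here because each window already contains all the context the model is given; stateful=True only matters if the hidden state needs to carry over between batches that continue the same long sequence.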

Pretraining a language model on a small custom corpus

烂漫一生 submitted on 2020-07-21 07:55:47
Question: I was curious whether it is possible to use transfer learning in text generation and re-train/pre-train a model on a specific kind of text. For example, given a pre-trained BERT model and a small corpus of medical (or any "type" of) text, make a language model that is able to generate medical text. The assumption is that you do not have a huge amount of "medical texts", which is why you have to use transfer learning. Putting it as a pipeline, I would describe this as: using a pre-trained BERT
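The excerpt cuts off at the pipeline, but the continued-pretraining step it describes is commonly done with the Hugging Face transformers library. The sketch below is an assumption on my part, not the asker's pipeline: it runs masked-language-model training of bert-base-uncased on a small plain-text file (medical_corpus.txt is a hypothetical name), with placeholder hyperparameters.

from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
import torch

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

texts = open("medical_corpus.txt").read().splitlines()   # hypothetical small domain corpus
encodings = tokenizer(texts, truncation=True, max_length=128)

class MLMDataset(torch.utils.data.Dataset):
    """Wraps the tokenized corpus so the Trainer can iterate over it."""
    def __init__(self, enc):
        self.enc = enc
    def __len__(self):
        return len(self.enc["input_ids"])
    def __getitem__(self, i):
        return {k: torch.tensor(v[i]) for k, v in self.enc.items()}

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-medical", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=MLMDataset(encodings),
        data_collator=collator).train()

Note that BERT is a masked language model rather than a left-to-right generator, so for free-form text generation a causal model such as GPT-2 is usually the easier fit; the domain-adaptation step looks much the same either way.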

word2vec - what is best: add, concatenate, or average word vectors?

空扰寡人 submitted on 2020-05-25 06:41:22
Question: I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the word embedding (rows of the input/hidden matrix) and the context embedding (columns of the hidden/output matrix). As outlined in this post, there are at least three common ways to combine these two embedding vectors: summing the context and word vector for each
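Although the excerpt stops mid-list, the three combinations it refers to are easy to sketch once both matrices are pulled out of gensim. The snippet below assumes gensim 4.x with negative sampling; the toy corpus is a placeholder, and the attribute holding the output matrix is model.syn1neg in 4.x (model.trainables.syn1neg in older versions).

import numpy as np
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]]  # toy corpus
model = Word2Vec(sentences, vector_size=50, negative=5, min_count=1)

word_vecs = model.wv.vectors      # input/word embeddings, one row per vocabulary word
ctx_vecs = model.syn1neg          # output/context embeddings (same shape, same row order)

summed       = word_vecs + ctx_vecs                           # 1) sum
averaged     = (word_vecs + ctx_vecs) / 2.0                   # 2) average
concatenated = np.concatenate([word_vecs, ctx_vecs], axis=1)  # 3) concatenate (doubles the dimension)

idx = model.wv.key_to_index["fox"]    # row index of a word, used to read its combined vector
fox_combined = averaged[idx]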

Calculate perplexity of word2vec model

牧云@^-^@ submitted on 2020-01-15 04:51:53
Question: I trained a Gensim W2V model on 500K sentences (around 60K words) and I want to calculate its perplexity. What would be the best way to do so? And for 60K words, how can I check what would be a proper amount of data? Thanks. Answer 1: If you want to calculate the perplexity, you first have to retrieve the loss. In the gensim.models.word2vec.Word2Vec constructor, pass the compute_loss=True parameter - this way, gensim will store the loss for you while training. Once trained, you can call the get_latest
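The answer is cut off at the method name, but the loss-tracking step it describes can be sketched as follows; gensim 4.x is assumed, and the two-sentence corpus stands in for the real 500K sentences.

from gensim.models import Word2Vec

sentences = [["hello", "world"], ["hello", "gensim"]]   # placeholder corpus
model = Word2Vec(sentences, vector_size=100, min_count=1,
                 compute_loss=True)                     # ask gensim to accumulate the training loss

print(model.get_latest_training_loss())                 # cumulative loss recorded during training

Keep in mind this returns an accumulated training loss, not a perplexity; word2vec with negative sampling does not produce normalized next-word probabilities, so any perplexity derived from it is only an approximation.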

TensorFlow Embedding Lookup

笑着哭i submitted on 2019-12-31 08:39:08
Question: I am trying to learn how to build an RNN for speech recognition using TensorFlow. As a start, I wanted to try out some example models put up on the TensorFlow page, TF-RNN. As advised there, I had taken some time to understand how word IDs are embedded into a dense representation (vector representation) by working through the basic version of the word2vec model code. I had an understanding of what tf.nn.embedding_lookup actually does, until I encountered the same function being used with
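The excerpt ends just before the confusing second use of the function, but the basic behaviour of tf.nn.embedding_lookup is easy to show in isolation: it gathers rows of an embedding matrix by integer ID. The sizes below are toy values chosen for illustration, not ones from the tutorial.

import tensorflow as tf

vocab_size, embed_dim = 10, 4
embeddings = tf.Variable(tf.random.uniform([vocab_size, embed_dim], -1.0, 1.0))

word_ids = tf.constant([[2, 5, 1], [7, 0, 3]])       # a batch of 2 "sentences", 3 word IDs each
dense = tf.nn.embedding_lookup(embeddings, word_ids)
print(dense.shape)                                   # (2, 3, 4): every ID replaced by its embedding row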

Positional Encodings leads to worse convergence, language modeling

孤街醉人 submitted on 2019-12-25 02:44:42
Question: This is a tough question, but I might as well try. I'm implementing the architecture from this paper, https://arxiv.org/pdf/1503.08895.pdf, for language modeling. See page 2 for a diagram, and the top of page 5 for the section on positional or "temporal" encoding. More on positional encoding can be found here, https://arxiv.org/pdf/1706.03762.pdf, at the bottom of page 5/top of page 6. (I was directed to that second paper by the authors of the first.) So here's my Keras implementation in a
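The asker's Keras code is truncated, so as a reference point here is a plain NumPy sketch of the sinusoidal positional encoding defined in the second paper, PE[pos, 2i] = sin(pos / 10000^(2i/d)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d)); it is not the asker's implementation.

import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]                          # (max_len, 1)
    i = np.arange(d_model)[None, :]                            # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                       # even dimensions
    pe[:, 1::2] = np.cos(angle[:, 1::2])                       # odd dimensions
    return pe                                                  # added element-wise to the input embeddings

pe = positional_encoding(max_len=50, d_model=64)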

CMU Sphinx4 - Custom Language Model

和自甴很熟 submitted on 2019-12-24 11:27:30
Question: I have a very specific requirement. I am working on an application which will allow users to speak their employee number, which is of the format HN56C12345 (any alphanumeric character sequence), into the app. I have gone through the link http://cmusphinx.sourceforge.net/wiki/tutoriallm, but I am not sure whether that would work for my use case. So my question is threefold: Can Sphinx4 actually recognize an alphanumeric sequence with high accuracy, like an employee number in my case? If yes, can anyone
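The question is cut off after its first part, but the linked tutorial builds a statistical language model from a plain-text corpus, so one hedged sketch of a first step is generating such a corpus of spelled-out employee numbers with standard-library Python; the ID pattern, file name, and corpus size below are assumptions based on the example HN56C12345, not anything from the question.

import random
import string

DIGIT_WORDS = dict(zip("0123456789",
                       ["ZERO", "ONE", "TWO", "THREE", "FOUR",
                        "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]))

def spell_out(emp_no):
    """Spell an ID like 'HN56C12345' as separate letter and digit words."""
    return " ".join(DIGIT_WORDS[c] if c.isdigit() else c for c in emp_no)

def random_emp_no():
    # Assumed pattern: 2 letters, 2 digits, 1 letter, 5 digits (matching HN56C12345).
    return ("".join(random.choices(string.ascii_uppercase, k=2))
            + "".join(random.choices(string.digits, k=2))
            + random.choice(string.ascii_uppercase)
            + "".join(random.choices(string.digits, k=5)))

with open("emp_numbers_corpus.txt", "w") as f:
    for _ in range(10000):
        f.write("<s> " + spell_out(random_emp_no()) + " </s>\n")

For an ID with a rigid structure like this, a grammar (JSGF) that encodes the letter/digit pattern is often suggested for Sphinx4 instead of a purely statistical language model, since it constrains recognition to valid sequences.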