language-model

TensorFlow: loss jumps up after restoring RNN net

给你一囗甜甜゛ submitted on 2019-12-19 10:52:46
Question: Environment info: Operating System: Windows 7 64-bit; TensorFlow installed from pre-built pip (no CUDA): 1.0.1; Python 3.5.2 64-bit. Problem: I have trouble restoring my net (an RNN character-based language model). Below is a simplified version that shows the same problem. When I run it the first time, I get, for example, this:
...
step 160: loss = 1.956 (perplexity = 7.069016620211226)
step 180: loss = 1.837 (perplexity = 6.274748642468816)
step 200: loss = 1.825 (perplexity = 6.202084762557817)
But
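The excerpt cuts off before the asker's code, but a minimal sketch of one common cause of this symptom may help: restoring the weights and then re-running the variable initializer (or failing to save the optimizer's slot variables) resets the Adam state, so the loss jumps back up after restore. This is my illustration, not the asker's code; CKPT_PATH is a placeholder.

import tensorflow as tf

CKPT_PATH = "/tmp/model.ckpt"  # hypothetical checkpoint path

x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
train_op = tf.train.AdamOptimizer(0.1).minimize(loss)  # creates Adam slot variables

saver = tf.train.Saver()  # built after minimize(), so it also saves the Adam slots

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... train for a while, then:
    saver.save(sess, CKPT_PATH)

with tf.Session() as sess:
    saver.restore(sess, CKPT_PATH)
    # Do NOT run tf.global_variables_initializer() here: it would reset the
    # weights and Adam's moment estimates, making the loss jump back up.
    # ... continue training from the saved state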

Using custom beam scorer in TensorFlow CTC (language model)

回眸只為那壹抹淺笑 submitted on 2019-12-03 14:57:31
Is it possible to customize the beam scorer in the TensorFlow CTC implementation from the Python side? I see this possibility mentioned in a comment on the CTCBeamSearchDecoder C++ class constructor, but I wonder how to expose this functionality to Python users. The specific issue we have is plugging a language model into a CTC-based speech decoder. The language model could be a pre-trained TensorFlow sub-graph capable of outputting probabilities for beam-score adjustment, but we need a way to inject this into the beam scorer. There is currently no Python API for using a language model with a custom scorer.
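Since there is no Python hook into the C++ scorer, one workaround (my sketch, not an official API) is to decode the top-N beams with the stock decoder and rescore them with the language model afterwards, i.e. shallow fusion. Here lm_log_prob is a hypothetical stand-in for the pre-trained LM sub-graph, and alpha is a tuning weight:

import numpy as np
import tensorflow as tf

def lm_log_prob(labels):
    """Hypothetical LM scorer: log-probability of a decoded label sequence."""
    return 0.0  # plug the pre-trained language-model sub-graph in here

num_classes = 28  # e.g. 26 letters + space + CTC blank
logits = tf.placeholder(tf.float32, [None, 1, num_classes])  # [time, batch, classes]
seq_len = tf.placeholder(tf.int32, [1])

# Top 5 beams from the built-in (LM-free) CTC beam search.
decoded, ctc_scores = tf.nn.ctc_beam_search_decoder(
    logits, seq_len, beam_width=32, top_paths=5)

with tf.Session() as sess:
    feed = {logits: np.random.randn(50, 1, num_classes).astype(np.float32),
            seq_len: [50]}
    paths, scores = sess.run([decoded, ctc_scores], feed)
    alpha = 0.5  # LM weight
    # Rescore after the fact: combined = ctc_score + alpha * lm_score.
    best = max(range(len(paths)),
               key=lambda i: scores[0][i] + alpha * lm_log_prob(paths[i].values))
    print(paths[best].values)  # label sequence of the rescored-best hypothesis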

TensorFlow Embedding Lookup

微笑、不失礼 submitted on 2019-12-02 17:20:13
I am trying to learn how to build an RNN for speech recognition using TensorFlow. As a start, I wanted to try out some of the example models put up on the TensorFlow TF-RNN page. As advised, I had taken some time to understand how word IDs are embedded into a dense representation (vector representation) by working through the basic version of the word2vec model code. I had an understanding of what tf.nn.embedding_lookup actually does, until I encountered the same function being used with a two-dimensional array in the TF-RNN ptb_word_lm.py, when it did not make sense any more. What I thought tf
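A small sketch (mine, not from the question) of the point of confusion: tf.nn.embedding_lookup applied to 2-D ids, as in ptb_word_lm.py, simply indexes the embedding matrix with every id, so the output shape is the ids' shape with the embedding dimension appended:

import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 10, 4
embedding = tf.constant(
    np.arange(vocab_size * embed_dim, dtype=np.float32).reshape(vocab_size, embed_dim))

ids_1d = tf.constant([1, 3])            # shape [2]
ids_2d = tf.constant([[1, 3], [2, 0]])  # shape [batch=2, num_steps=2]

out_1d = tf.nn.embedding_lookup(embedding, ids_1d)  # shape [2, 4]
out_2d = tf.nn.embedding_lookup(embedding, ids_2d)  # shape [2, 2, 4]

with tf.Session() as sess:
    print(sess.run(tf.shape(out_1d)))  # [2 4]
    print(sess.run(tf.shape(out_2d)))  # [2 2 4]: one embedding row per id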

Creating ARPA language model file with 50,000 words

混江龙づ霸主 submitted on 2019-12-02 17:18:44
I want to create an ARPA language model file with nearly 50,000 words. I can't generate the language model by passing my text file to the CMU Language Tool. Is there any other resource where I can get a language model for this many words? I thought I'd answer this one since it has a few votes, although based on Christina's other questions I don't think this will be a usable answer for her, since a 50,000-word language model almost certainly won't have an acceptable word error rate or recognition speed (or, most likely, even function for long) with in-app recognition systems for iOS that use this

Check perplexity of a Language Model

一个人想着一个人 submitted on 2019-12-02 08:29:56
I created a language model with a Keras LSTM and now I want to assess whether it's good, so I want to calculate perplexity. What is the best way to calculate the perplexity of a model in Python? I've come up with two versions and attached their corresponding sources; please feel free to check the links out.
def perplexity_raw(y_true, y_pred):
    """The perplexity metric. Why isn't this part of Keras yet?!
    https://stackoverflow.com/questions/41881308/how-to-calculate-perplexity-of-rnn-in-tensorflow
    https://github.com/keras-team/keras/issues/8267
    """
    # cross_entropy = K.sparse_categorical_crossentropy(y_true, y
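The excerpt cuts off mid-definition. A minimal completed sketch, assuming (from the commented line) that the metric is built on sparse categorical cross-entropy; this is my reconstruction, not the asker's full code:

import keras.backend as K

def perplexity(y_true, y_pred):
    """Perplexity = exp(mean per-token cross-entropy)."""
    cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred)
    return K.exp(K.mean(cross_entropy))

# Usage sketch:
# model.compile(optimizer="adam",
#               loss="sparse_categorical_crossentropy",
#               metrics=[perplexity])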

Searching a very big ARPA file in a very short time in Java

故事扮演 submitted on 2019-12-02 02:37:56
I have an ARPA file which is almost 1 GB, and I have to search it in less than one minute. I have searched a lot, but I have not found a suitable answer yet. I think I do not have to read the whole file; I just have to jump to a specific line in the file and read that whole line. The lines of the ARPA file do not have the same length. I have to mention that ARPA files have a specific format.
File format:
\data\
ngram 1=19
ngram 2=234
ngram 3=1013
\1-grams:
-1.7132 puluh -3.8008
-1.9782 satu -3.8368
\2-grams:
-1.5403 dalam dua -1.0560
-3.1626 dalam ini 0.0000
\3-grams:
-1.8726 itu dan tiga
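The question asks for Java, but the standard trick is language-agnostic: make one pass over the file to record the byte offset of each n-gram's line, then answer every query with a seek instead of a scan. A Python sketch of that idea, to match the other snippets on this page; for a 1 GB file the in-memory dict could be swapped for a sorted offset table on disk plus binary search. The file name is a placeholder:

def build_index(path):
    """One pass over the ARPA file: map n-gram text -> byte offset of its line."""
    index = {}
    order = 0  # n of the current section, set by headers like "\2-grams:"
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            raw = f.readline()
            if not raw:
                break
            line = raw.decode("utf-8", "replace").strip()
            if line.endswith("-grams:"):           # e.g. "\3-grams:"
                order = int(line[1])
            elif order and line and not line.startswith("\\"):
                parts = line.split()               # logprob, n words, [backoff]
                index[" ".join(parts[1:1 + order])] = offset
    return index

def lookup(path, index, ngram):
    """Jump straight to the recorded offset and read only that line."""
    with open(path, "rb") as f:
        f.seek(index[ngram])
        return f.readline().decode("utf-8", "replace").strip()

# Usage sketch:
# index = build_index("model.arpa")          # one slow pass, done once
# print(lookup("model.arpa", index, "dalam dua"))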