language-model

TensorFlow: loss jumps up after restoring RNN net

给你一囗甜甜゛ submitted on 2019-12-19 10:52:46
Question: Environment info: Operating System: Windows 7 64-bit; TensorFlow installed from pre-built pip (no CUDA): 1.0.1; Python 3.5.2 64-bit. Problem: I have trouble restoring my net (an RNN character-based language model). Below is a simplified version that shows the same problem. When I run it the first time, I get, for example, this:
...
step 160: loss = 1.956 (perplexity = 7.069016620211226)
step 180: loss = 1.837 (perplexity = 6.274748642468816)
step 200: loss = 1.825 (perplexity = 6.202084762557817)
But
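The excerpt cuts off before the asker's code, but a minimal sketch of one common cause of this symptom may help: restoring the weights and then re-running the variable initializer (or failing to save the optimizer's slot variables) resets the Adam state, so the loss jumps back up after restore. This is my illustration, not the asker's code; CKPT_PATH is a placeholder.

import tensorflow as tf

CKPT_PATH = "/tmp/model.ckpt"  # hypothetical checkpoint path

x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([1, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
train_op = tf.train.AdamOptimizer(0.1).minimize(loss)  # creates Adam slot variables

saver = tf.train.Saver()  # built after minimize(), so it also saves the Adam slots

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... train for a while, then:
    saver.save(sess, CKPT_PATH)

with tf.Session() as sess:
    saver.restore(sess, CKPT_PATH)
    # Do NOT run tf.global_variables_initializer() here: it would reset the
    # weights and Adam's moment estimates, making the loss jump back up.
    # ... continue training from the saved state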

Using custom beam scorer in TensorFlow CTC (language model)

回眸只為那壹抹淺笑 submitted on 2019-12-03 14:57:31
Is it possible to customize the beam scorer in the TensorFlow CTC implementation from the Python side? I see this possibility mentioned in a comment on the CTCBeamSearchDecoder C++ class constructor, but I wonder how to expose this functionality to Python users. The specific issue we have is plugging a language model into a CTC-based speech decoder. The language model could be a pre-trained TensorFlow sub-graph capable of outputting probabilities for beam-score adjustment, but we need a way to inject this into the beam scorer. There is currently no Python API for using a language model with a custom scorer.
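Since there is no Python hook into the C++ scorer, one workaround (my sketch, not an official API) is to decode the top-N beams with the stock decoder and rescore them with the language model afterwards, i.e. shallow fusion. Here lm_log_prob is a hypothetical stand-in for the pre-trained LM sub-graph, and alpha is a tuning weight:

import numpy as np
import tensorflow as tf

def lm_log_prob(labels):
    """Hypothetical LM scorer: log-probability of a decoded label sequence."""
    return 0.0  # plug the pre-trained language-model sub-graph in here

num_classes = 28  # e.g. 26 letters + space + CTC blank
logits = tf.placeholder(tf.float32, [None, 1, num_classes])  # [time, batch, classes]
seq_len = tf.placeholder(tf.int32, [1])

# Top 5 beams from the built-in (LM-free) CTC beam search.
decoded, ctc_scores = tf.nn.ctc_beam_search_decoder(
    logits, seq_len, beam_width=32, top_paths=5)

with tf.Session() as sess:
    feed = {logits: np.random.randn(50, 1, num_classes).astype(np.float32),
            seq_len: [50]}
    paths, scores = sess.run([decoded, ctc_scores], feed)
    alpha = 0.5  # LM weight
    # Rescore after the fact: combined = ctc_score + alpha * lm_score.
    best = max(range(len(paths)),
               key=lambda i: scores[0][i] + alpha * lm_log_prob(paths[i].values))
    print(paths[best].values)  # label sequence of the rescored-best hypothesis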

TensorFlow Embedding Lookup

微笑、不失礼 submitted on 2019-12-02 17:20:13
I am trying to learn how to build an RNN for speech recognition using TensorFlow. As a start, I wanted to try out some of the example models put up on the TensorFlow TF-RNN page. As advised, I had taken some time to understand how word IDs are embedded into a dense representation (vector representation) by working through the basic version of the word2vec model code. I had an understanding of what tf.nn.embedding_lookup actually does, until I encountered the same function being used with a two-dimensional array in the TF-RNN ptb_word_lm.py, when it did not make sense any more. What I thought tf
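A small sketch (mine, not from the question) of the point of confusion: tf.nn.embedding_lookup applied to 2-D ids, as in ptb_word_lm.py, simply indexes the embedding matrix with every id, so the output shape is the ids' shape with the embedding dimension appended:

import numpy as np
import tensorflow as tf

vocab_size, embed_dim = 10, 4
embedding = tf.constant(
    np.arange(vocab_size * embed_dim, dtype=np.float32).reshape(vocab_size, embed_dim))

ids_1d = tf.constant([1, 3])            # shape [2]
ids_2d = tf.constant([[1, 3], [2, 0]])  # shape [batch=2, num_steps=2]

out_1d = tf.nn.embedding_lookup(embedding, ids_1d)  # shape [2, 4]
out_2d = tf.nn.embedding_lookup(embedding, ids_2d)  # shape [2, 2, 4]

with tf.Session() as sess:
    print(sess.run(tf.shape(out_1d)))  # [2 4]
    print(sess.run(tf.shape(out_2d)))  # [2 2 4]: one embedding row per id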

Creating ARPA language model file with 50,000 words

混江龙づ霸主 submitted on 2019-12-02 17:18:44
I want to create an ARPA language model file with nearly 50,000 words. I can't generate the language model by passing my text file to the CMU Language Tool. Is there any other resource where I can get a language model for this many words? I thought I'd answer this one since it has a few votes, although based on Christina's other questions I don't think this will be a usable answer for her, since a 50,000-word language model almost certainly won't have an acceptable word error rate or recognition speed (or, most likely, even function for long) with in-app recognition systems for iOS that use this

Check perplexity of a Language Model

一个人想着一个人 submitted on 2019-12-02 08:29:56
I created a language model with a Keras LSTM and now I want to assess whether it's good, so I want to calculate perplexity. What is the best way to calculate the perplexity of a model in Python? I've come up with two versions and attached their corresponding sources; please feel free to check the links out.
def perplexity_raw(y_true, y_pred):
    """The perplexity metric. Why isn't this part of Keras yet?!
    https://stackoverflow.com/questions/41881308/how-to-calculate-perplexity-of-rnn-in-tensorflow
    https://github.com/keras-team/keras/issues/8267
    """
    # cross_entropy = K.sparse_categorical_crossentropy(y_true, y
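The excerpt cuts off mid-definition. A minimal completed sketch, assuming (from the commented line) that the metric is built on sparse categorical cross-entropy; this is my reconstruction, not the asker's full code:

import keras.backend as K

def perplexity(y_true, y_pred):
    """Perplexity = exp(mean per-token cross-entropy)."""
    cross_entropy = K.sparse_categorical_crossentropy(y_true, y_pred)
    return K.exp(K.mean(cross_entropy))

# Usage sketch:
# model.compile(optimizer="adam",
#               loss="sparse_categorical_crossentropy",
#               metrics=[perplexity])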

Searching a very big ARPA file in a very short time in Java

故事扮演 submitted on 2019-12-02 02:37:56
I have an ARPA file which is almost 1 GB, and I have to search it in less than one minute. I have searched a lot, but I have not found a suitable answer yet. I think I do not have to read the whole file; I just have to jump to a specific line in the file and read that whole line. The lines of the ARPA file do not have the same length. I have to mention that ARPA files have a specific format.
File format:
\data\
ngram 1=19
ngram 2=234
ngram 3=1013
\1-grams:
-1.7132 puluh -3.8008
-1.9782 satu -3.8368
\2-grams:
-1.5403 dalam dua -1.0560
-3.1626 dalam ini 0.0000
\3-grams:
-1.8726 itu dan tiga
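The question asks for Java, but the standard trick is language-agnostic: make one pass over the file to record the byte offset of each n-gram's line, then answer every query with a seek instead of a scan. A Python sketch of that idea, to match the other snippets on this page; for a 1 GB file the in-memory dict could be swapped for a sorted offset table on disk plus binary search. The file name is a placeholder:

def build_index(path):
    """One pass over the ARPA file: map n-gram text -> byte offset of its line."""
    index = {}
    order = 0  # n of the current section, set by headers like "\2-grams:"
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            raw = f.readline()
            if not raw:
                break
            line = raw.decode("utf-8", "replace").strip()
            if line.endswith("-grams:"):           # e.g. "\3-grams:"
                order = int(line[1])
            elif order and line and not line.startswith("\\"):
                parts = line.split()               # logprob, n words, [backoff]
                index[" ".join(parts[1:1 + order])] = offset
    return index

def lookup(path, index, ngram):
    """Jump straight to the recorded offset and read only that line."""
    with open(path, "rb") as f:
        f.seek(index[ngram])
        return f.readline().decode("utf-8", "replace").strip()

# Usage sketch:
# index = build_index("model.arpa")          # one slow pass, done once
# print(lookup("model.arpa", index, "dalam dua"))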