language-model

How to search a very big ARPA file in a very short time in Java

情到浓时终转凉″ submitted on 2019-12-02 01:55:30
Question: I have an ARPA file which is almost 1 GB, and I have to search it in less than 1 minute. I have searched a lot, but I have not found a suitable answer yet. I think I do not have to read the whole file: I just have to jump to a specific line in the file and read that whole line. The lines of an ARPA file do not all have the same length. I should mention that ARPA files have a specific format.

File format:

\data\
ngram 1=19
ngram 2=234
ngram 3=1013

\1-grams:
-1.7132 puluh -3.8008
-1.9782
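Since the lines have different lengths, you cannot seek to line N directly, but you can pay for one full scan up front and then answer every later lookup with a single seek. Below is a minimal sketch of that idea; it is in Python rather than the Java the question asks for (the same pattern maps directly onto Java's RandomAccessFile), and the file name and the tab-separated ARPA layout are assumptions:

index = {}  # n-gram string -> byte offset of its line in the file
offset = 0
with open('model.arpa', 'rb') as f:
    for line in f:
        # ARPA data lines look like: logprob<TAB>n-gram[<TAB>backoff]
        parts = line.decode('utf-8', 'replace').rstrip('\r\n').split('\t')
        if len(parts) >= 2:
            index[parts[1]] = offset
        offset += len(line)

# each later lookup is one seek + one readline instead of a full scan
with open('model.arpa', 'rb') as f:
    f.seek(index['puluh'])
    print(f.readline().decode('utf-8').rstrip())

The index costs one pass over the 1 GB file; after that, any entry (here the unigram "puluh" from the question's sample) is retrieved in effectively constant time.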

NLTK package to estimate the (unigram) perplexity

为君一笑 submitted on 2019-11-29 14:32:12
Question: I am trying to calculate the perplexity for the data I have. The code I am using is:

import sys
sys.path.append("/usr/local/anaconda/lib/python2.7/site-packages/nltk")
from nltk.corpus import brown
from nltk.model import NgramModel
from nltk.probability import LidstoneProbDist, WittenBellProbDist

estimator = lambda fdist, bins: LidstoneProbDist(fdist, 0.2)
lm = NgramModel(3, brown.words(categories='news'), True, False, estimator)
print lm

But I am receiving the error: File "/usr/local
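The nltk.model.NgramModel class this code imports was removed in NLTK 3; its replacement is the nltk.lm module. Below is a minimal sketch of a unigram perplexity computation with nltk.lm (assuming NLTK >= 3.4; the smoothing choice and the test words are illustrative, not from the question):

from nltk.corpus import brown
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline

n = 1  # a unigram model, as in the question's title
train, vocab = padded_everygram_pipeline(n, brown.sents(categories='news'))

lm = Laplace(n)  # add-one smoothing, so unseen words keep nonzero probability
lm.fit(train, vocab)

# perplexity() expects an iterable of n-gram tuples; unigrams are 1-tuples
test = [(w,) for w in ['the', 'jury', 'said']]
print(lm.perplexity(test))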

How to compute skipgrams in python?

̄綄美尐妖づ submitted on 2019-11-28 06:24:51
A k-skipgram is a generalization of an n-gram that allows up to k tokens to be skipped between the words, so the set of k-skipgrams is a superset of the plain n-grams together with every (k-i)-skipgram down to (k-i) == 0 (the 0-skip case, i.e., the ordinary n-grams). So how can these skipgrams be computed efficiently in Python? Below is the code I tried, but it does not do what I expect:

input_list = ['all', 'this', 'happened', 'more', 'or', 'less']

def find_skipgrams(input_list, N, K):
    bigram_list = []
    nlist = []
    K = 1
    for k in range(K + 1):
        for i in range(len(input_list) - 1):
            if i + k + 1 < len(input_list):
                nlist = []
                for j in range(N + 1):
                    if i + k + j + 1 < len(input_list):
                        nlist.append(input_list[i + k + j + 1])
            bigram_list.append(nlist)
    return bigram_list
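For reference, NLTK ships a ready-made implementation of exactly this; a minimal sketch using nltk.util.skipgrams (available in NLTK 3, with signature skipgrams(sequence, n, k); the sentence is the question's own example):

from nltk.util import skipgrams

sent = ['all', 'this', 'happened', 'more', 'or', 'less']

# 2-skip-bi-grams: pairs of words (n=2) with at most k=2 tokens skipped
print(list(skipgrams(sent, 2, 2)))

The output starts with ('all', 'this'), ('all', 'happened'), ('all', 'more'), and, because 0 skips are allowed, it includes every ordinary bigram as well.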

Building an OpenEars-compatible language model

假装没事ソ submitted on 2019-11-27 18:24:23
I am doing some development on speech to text and text to speech, and I found the OpenEars API very useful. This CMU-SLM-based API works by using a language model to map the speech picked up by the iPhone device to text. So I decided to find a big English language model to feed the API's speech recognizer engine. But I failed to understand the format of the VoxForge English data model for use with OpenEars. Does anyone have any idea how I can get the .languagemodel and .dic files for English to work with OpenEars?

Halle: Old question, but maybe the answer is still interesting.
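For context on what OpenEars expects: the .languagemodel file is a standard ARPA-format n-gram model, and the .dic file is a CMU-style pronunciation dictionary with one word per line followed by its phones. As a rough illustration of the two layouts (this is not OpenEars' own validation, and the file names are placeholders), here is a small Python sketch that sanity-checks a candidate pair of files:

def looks_like_arpa(path):
    # an ARPA model begins with a \data\ header listing the n-gram counts
    with open(path, encoding='utf-8', errors='replace') as f:
        for line in f:
            if line.strip():
                return line.strip() == '\\data\\'
    return False

def looks_like_cmu_dict(path):
    # each entry is a word followed by its phones, e.g. "HELLO HH AH L OW"
    with open(path, encoding='utf-8', errors='replace') as f:
        for line in f:
            if line.strip() and len(line.split()) < 2:
                return False
    return True

print(looks_like_arpa('english.languagemodel'))
print(looks_like_cmu_dict('english.dic'))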
