phonetics

Phonetic search for Indian languages

丶灬走出姿态 submitted on 2020-12-02 03:18:34
Question: I want to compare strings phonetically in my Android app. The special case here is that I want to compare Indian-language words written in English script. For example, I want to check whether "Edhu", "Adhu" and "Yethu" are phonetically equal; they all mean the same thing in Tamil. People who write Indian languages in English script spell the same word in different ways. How do I compare words in this case? I tried Levenshtein distance, but I am not sure how to convert the number it returns to the
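
One common way to turn a raw Levenshtein distance into a comparable score is to divide it by the length of the longer string. The sketch below shows that idea in Python rather than Android Java. The function names `levenshtein` and `similarity` are illustrative, and any cut-off applied to the score (for example 0.6) is an assumption that would need tuning on real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to a 0..1 score; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for word in ("Adhu", "Yethu"):
        print("Edhu", word, similarity("Edhu", word))
```

Plain edit distance treats every letter change alike, so it does not know that "d" and "t" sound closer than "d" and "m". Normalizing the spellings first (for example mapping "th" and "dh" to one symbol) before calling `similarity` is one possible refinement.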

lucene.net phonetic filter

蓝咒 submitted on 2020-01-15 04:32:26
Question: I am trying to store text data in Lucene, and the search should be phonetic. Where should I add a phonetic filter? Lucene.Net.Store.Directory dir = FSDirectory.Open(new DirectoryInfo(Application.StartupPath + "\\Index")); IndexReader indexReader = IndexReader.Open(dir, true); Searcher indexSearch = new IndexSearcher(indexReader); //IndexReader indexReader = IndexReader.Open(dir, true); //Searcher indexSearch = new IndexSearcher(indexReader); Analyzer analyzer = new Lucene.Net.Analysis.De
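
A phonetic filter generally works by replacing each token with a sound-based code, applied the same way at index time and at query time so that similar-sounding words collide. As a rough illustration of what such a code looks like, here is a minimal American Soundex sketch in Python. It is not Lucene.Net's implementation, and the name `soundex` is only illustrative.

```python
# Letter groups that share a Soundex digit; vowels, y, h and w have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code, or '' for empty input."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))
```

Both the stored text and the user's query would be passed through the same encoder, which is why such a filter normally belongs in the analyzer used for indexing and for query parsing alike.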

Threshold frequency is not working in spell check in Solr

半世苍凉 submitted on 2019-12-24 04:27:07
Question: I am stuck in the middle of a Solr setup. I need only the most popular words with respect to the query. I have used a phonetic filter on both index and query, but the problem is that it returns too many terms. I need only a few terms that are very specific to the query. schema.xml: <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <filter class="solr.TrimFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <tokenizer class="solr

Where can I obtain an English dictionary with structured data? [closed]

本秂侑毒 submitted on 2019-12-20 08:26:09
Question: Closed as off-topic 4 years ago. I would like to download an English dictionary (not just a word list) in a structured format such as TXT, XML, or SQL. Specifically, I need phonetic pronunciation and parts of speech (definitions are not required). Surprisingly, I can't find this online anywhere. Wiktionary is available for download, but it is

Getting most likely documents of the query using phonetic filter in solr

妖精的绣舞 submitted on 2019-12-18 07:15:20
Question: I am using Solr for spell checking / query correction. I have added solr.PhoneticFilterFactory and solr.NGramFilterFactory to the fieldType to perform spell checking. It works, but the problem is that I get a large number of documents for the query. I need only the most likely words/documents, in other words the words/documents nearest to the query. Snippet of schema.xml: <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100"> <analyzer type=

Synthesizing vowel from existing audio sample in MATLAB

老子叫甜甜 submitted on 2019-12-11 22:00:32
Question: I'm using MATLAB and have a recorded sample of a vowel sound. I want to use my existing sample to synthesize a vowel sound at a pitch of 150 Hz, lasting 5 seconds. I originally thought I would just have to sample my existing vowel sound at the given frequency but, obviously, that doesn't work. So now I'm stumped on how one would actually go about synthesizing the vowel sound. Answer 1: A possible approach is: Take a single period of the sample (identified
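
A minimal sketch of the "single period" idea, written in Python rather than MATLAB. It assumes the extracted period is then stretched to the target pitch and repeated for the requested duration; that continuation is an assumption, not something taken from the answer. `resample_period` and `synthesize` are hypothetical helper names, and `period` is assumed to be a plain list of samples covering exactly one pitch cycle.

```python
import math


def resample_period(period, new_len):
    """Linearly interpolate one pitch period to new_len samples (treated as cyclic)."""
    n = len(period)
    out = []
    for i in range(new_len):
        pos = i * n / new_len          # fractional read position in the source
        j = int(pos)
        frac = pos - j
        a = period[j]
        b = period[(j + 1) % n]        # wrap around at the end of the cycle
        out.append(a + (b - a) * frac)
    return out


def synthesize(period, fs, pitch_hz, seconds):
    """Repeat a resampled period so the result has roughly the requested pitch."""
    period_len = round(fs / pitch_hz)  # samples per cycle, rounded to an integer
    one_cycle = resample_period(period, period_len)
    total = int(fs * seconds)
    repeats = total // period_len + 1
    return (one_cycle * repeats)[:total]


if __name__ == "__main__":
    fs = 8000
    # Stand-in for a measured period: 80 samples of a fundamental plus one harmonic.
    source = [math.sin(2 * math.pi * k / 80) + 0.5 * math.sin(4 * math.pi * k / 80)
              for k in range(80)]
    signal = synthesize(source, fs, 150, 5)
    print(len(signal))
```

Because the cycle length is rounded to a whole number of samples, the actual pitch is fs / period_len rather than exactly 150 Hz. Stretching the period also shifts the formants along with the pitch, so this is only a reasonable approximation when the recorded pitch is already close to the target.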

MS SAPI sdk equivalent on OSX

偶尔善良 submitted on 2019-12-11 04:25:18
Question: I'm looking for an SDK that would give me speech recognition in an OS X application. I already have working code for Windows that uses SAPI to get speech recognition info from an audio file, and I would like to know how to do this on OS X, since something like SAPI is not available there. Thanks! Answer 1: The OS X equivalent is the Speech Recognition service: http://developer.apple.com/library/mac/#documentation/cocoa/conceptual/speech/Articles/RecognizeSpeech.html#//apple_ref/doc/uid/20002081

How to handle Combining Diacritical Marks with UnicodeUtils?

笑着哭i submitted on 2019-12-08 07:44:56
Question: I am trying to insert spaces into a string of IPA characters, e.g. to turn ɔ̃wɔ̃tɨ into ɔ̃ w ɔ̃ t ɨ. Using split/join was my first thought: s = ɔ̃w̃ɔtɨ s.split('').join(' ') #=> ̃ ɔ w ̃ ɔ p t ɨ As I discovered by examining the results, letters with diacritics are in fact encoded as two characters. After some research I found the UnicodeUtils module and used its each_grapheme method: UnicodeUtils.each_grapheme(s) {|g| g + ' '} #=> ɔ ̃w ̃ɔ p t ɨ This worked fine, except for the inverted breve
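
The underlying idea, keeping each base letter together with the combining marks that follow it, can be sketched with Python's standard `unicodedata` module. This is a simplified approximation and not the full Unicode grapheme-cluster algorithm that UnicodeUtils implements; `split_clusters` and `space_out` are illustrative names.

```python
import unicodedata


def split_clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # combining() returns a non-zero class for combining marks such as U+0303
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def space_out(text: str) -> str:
    """Insert one space between clusters instead of between code points."""
    return " ".join(split_clusters(text))


if __name__ == "__main__":
    word = "\u0254\u0303w\u0254\u0303t\u0268"  # the IPA example, written with escapes
    print(space_out(word))
```

Double diacritics such as U+0361 (COMBINING DOUBLE INVERTED BREVE) are an edge case for this sketch: the mark is attached to the letter before it, but the letter after it starts a new cluster, so the pair it visually joins would still be separated by a space.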