speech-recognition

C# SAPI 5.4 Languages?

扶醉桌前 submitted on 2019-12-04 20:21:12
I've made a simple program that recognizes speech using SAPI 5.4. I wanted to ask if I can add more languages to the TTS and the ASR. Thanks. Here is the code I made, in case anybody needs to take a look at it:

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using SpeechLib;
using System.Globalization;
using System.Speech.Recognition;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        // Speech recognition object
        SpSharedRecoContext …
```

Simple speech recognition from scratch

本秂侑毒 submitted on 2019-12-04 19:37:45
The question most similar to mine is this one (simple speech recognition methods), but since three years have passed and the answers are not sufficient, I will ask again. I want to build, from scratch, a simple speech recognition system; I only need to recognize five words. As far as I know, the audio features most used for this application are MFCCs, with HMMs for classification. I am able to extract the MFCCs from audio, but I still have some doubts about how to use the features to train a model with an HMM and then perform classification. As I understand it, I have to perform vector …
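A minimal sketch of the classification step, assuming the classic discrete-HMM recipe: the MFCC frames are first vector-quantized into a small codebook of symbols, one HMM is trained per word (the Baum-Welch training itself is omitted here), and an unknown utterance is assigned to the word whose model gives the highest forward-algorithm likelihood. All class and variable names are hypothetical.

```java
import java.util.Map;

public class HmmWordClassifier {

    /** One trained word model, stored in log space. */
    static final class Hmm {
        final double[] logPi;   // initial state log-probabilities [N]
        final double[][] logA;  // transition log-probabilities    [N][N]
        final double[][] logB;  // emission log-probabilities      [N][codebookSize]
        Hmm(double[] pi, double[][] a, double[][] b) { logPi = pi; logA = a; logB = b; }
    }

    /** Numerically stable log(exp(x) + exp(y)). */
    static double logAdd(double x, double y) {
        if (x == Double.NEGATIVE_INFINITY) return y;
        if (y == Double.NEGATIVE_INFINITY) return x;
        double m = Math.max(x, y);
        return m + Math.log(Math.exp(x - m) + Math.exp(y - m));
    }

    /** Forward algorithm: log P(obs | model) for a sequence of codebook symbols. */
    static double logLikelihood(Hmm m, int[] obs) {
        int n = m.logPi.length;
        double[] alpha = new double[n];
        for (int s = 0; s < n; s++) alpha[s] = m.logPi[s] + m.logB[s][obs[0]];
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int s = 0; s < n; s++) {
                double acc = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < n; p++) acc = logAdd(acc, alpha[p] + m.logA[p][s]);
                next[s] = acc + m.logB[s][obs[t]];  // sum over predecessors, then emit
            }
            alpha = next;
        }
        double total = Double.NEGATIVE_INFINITY;
        for (double a : alpha) total = logAdd(total, a);
        return total;
    }

    /** Classify an utterance as the word whose HMM scores it highest. */
    static String classify(Map<String, Hmm> wordModels, int[] obs) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Hmm> e : wordModels.entrySet()) {
            double score = logLikelihood(e.getValue(), obs);
            if (score > bestScore) { bestScore = score; best = e.getKey(); }
        }
        return best;
    }
}
```

With only five words, a handful of training utterances per word and a codebook of 32-64 symbols is often enough for a first working prototype.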

RecognitionListener in JellyBean Freezes if not spoken to immediately

旧街凉风 submitted on 2019-12-04 19:33:33
A speech-recognition based app I am working on works well on all versions of Android starting from API 8 (Android 2.2). But on a Nexus S 4G (Android 4.1.1), RecognitionListener simply halts for about one minute, then issues an ERROR_SERVER via its onError() callback. If spoken to within 1-2 seconds (of the onReadyForSpeech beep), it behaves properly, as expected. What changed in Jelly Bean that could explain this behavior? More importantly, is there a way to make it behave like in the …
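One workaround sketch (an assumption, not a confirmed fix for whatever changed in Jelly Bean): run a watchdog that cancels and restarts the recognizer if speech has not begun within a few seconds, so the one-minute stall never gets a chance to happen. The timeout value and class names are illustrative.

```java
import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.os.Handler;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;

public class ListenActivity extends Activity implements RecognitionListener {
    private static final long WATCHDOG_MS = 5000;  // assumed patience window
    private SpeechRecognizer recognizer;
    private final Handler handler = new Handler();
    private final Runnable watchdog = new Runnable() {
        @Override public void run() {
            recognizer.cancel();   // abort the stalled session
            startListening();      // and open a fresh one
        }
    };

    @Override protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        recognizer = SpeechRecognizer.createSpeechRecognizer(this);
        recognizer.setRecognitionListener(this);
        startListening();
    }

    private void startListening() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        recognizer.startListening(intent);
        handler.postDelayed(watchdog, WATCHDOG_MS);
    }

    @Override public void onBeginningOfSpeech() {
        handler.removeCallbacks(watchdog);  // speech arrived in time; stand down
    }

    @Override public void onError(int error) {
        handler.removeCallbacks(watchdog);
        if (error == SpeechRecognizer.ERROR_SERVER) startListening();  // retry
    }

    // The remaining callbacks are not needed for this workaround.
    @Override public void onReadyForSpeech(Bundle params) {}
    @Override public void onRmsChanged(float rmsdB) {}
    @Override public void onBufferReceived(byte[] buffer) {}
    @Override public void onEndOfSpeech() {}
    @Override public void onResults(Bundle results) {}
    @Override public void onPartialResults(Bundle partialResults) {}
    @Override public void onEvent(int eventType, Bundle params) {}
}
```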

Does RecognitionListener.onError() automatically SpeechRecognizer.cancel()?

房东的猫 submitted on 2019-12-04 17:48:03
For various reasons, I need to use the raw SpeechRecognizer API instead of the easier RecognizerIntent (RECOGNIZE_SPEECH) activity. That means, among other things, that I need to handle RecognitionListener.onError() myself. In response to some of the errors, I simply want to restart listening. This looks straightforward, but when I just call SpeechRecognizer.startListening() upon error, this sometimes seems to trigger two different errors: ERROR/ServerConnectorImpl(619): Previous session not …
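Whether onError() implicitly cancels the session is not documented, so a defensive sketch is to call cancel() explicitly and give the service a moment to tear the previous session down before restarting. The choice of retryable errors and the delay are assumptions; `recognizer` and `recognizerIntent` are assumed fields of the enclosing class.

```java
// Inside your RecognitionListener implementation:
@Override public void onError(int error) {
    switch (error) {
        case SpeechRecognizer.ERROR_NETWORK:
        case SpeechRecognizer.ERROR_SERVER:
        case SpeechRecognizer.ERROR_SPEECH_TIMEOUT:
            // Do not rely on onError() having cancelled anything:
            // tear the session down explicitly before retrying.
            recognizer.cancel();
            new Handler().postDelayed(new Runnable() {
                @Override public void run() {
                    recognizer.startListening(recognizerIntent);
                }
            }, 300);  // small assumed delay so the service can clean up
            break;
        default:
            // Treat the other codes (e.g. ERROR_CLIENT) as fatal here.
            break;
    }
}
```

If the "Previous session not..." error persists, destroying the recognizer and recreating it with SpeechRecognizer.createSpeechRecognizer() before each retry is a heavier but commonly reported fallback.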

How to split a speech to word

这一生的挚爱 submitted on 2019-12-04 16:56:06
I'm playing with speech recognition. Is it possible to split speech into multiple words? If it is possible, please recommend me a library that supports splitting speech into words. Thanks.

If you know what the speaker has said, you can perform forced alignment to generate the word (or phoneme) time alignments. Toolkits such as CMU Sphinx, HTK and Kaldi can perform this. If you don't know what the speaker has said, you can just perform standard speech recognition and use the time information to obtain the word boundaries, although there may be errors in the recognition output.

akademi4eg: Having no prior information …
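A sketch of the second approach with CMU Sphinx4 and its default US English models: recognize a WAV file and print each word with its time frame, which gives the word boundaries directly. The file name is an assumption, and the audio is assumed to be 16 kHz mono PCM.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import edu.cmu.sphinx.result.WordResult;

public class WordBoundaries {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Default models bundled with the sphinx4-data artifact.
        config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        config.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        config.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(config);
        InputStream stream = new FileInputStream("utterance.wav");
        recognizer.startRecognition(stream);

        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            for (WordResult word : result.getWords()) {
                // Each WordResult carries the word plus its start/end times.
                System.out.println(word);
            }
        }
        recognizer.stopRecognition();
    }
}
```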

Speech to text for single word

假如想象 submitted on 2019-12-04 16:51:47
I want to create an automatic speech recognition system that will identify a correct word from a list of words in the database. I have seen that CMUSphinx can be used for this problem. I have tried the hello-world Sphinx demo app, but it does not give the expected results. I don't know how to choose the correct acoustic model, dictionary file and language model. For a single word, is a language model necessary? Is there any prebuilt acoustic model for Indian English?

"I have tried the hello world sphinx demo app, but it gives not expected results." You need to provide more details on what you have tried. …
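For a fixed word list, a statistical language model is not necessary: CMUSphinx can constrain recognition with a JSGF grammar instead. A sketch with Sphinx4, where the word list, grammar location and names are all assumptions. First the grammar file, e.g. `resources/grammars/words.gram`:

```
#JSGF V1.0;
grammar words;
public <word> = hello | forward | backward | left | right;
```

```java
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class SingleWordDemo {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        config.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        // Use the grammar instead of a statistical language model.
        config.setGrammarPath("resource:/grammars");
        config.setGrammarName("words");
        config.setUseGrammar(true);

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(config);
        recognizer.startRecognition(true);  // true = discard previously buffered audio
        SpeechResult result = recognizer.getResult();
        System.out.println("Heard: " + result.getHypothesis());
        recognizer.stopRecognition();
    }
}
```

Every word in the grammar must also have a pronunciation in the dictionary, otherwise recognition will fail regardless of the grammar.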

Emotion detection in speech

浪子不回头ぞ submitted on 2019-12-04 16:47:29
I would like to build an app which analyses the emotional content of speech from the mic. This does not involve speech recognition (although it is sometimes used as an extra feature). Emotional analysis is based on prosodic features of the voice (pitch change, speed, tone, etc.). I know this can be done on a desktop computer, but I don't want users to have to upload their recordings (phone conversations) to a server in order to get emotional feedback. What I need is an API which either provides the whole analysis, or an API which I can use to extract those features (i.e. the average speed of the …
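If no ready-made API fits, the basic prosodic features can be computed on the device from raw PCM. A minimal sketch (not an emotion classifier, just feature extraction): RMS energy as a loudness measure and a naive autocorrelation pitch estimate per frame. The sample rate and pitch search range are illustrative assumptions.

```java
public class ProsodyFeatures {
    static final int SAMPLE_RATE = 16000;  // assumed; match your recorder

    /** Root-mean-square energy of one frame of 16-bit samples. */
    static double rmsEnergy(short[] frame) {
        double sum = 0;
        for (short s : frame) sum += (double) s * s;
        return Math.sqrt(sum / frame.length);
    }

    /** Fundamental-frequency estimate: the lag that maximizes autocorrelation. */
    static double pitchHz(short[] frame) {
        int minLag = SAMPLE_RATE / 400;  // search 75..400 Hz, a typical voice range
        int maxLag = SAMPLE_RATE / 75;
        int bestLag = minLag;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double corr = 0;
            for (int i = 0; i + lag < frame.length; i++) {
                corr += (double) frame[i] * frame[i + lag];
            }
            if (corr > best) { best = corr; bestLag = lag; }
        }
        return (double) SAMPLE_RATE / bestLag;
    }
}
```

On Android the frames would come from AudioRecord; tracking these two values over time yields the pitch-range and speaking-rate measures that emotion classifiers typically consume.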

Speech Recognition with SAPI: Custom Language Support through phonemes

自古美人都是妖i submitted on 2019-12-04 16:13:39
I have a text that I have transcribed from text to phonemes. I now want to modify or create a custom grammar XML which will define the pronunciation of the words with international phonemes, and have the recognizer use that specific pronunciation instead of anything else. I want to add speech recognition for certain words spoken in languages other than English/German etc. Would that be possible with SAPI, and how? Can anyone point me in the right direction (using SpInProcRecoContext.Recognizer and a custom grammar)? So I want to use the already existing recognition engine for e.a. …
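SAPI 5 XML grammars can attach an explicit pronunciation to a phrase through the PRON attribute, which is the usual hook for this. A hedged sketch: the phoneme string must use the phone set of the loaded engine's language, and the one below is an illustrative guess, not a verified transcription.

```xml
<!-- Sketch of a SAPI 5 XML grammar; LANGID 409 = US English engine. -->
<GRAMMAR LANGID="409">
  <RULE NAME="foreignWord" TOPLEVEL="ACTIVE">
    <!-- PRON overrides the lexicon pronunciation for this phrase. -->
    <P PRON="b ow n zh uw r">bonjour</P>
  </RULE>
</GRAMMAR>
```

Load the compiled grammar through the reco context's grammar object as usual; recognition then matches the custom phoneme sequence rather than whatever the engine would guess from the spelling.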

Partial results using speech recognition

天大地大妈咪最大 submitted on 2019-12-04 15:29:01
I created a simple application inspired by this example in order to test all the available options (i.e. extras). I read about the EXTRA_PARTIAL_RESULTS extra: if I enable this option, I should receive from the server any partial results related to a speech recognition. However, when I add this extra to the ACTION_RECOGNIZE_SPEECH intent, the voice recognition does not work anymore: the list does not display any results.

```java
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == VOICE_RECOGNITION_REQUEST_CODE) {
        switch (resultCode) {
            case RESULT_OK:
                Log.i…
```
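The activity route only ever returns one final result through onActivityResult(), so partial hypotheses cannot arrive that way; they are delivered through the SpeechRecognizer API's onPartialResults() callback instead. A sketch of that route, assumed to live inside a class that implements RecognitionListener and owns the `recognizer` field:

```java
// Start listening with partial results requested.
void startListeningWithPartials() {
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
    recognizer.startListening(intent);
}

// Partial hypotheses arrive here while the user is still speaking.
@Override public void onPartialResults(Bundle partialResults) {
    ArrayList<String> hypotheses =
            partialResults.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    if (hypotheses != null && !hypotheses.isEmpty()) {
        Log.i("Partial", hypotheses.get(0));  // best hypothesis so far
    }
}
```

Note also that the documentation warns the recognition service may ignore the partial-results request in some or all cases.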

3-state phone model in Hidden Markov Model (HMM)

99封情书 submitted on 2019-12-04 15:18:29
I want to ask about the meaning of a 3-state phone model in an HMM. This case is based on the theory of HMMs in speech recognition systems, so the example is based on the acoustic modeling of speech sounds with HMMs. I took this example picture from a journal paper: http://www.intechopen.com/source/html/41188/media/image8_w.jpg (Figure 1: 3-state HMM for the sound /s/). So, my questions are: What is meant by 3 states? What do S1, S2 and S3 actually mean? (I know they are states, but what do they represent?) How is the /s/ sound represented in this HMM state? Why 3? What happens if we have 4, 5 or more states? If …
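The conventional reading, sketched below: in a left-to-right (Bakis) phone HMM, S1 models the onset of /s/ (the transition into the sound), S2 its steady middle portion, and S3 its offset, with self-loops letting each part stretch over a variable number of frames. Three states is simply a good compromise; more states can capture finer temporal detail but need more training data. The transition probabilities here are illustrative, not trained values.

```java
import java.util.Arrays;

public class PhoneHmmTopology {
    public static void main(String[] args) {
        // Rows/columns: S1 (onset), S2 (steady part), S3 (offset).
        // A state may stay in itself (self-loop, absorbing duration)
        // or move one step to the right; moving backwards is forbidden,
        // which encodes that a phone unfolds in one direction in time.
        double[][] a = {
            {0.6, 0.4, 0.0},  // S1 -> S1 or S2
            {0.0, 0.7, 0.3},  // S2 -> S2 or S3
            {0.0, 0.0, 1.0},  // S3 -> S3 (exit leads to the next phone model)
        };
        for (double[] row : a) System.out.println(Arrays.toString(row));
    }
}
```

Each state also carries an emission distribution (e.g. a Gaussian mixture over MFCC vectors), which is what actually ties S1-S3 to the acoustics of /s/.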