speech

Microsoft Speech Recognition Custom Training

笑着哭i submitted on 2019-12-06 05:50:22
Question: I have been wanting to create an application using Microsoft Speech Recognition. My application's users are expected to often say abbreviated terms, such as 'LHC' for 'Large Hadron Collider', or 'CERN'. Given that exact order, my application will return: You said: At age C. You said: Cern. While it did work for 'CERN', it failed very badly for 'LHC'. However, if I could make my own custom training files, I could easily place the term 'LHC' somewhere in there. Then, I could make the user
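Short of acoustic training, SAPI also accepts custom SRGS grammars, which can bias recognition toward exact terms like 'LHC'. A minimal sketch of such a grammar file (the rule name is illustrative):

```xml
<grammar version="1.0" xml:lang="en-US" root="terms"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Constrain recognition to a small set of domain terms -->
  <rule id="terms">
    <one-of>
      <item>LHC</item>
      <item>Large Hadron Collider</item>
      <item>CERN</item>
    </one-of>
  </rule>
</grammar>
```

Loading this grammar restricts what the recognizer will match, which tends to help with short letter-sequence utterances that free dictation mangles.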

Speech training files and registry locations

不打扰是莪最后的温柔 submitted on 2019-12-06 02:34:22
Question: I have a speech project that requires acoustic training to be done in code. I am successfully able to create training files with transcripts and their associated registry entries under Windows 7 using SAPI. However, I am unable to determine whether the recognition engine is successfully using these files and adapting its model. My questions are as follows: When performing training through the Control Panel training UI, the system stores the training files in "{AppData}\Local\Microsoft\Speech\Files

How to convert speech to text during call with different text colors for caller and call receiver?

笑着哭i submitted on 2019-12-06 01:51:42
Question: I want to convert speech to text during a call. I also want the text displayed in different colors: the call initiator's in red and the call receiver's in green. During my tests, I converted speech to text during a call but was unable to distinguish between the voice of the call initiator and that of the call receiver. Thanks in advance; please help me out. Source: https://stackoverflow.com/questions/20964359/how-to-convert-speech-to-text-during-call-with-different-text-colors-for-caller
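If each party's audio arrives on a separate channel (as some telephony stacks provide), distinguishing the speakers reduces to splitting the channels before transcription. A minimal NumPy sketch; the channel layout (initiator on channel 0) is an assumption:

```python
import numpy as np

def split_call_channels(stereo: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split interleaved stereo samples of shape (N, 2) into two mono tracks.

    Assumes channel 0 carries the call initiator and channel 1 the receiver;
    each track can then be transcribed separately and rendered in its own color.
    """
    caller = stereo[:, 0]
    receiver = stereo[:, 1]
    return caller, receiver

# Illustrative usage with synthetic samples: caller all ones, receiver all zeros
audio = np.column_stack([np.ones(8), np.zeros(8)])
caller, receiver = split_call_channels(audio)
```

When the call is delivered as a single mixed mono stream, this shortcut is unavailable and full speaker diarization is required instead.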

Using c++ to call and use Windows Speech Recognition [closed]

假装没事ソ submitted on 2019-12-05 22:05:31
I am making an application that involves the use of Windows Speech Recognition. I am thinking of using C++ to do this since I have some experience with the language. I want the speech recognition to work internally: if I load an audio file into my program, I want speech recognition to transcribe that audio to a text file, all done internally. Please provide some help with this, and if I have not explained my question properly, let me know and I will try to explain again. Thanks in advance, Divs

Michael Levy: Windows provides speech recognition

Can C# SAPI speak SSML string?

ぃ、小莉子 submitted on 2019-12-05 18:49:54
I implemented a TTS in my C# WPF project. Previously, I used the TTS in the System.Speech.Synthesis namespace to speak. The speaking content is in SSML format (Speech Synthesis Markup Language, which supports customizing the speaking rate, voice, and emphasis), like the following: <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><prosody rate="x-fast">hello world. This is a long sentence speaking very fast!</prosody></speak> But unfortunately the System.Speech.Synthesis TTS has a memory leak problem, as I mentioned in the question Memory leak in .Net Speech.Synthesizer?. So I decide

Microsoft Speech Recognition: Alternate results with confidence score?

落花浮王杯 submitted on 2019-12-05 14:45:38
I'm new to working with the Microsoft.Speech recognizer (using Microsoft Speech Platform SDK Version 11) and I'm trying to have it output the n-best recognition matches from a simple grammar, along with the confidence score for each. According to the documentation (and as mentioned in the answer to this question), one should be able to use e.Result.Alternates to access the recognized words other than the top-scoring one. However, even after resetting the confidence rejection threshold to 0 (which should mean nothing is rejected), I still only get one result and no alternates (although the

How to split male and female voices from an audio file(in c++ or java)

元气小坏坏 submitted on 2019-12-05 12:19:55
I want to differentiate between the male and female voices in an audio file and separate them. As output, I want the two voices separated. Can you please help me out, and can the coding be done in Java or C++? This is potentially a very complicated question, similar to writing your own speech recognition (or identification) algorithm. You would start by converting the audio into the frequency domain, which is done using a Fast Fourier Transform. For each slice in time that you take an FFT of, this will give you a list of frequencies and their amplitudes. You will somehow need to detect the
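As a rough starting point, fundamental frequency alone separates the two registers in many cases: adult male speech typically centers around 85–180 Hz and female around 165–255 Hz. A sketch of per-frame dominant-frequency classification with NumPy's FFT; the 165 Hz threshold is a simplifying assumption, not a robust classifier:

```python
import numpy as np

def dominant_frequency(frame: np.ndarray, fs: int) -> float:
    """Return the frequency (Hz) with the largest magnitude in one frame."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])

def label_frame(frame: np.ndarray, fs: int, threshold_hz: float = 165.0) -> str:
    """Crude male/female label from the frame's dominant frequency."""
    return "female" if dominant_frequency(frame, fs) >= threshold_hz else "male"

# Synthetic check: a 120 Hz tone vs. a 220 Hz tone, one second each at 8 kHz
fs = 8000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 120 * t)
high = np.sin(2 * np.pi * 220 * t)
```

Real speech has overtones that can dominate the raw spectrum, so a production system would use a proper pitch tracker (autocorrelation, cepstrum) rather than the argmax of one FFT; the structure, though, is the same frame-by-frame loop.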

Matlab: Finding dominant frequencies in a frame of audio data

痞子三分冷 submitted on 2019-12-05 05:21:43
Question: I am pretty new to Matlab and I am trying to write a simple frequency-based speech detection algorithm. The end goal is to run the script on a wav file and have it output start/end times for each speech segment. If I use the code: fr = 128; [ audio, fs, nbits ] = wavread(audioPath); spectrogram(audio,fr,120,fr,fs,'yaxis') I get a useful frequency intensity vs. time graph like this: By looking at it, it is very easy to see when speech occurs. I could write an algorithm to automate the detection
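The automation step the question describes, thresholding per-frame energy and collapsing runs of active frames into start/end times, translates directly to any FFT-capable environment. A sketch in Python; the frame length and threshold are illustrative values, not tuned numbers:

```python
import numpy as np

def active_frames(audio: np.ndarray, frame_len: int, threshold: float) -> list:
    """Mark each non-overlapping frame active if its energy exceeds threshold."""
    flags = []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        flags.append(float(np.sum(frame ** 2)) > threshold)
    return flags

def segments(flags: list, frame_len: int, fs: int) -> list:
    """Collapse consecutive active frames into (start_sec, end_sec) pairs."""
    out, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            out.append((start * frame_len / fs, i * frame_len / fs))
            start = None
    if start is not None:
        out.append((start * frame_len / fs, len(flags) * frame_len / fs))
    return out

# Synthetic signal: 0.3 s silence, 0.4 s tone, 0.3 s silence at 1 kHz sampling
fs, frame_len = 1000, 100
tone = np.sin(2 * np.pi * 100 * np.arange(400) / fs)
audio = np.concatenate([np.zeros(300), tone, np.zeros(300)])
speech = segments(active_frames(audio, frame_len, threshold=1.0), frame_len, fs)
```

In Matlab the same loop would consume the matrix that spectrogram returns, summing magnitude over the speech band per column instead of raw time-domain energy.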

SpeechRecognizer not Hearing After First Result

99封情书 submitted on 2019-12-05 04:31:17
Question: I am using SpeechRecognizer and RecognizerIntent in Android to implement speech recognition. My aim is to restart listening for speech after my speech recognizer displays the results on the screen. For that purpose, I am using the following code. The problem is that the first run works fine and displays the results, but when it starts listening for the second time (called from the onResults method), it does not hear what is being spoken for some reason. It then gives an ERROR_SPEECH_TIMEOUT error,

Emotion detection in speech

浪子不回头ぞ submitted on 2019-12-04 16:47:29
I would like to build an app which analyses the emotional content of speech from the mic. This does not involve speech recognition, although that is sometimes used as an extra feature. Emotional analysis is based on prosodic features of the voice (pitch change, speed, tone, etc.). I know this can be done on a desktop computer, but I don't want users to have to upload their recordings (phone conversations) to a server in order to get emotional feedback. What I need is an API which either provides the whole analysis, or an API which I can use to extract those features (i.e. the average speed of the
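Extracting basic prosodic features on-device avoids the server round-trip entirely. A sketch computing per-frame pitch proxy and energy with NumPy; the feature set is illustrative and is the input to, not a substitute for, an actual emotion model:

```python
import numpy as np

def prosodic_features(audio: np.ndarray, fs: int, frame_len: int) -> dict:
    """Per-frame dominant frequency (rough pitch proxy) and RMS energy,
    summarized as means and pitch variability (a common arousal cue)."""
    pitches, energies = [], []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        pitches.append(float(freqs[np.argmax(spectrum)]))
        energies.append(float(np.sqrt(np.mean(frame ** 2))))
    return {
        "mean_pitch": float(np.mean(pitches)),
        "pitch_variability": float(np.std(pitches)),
        "mean_energy": float(np.mean(energies)),
    }

# Monotone stand-in: a steady 200 Hz tone should show zero pitch variability
fs, frame_len = 8000, 800
t = np.arange(fs) / fs
feats = prosodic_features(np.sin(2 * np.pi * 200 * t), fs, frame_len)
```

Since the loop touches each frame once with a small FFT, it is cheap enough to run on a phone, which is exactly the constraint the question raises.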