speech

Microsoft Speech Recognition Custom Training

笑着哭i submitted on 2019-12-06 05:50:22
Question: I have been wanting to create an application using Microsoft Speech Recognition. My application's users are expected to often say abbreviated terms, such as 'LHC' for 'Large Hadron Collider', or 'CERN'. Given that exact order, my application will return: You said: At age C. You said: Cern. While it did work for 'CERN', it failed very badly for 'LHC'. However, if I could make my own custom training files, I could easily place the term 'LHC' somewhere in there. Then, I could make the user
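Short of acoustic training, SAPI also accepts custom SRGS grammars, which can bias recognition toward exact terms like 'LHC'. A minimal sketch of such a grammar file (the rule name is illustrative):

```xml
<grammar version="1.0" xml:lang="en-US" root="terms"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- Constrain recognition to a small set of domain terms -->
  <rule id="terms">
    <one-of>
      <item>LHC</item>
      <item>Large Hadron Collider</item>
      <item>CERN</item>
    </one-of>
  </rule>
</grammar>
```

Loading this grammar restricts what the recognizer will match, which tends to help with short letter-sequence utterances that free dictation mangles.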

Speech training files and registry locations

不打扰是莪最后的温柔 submitted on 2019-12-06 02:34:22
Question: I have a speech project that requires acoustic training to be done in code. I am successfully able to create training files with transcripts and their associated registry entries under Windows 7 using SAPI. However, I am unable to determine whether the recognition engine is successfully using these files and adapting its model. My questions are as follows: When performing training through the Control Panel training UI, the system stores the training files in "{AppData}\Local\Microsoft\Speech\Files

How to convert speech to text during call with different text colors for caller and call receiver?

笑着哭i submitted on 2019-12-06 01:51:42
Question: I want to convert speech to text during a call. I also want the text displayed in different colors: the call initiator's in red and the call receiver's in green. During my tests, I converted speech to text during a call but was unable to distinguish between the voice of the call initiator and that of the call receiver. Thanks in advance; please help me out. Source: https://stackoverflow.com/questions/20964359/how-to-convert-speech-to-text-during-call-with-different-text-colors-for-caller
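If each party's audio arrives on a separate channel (as some telephony stacks provide), distinguishing the speakers reduces to splitting the channels before transcription. A minimal NumPy sketch; the channel layout (initiator on channel 0) is an assumption:

```python
import numpy as np

def split_call_channels(stereo: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split interleaved stereo samples of shape (N, 2) into two mono tracks.

    Assumes channel 0 carries the call initiator and channel 1 the receiver;
    each track can then be transcribed separately and rendered in its own color.
    """
    caller = stereo[:, 0]
    receiver = stereo[:, 1]
    return caller, receiver

# Illustrative usage with synthetic samples: caller all ones, receiver all zeros
audio = np.column_stack([np.ones(8), np.zeros(8)])
caller, receiver = split_call_channels(audio)
```

When the call is delivered as a single mixed mono stream, this shortcut is unavailable and full speaker diarization is required instead.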

Using c++ to call and use Windows Speech Recognition [closed]

假装没事ソ submitted on 2019-12-05 22:05:31
I am making an application that involves the use of Windows Speech Recognition. I am thinking of using C++ to do this since I have some experience with the language. I want the speech recognition to work internally: if I load an audio file into my program, I want speech recognition to transcribe that audio to a text file, all done internally. Please provide some help with this, and if I have not explained my question properly, let me know and I will try to explain again. Thanks in advance, Divs

Michael Levy: Windows provides speech recognition

Can C# SAPI speak SSML string?

ぃ、小莉子 submitted on 2019-12-05 18:49:54
I implemented a TTS in my C# WPF project. Previously, I used the TTS in the System.Speech.Synthesis namespace to speak. The speaking content is in SSML format (Speech Synthesis Markup Language, which supports customizing the speaking rate, voice, and emphasis), like the following: <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><prosody rate="x-fast">hello world. This is a long sentence speaking very fast!</prosody></speak> But unfortunately the System.Speech.Synthesis TTS has a memory leak problem, as I mentioned in the question Memory leak in .Net Speech.Synthesizer?. So I decide

Microsoft Speech Recognition: Alternate results with confidence score?

落花浮王杯 submitted on 2019-12-05 14:45:38
I'm new to working with the Microsoft.Speech recognizer (using Microsoft Speech Platform SDK Version 11) and I'm trying to have it output the n-best recognition matches from a simple grammar, along with the confidence score for each. According to the documentation (and as mentioned in the answer to this question), one should be able to use e.Result.Alternates to access the recognized words other than the top-scoring one. However, even after resetting the confidence rejection threshold to 0 (which should mean nothing is rejected), I still only get one result and no alternates (although the

How to split male and female voices from an audio file(in c++ or java)

元气小坏坏 submitted on 2019-12-05 12:19:55
I want to differentiate between the male and female voices in an audio file and separate them. As output, I want the two voices separated. Can you please help me out, and can the coding be done in Java or C++? This is potentially a very complicated question, similar to writing your own speech recognition (or identification) algorithm. You would start by converting the audio into the frequency domain, which is done using a Fast Fourier Transform. For each slice in time that you take an FFT of, this will give you a list of frequencies and their amplitudes. You will somehow need to detect the
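As a rough starting point, fundamental frequency alone separates the two registers in many cases: adult male speech typically centers around 85–180 Hz and female around 165–255 Hz. A sketch of per-frame dominant-frequency classification with NumPy's FFT; the 165 Hz threshold is a simplifying assumption, not a robust classifier:

```python
import numpy as np

def dominant_frequency(frame: np.ndarray, fs: int) -> float:
    """Return the frequency (Hz) with the largest magnitude in one frame."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])

def label_frame(frame: np.ndarray, fs: int, threshold_hz: float = 165.0) -> str:
    """Crude male/female label from the frame's dominant frequency."""
    return "female" if dominant_frequency(frame, fs) >= threshold_hz else "male"

# Synthetic check: a 120 Hz tone vs. a 220 Hz tone, one second each at 8 kHz
fs = 8000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 120 * t)
high = np.sin(2 * np.pi * 220 * t)
```

Real speech has overtones that can dominate the raw spectrum, so a production system would use a proper pitch tracker (autocorrelation, cepstrum) rather than the argmax of one FFT; the structure, though, is the same frame-by-frame loop.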

Matlab: Finding dominant frequencies in a frame of audio data

痞子三分冷 submitted on 2019-12-05 05:21:43
Question: I am pretty new to Matlab and I am trying to write a simple frequency-based speech detection algorithm. The end goal is to run the script on a wav file and have it output start/end times for each speech segment. If I use the code: fr = 128; [ audio, fs, nbits ] = wavread(audioPath); spectrogram(audio,fr,120,fr,fs,'yaxis') I get a useful frequency intensity vs. time graph like this: By looking at it, it is very easy to see when speech occurs. I could write an algorithm to automate the detection
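The automation step the question describes, thresholding per-frame energy and collapsing runs of active frames into start/end times, translates directly to any FFT-capable environment. A sketch in Python; the frame length and threshold are illustrative values, not tuned numbers:

```python
import numpy as np

def active_frames(audio: np.ndarray, frame_len: int, threshold: float) -> list:
    """Mark each non-overlapping frame active if its energy exceeds threshold."""
    flags = []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        flags.append(float(np.sum(frame ** 2)) > threshold)
    return flags

def segments(flags: list, frame_len: int, fs: int) -> list:
    """Collapse consecutive active frames into (start_sec, end_sec) pairs."""
    out, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            out.append((start * frame_len / fs, i * frame_len / fs))
            start = None
    if start is not None:
        out.append((start * frame_len / fs, len(flags) * frame_len / fs))
    return out

# Synthetic signal: 0.3 s silence, 0.4 s tone, 0.3 s silence at 1 kHz sampling
fs, frame_len = 1000, 100
tone = np.sin(2 * np.pi * 100 * np.arange(400) / fs)
audio = np.concatenate([np.zeros(300), tone, np.zeros(300)])
speech = segments(active_frames(audio, frame_len, threshold=1.0), frame_len, fs)
```

In Matlab the same loop would consume the matrix that spectrogram returns, summing magnitude over the speech band per column instead of raw time-domain energy.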

SpeechRecognizer not Hearing After First Result

99封情书 submitted on 2019-12-05 04:31:17
Question: I am using SpeechRecognizer and RecognizerIntent in Android to implement speech recognition. My aim is to restart listening for speech after my speech recognizer displays the results on the screen. For that purpose, I am using the following code. The problem is that the first run works fine and displays the results, but when it starts listening for the second time (called from the onResults method), it does not hear what is being spoken for some reason. It then gives an ERROR_SPEECH_TIMEOUT error,

Emotion detection in speech

浪子不回头ぞ submitted on 2019-12-04 16:47:29
I would like to build an app which analyses the emotional content of speech from the mic. This does not involve speech recognition, although that is sometimes used as an extra feature. Emotional analysis is based on prosodic features of the voice (pitch change, speed, tone, etc.). I know this can be done on a desktop computer, but I don't want users to have to upload their recordings (phone conversations) to a server in order to get emotional feedback. What I need is an API which either provides the whole analysis, or an API which I can use to extract those features (i.e. the average speed of the
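Extracting basic prosodic features on-device avoids the server round-trip entirely. A sketch computing per-frame pitch proxy and energy with NumPy; the feature set is illustrative and is the input to, not a substitute for, an actual emotion model:

```python
import numpy as np

def prosodic_features(audio: np.ndarray, fs: int, frame_len: int) -> dict:
    """Per-frame dominant frequency (rough pitch proxy) and RMS energy,
    summarized as means and pitch variability (a common arousal cue)."""
    pitches, energies = [], []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        pitches.append(float(freqs[np.argmax(spectrum)]))
        energies.append(float(np.sqrt(np.mean(frame ** 2))))
    return {
        "mean_pitch": float(np.mean(pitches)),
        "pitch_variability": float(np.std(pitches)),
        "mean_energy": float(np.mean(energies)),
    }

# Monotone stand-in: a steady 200 Hz tone should show zero pitch variability
fs, frame_len = 8000, 800
t = np.arange(fs) / fs
feats = prosodic_features(np.sin(2 * np.pi * 200 * t), fs, frame_len)
```

Since the loop touches each frame once with a small FFT, it is cheap enough to run on a phone, which is exactly the constraint the question raises.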