speech-recognition

SFTranscriptionSegment's timestamp is always 0

拈花ヽ惹草 submitted on 2019-12-22 08:16:51

Question: This is a question regarding the new iOS 10 Speech framework. I get the speech recognition result using the following method:

    recognitionTask = [speechRecgzr recognitionTaskWithRequest:recognitionRequest
        resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
    }];

But the timestamp of each SFTranscriptionSegment in the result is 0, and the confidence is always 0. What can be the problem here? Has Apple not implemented the API properly yet? Thank you.

Answer 1: After a few weeks I…

Convert audio files for CMU Sphinx 4 input

蹲街弑〆低调 submitted on 2019-12-22 05:33:52

Question: I have a big batch of files I'd like to run recognition on using CMU Sphinx 4. Sphinx requires the following format: 16 kHz, 16-bit, mono, little-endian. My files are something like 44.1 kHz, 32-bit stereo MP3 files. I tried using Tritonus, and then its updated version JavaZoom, to convert using code from bakuzen. However, AudioSystem.getAudioInputStream(File) throws an UnsupportedAudioFileException, and I haven't been able to figure out why, so I have moved on. Now I am trying ffmpeg. The…
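For reference, a single ffmpeg invocation can produce the format Sphinx wants (the file names here are placeholders); WAV output is little-endian PCM by default:

    ffmpeg -i input.mp3 -acodec pcm_s16le -ar 16000 -ac 1 output.wav

Here -ar 16000 resamples to 16 kHz, -ac 1 downmixes to mono, and pcm_s16le selects signed 16-bit little-endian samples.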

C# grammar and switch wildcard

帅比萌擦擦* submitted on 2019-12-22 01:32:11

Question: I would like to add that whenever it recognizes "search X", it searches for "X", but I don't know how to add that to the grammar, or how to handle it in my switch statement.

    private void Form1_Load(object sender, EventArgs e)
    {
        Choices commands = new Choices();
        commands.Add(new string[] { "hello", "start chrome", "search" });
        GrammarBuilder gBuilder = new GrammarBuilder();
        gBuilder.Append(commands);
        gBuilder.Culture = new System.Globalization.CultureInfo("en-GB");…
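A common way to get "search X" with System.Speech is to append free dictation after the keyword, then strip the keyword off the recognized text. A minimal sketch, assuming System.Speech.Recognition (the engine setup is illustrative, not taken from the question's form code):

    using System;
    using System.Globalization;
    using System.Speech.Recognition;

    class SearchGrammarDemo
    {
        static void Main()
        {
            var recognizer = new SpeechRecognitionEngine(new CultureInfo("en-GB"));

            // Fixed commands stay in a Choices list, as in the question.
            var commands = new Choices("hello", "start chrome");
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

            // "search <anything>": append free dictation after the keyword.
            var search = new GrammarBuilder("search");
            search.AppendDictation();
            search.Culture = new CultureInfo("en-GB");
            recognizer.LoadGrammar(new Grammar(search));

            recognizer.SpeechRecognized += (s, e) =>
            {
                string text = e.Result.Text;
                if (text.StartsWith("search ", StringComparison.OrdinalIgnoreCase))
                {
                    // Everything after the keyword is the query.
                    string query = text.Substring("search ".Length);
                    Console.WriteLine("Searching for: " + query);
                }
                else
                {
                    switch (text) { /* existing command handling */ }
                }
            };

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
        }
    }

The switch statement then only handles the fixed commands, while anything starting with "search" is routed to the query branch.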

UWP speech recognition failure requires restart with foreground and timeout

萝らか妹 submitted on 2019-12-22 00:37:36

Question: I'm trying to use continuous speech recognition in a UWP application, but each time, after a number of successful results, ContinuousRecognitionSession_ResultGenerated simply stops receiving recognition events at some random moment during processing:

    SpeechRecognizer contSpeechRecognizer = new SpeechRecognizer();
    private CoreDispatcher dispatcher;

    protected async override void OnNavigatedTo(NavigationEventArgs e)
    {
        bool permissionGained = await AudioCapturePermissions…
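A continuous UWP session does not run forever: it completes on its own after silence timeouts or errors. The usual pattern, sketched below against the question's field names (an illustration, not a verified fix), is to handle the session's Completed event and start the session again:

    using System.Threading.Tasks;
    using Windows.Media.SpeechRecognition;
    using Windows.UI.Core;

    // Inside the page class from the question; contSpeechRecognizer and
    // dispatcher are the fields declared there.
    private async Task StartContinuousRecognitionAsync()
    {
        await contSpeechRecognizer.CompileConstraintsAsync();

        contSpeechRecognizer.ContinuousRecognitionSession.ResultGenerated +=
            ContinuousRecognitionSession_ResultGenerated;

        // The session completes on timeouts and errors; restart it so
        // recognition keeps running instead of silently stopping.
        contSpeechRecognizer.ContinuousRecognitionSession.Completed += async (session, args) =>
        {
            await dispatcher.RunAsync(CoreDispatcherPriority.Normal, async () =>
            {
                // args.Status says why it stopped (e.g. TimeoutExceeded).
                await contSpeechRecognizer.ContinuousRecognitionSession.StartAsync();
            });
        };

        await contSpeechRecognizer.ContinuousRecognitionSession.StartAsync();
    }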

Managing text-to-speech and speech recognition at same time in iOS

自古美人都是妖i submitted on 2019-12-21 23:02:24

Question: I'd like my iOS app to use text-to-speech to read the user information it receives from a server, and I'd also like to allow the user to stop that speech with a voice command. I have tried speech recognition frameworks for iOS, such as OpenEars, and I run into the problem that the app listens to and detects the information it is itself "saying", which interferes with recognition of the user's voice commands. Has somebody dealt with this scenario in iOS and found a solution for it?
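One pragmatic mitigation is to filter out the app's own words: while text-to-speech is active, discard any recognition hypothesis that matches the text currently being spoken, so only genuine user commands such as "stop" get through. A C# sketch with hypothetical types, purely to show the idea:

    using System;

    // Hypothetical helper: wire OnTtsStarted/OnTtsFinished to the TTS
    // engine's callbacks, and gate every hypothesis through IsUserCommand.
    class SelfEchoFilter
    {
        private string currentlySpokenText = "";

        public void OnTtsStarted(string text)  => currentlySpokenText = text;
        public void OnTtsFinished()            => currentlySpokenText = "";

        public bool IsUserCommand(string hypothesis)
        {
            if (string.IsNullOrEmpty(currentlySpokenText)) return true;
            // Discard hypotheses that look like fragments of our own speech.
            return currentlySpokenText.IndexOf(
                hypothesis, StringComparison.OrdinalIgnoreCase) < 0;
        }
    }

A stricter variant restricts the grammar to a few command words while the app is speaking; recognizing fully overlapping speech generally requires acoustic echo cancellation of the synthesizer's output.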

How to perform DTW on an array of MFCC coefficients?

谁说我不能喝 submitted on 2019-12-21 22:47:11

Question: Currently I'm working on a speech recognition project in MATLAB. I've taken two voice signals and extracted the MFCC coefficients of each. As far as I know, I should now calculate the Euclidean distance between the two and then apply the DTW algorithm, so I calculated the distance between the two and got an array of the distances. My question is: how do I implement DTW on the resulting array? Here's my MATLAB code:

    clear all; close all; clc;
    % Define variables
    Tw = 25; % analysis…
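For orientation, here is a sketch of plain DTW (in C# for illustration; the algorithm translates directly to MATLAB). The key point is that DTW runs over the full frame-to-frame distance matrix between the two MFCC sequences, not over a single flat array of distances: each cell of the accumulated-cost matrix stores the cheapest warping path reaching that alignment.

    using System;

    static class Dtw
    {
        // Each row of mfcc1/mfcc2 is one frame's coefficient vector.
        public static double Distance(double[][] mfcc1, double[][] mfcc2)
        {
            int n = mfcc1.Length, m = mfcc2.Length;
            var acc = new double[n + 1, m + 1];

            for (int i = 0; i <= n; i++)
                for (int j = 0; j <= m; j++)
                    acc[i, j] = double.PositiveInfinity;
            acc[0, 0] = 0.0;

            for (int i = 1; i <= n; i++)
            {
                for (int j = 1; j <= m; j++)
                {
                    double d = Euclidean(mfcc1[i - 1], mfcc2[j - 1]);
                    // Accumulate the cheapest of the three allowed moves.
                    acc[i, j] = d + Math.Min(acc[i - 1, j],
                                    Math.Min(acc[i, j - 1], acc[i - 1, j - 1]));
                }
            }
            return acc[n, m]; // total alignment cost; smaller = more similar
        }

        static double Euclidean(double[] a, double[] b)
        {
            double sum = 0.0;
            for (int k = 0; k < a.Length; k++)
            {
                double diff = a[k] - b[k];
                sum += diff * diff;
            }
            return Math.Sqrt(sum);
        }
    }

Distance returns the cumulative alignment cost; when matching one utterance against several templates, the template with the smallest cost is the best match.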

How to auto-stop speech recognition when the user stops speaking

给你一囗甜甜゛ submitted on 2019-12-21 22:45:55

Question: I am working on a bot app with two features: speech-to-text and text-to-speech. Both work as expected, but I want to detect when the user stops speaking, stop detection at that moment, and send the data to the server. Is there any way to tell that the user is no longer speaking? I am using the code below for speech detection:

    // Starts an AVAudioSession
    NSError *error;
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryPlayAndRecord…
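One widely used pattern for this is a silence timer: re-arm a short timer on every partial recognition result, and when the timer fires with no new speech, treat the utterance as complete and send it off. A platform-neutral C# sketch (the recognizer callback wiring is hypothetical):

    using System;
    using System.Timers;

    class SilenceEndpointer
    {
        const double SilenceMs = 1500;          // tune to taste
        readonly Timer silenceTimer = new Timer(SilenceMs) { AutoReset = false };
        public event Action UtteranceFinished;  // fire "send to server" here

        public SilenceEndpointer()
        {
            // No partial result for SilenceMs: the user has stopped speaking.
            silenceTimer.Elapsed += (s, e) => UtteranceFinished?.Invoke();
        }

        // Call this from the recognizer's partial-result callback.
        public void OnPartialResult(string text)
        {
            silenceTimer.Stop();   // speech is still coming in
            silenceTimer.Start();  // re-arm the silence countdown
        }

        public void Start() => silenceTimer.Start();
        public void Stop()  => silenceTimer.Stop();
    }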

Emotion detection in speech

戏子无情 submitted on 2019-12-21 21:43:25

Question: I would like to build an app which analyses the emotional content of speech from the mic. This does not involve speech recognition, although that is sometimes used as an extra feature; emotional analysis is based on prosodic features of the voice (pitch change, speed, tone, etc.). I know this can be done on a desktop computer, but I don't want users to have to upload their recordings (phone conversations) to a server in order to get emotional feedback. What I need is an API which either provides the…

Specifying a pronunciation of a word in Microsoft Speech API

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-21 21:39:00

Question: I'm working on a small application in C# which performs speech recognition using the Microsoft Speech API. I need to add some non-English words to the grammar whose pronunciations don't obey English pronunciation rules. Is it possible to specify their pronunciation using the International Phonetic Alphabet? If so, which methods should be used?

Answer 1: The way to achieve custom pronunciation here is by passing an SrgsDocument to the Grammar constructor. This allows specification per http://www.w3.org/TR…
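Building on that answer, here is a minimal sketch assuming System.Speech (the word and its IPA string are placeholders for illustration, not vetted transcriptions):

    using System.Speech.Recognition;
    using System.Speech.Recognition.SrgsGrammar;

    // Declare the document as IPA-based, then give the token an explicit
    // pronunciation instead of letting the engine guess from spelling.
    var doc = new SrgsDocument();
    doc.PhoneticAlphabet = SrgsPhoneticAlphabet.Ipa;

    var token = new SrgsToken("gnocchi");
    token.Pronunciation = "ˈnjɔki";   // placeholder IPA transcription

    var rule = new SrgsRule("foreignWord", new SrgsItem(token));
    doc.Rules.Add(rule);
    doc.Root = rule;

    var recognizer = new SpeechRecognitionEngine();
    recognizer.LoadGrammar(new Grammar(doc));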

How to increase the amount of time before input is considered complete in Android voice recognition?

戏子无情 submitted on 2019-12-21 20:27:02

Question: In Android voice recognition, does anyone know how to increase the amount of time the recognizer waits after speech stops before considering the input complete? I need to prevent the endpointer from cutting off during very short mid-speech pauses. If anyone knows the solution, please reply. Any response would be appreciated. Thanks in advance.

Answer 1: These two parameters are relevant, and they control the amount of silence the recognizer needs to hear before considering the input complete: RecognizerIntent.EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS and RecognizerIntent.EXTRA_SPEECH_INPUT_POSSIBLY_COMPLETE_SILENCE_LENGTH_MILLIS.