speech-recognition

Difference in word confidence in IBM Watson Speech to Text

Submitted by 南笙酒味 on 2019-12-11 06:26:23
Question: I am using the Node SDK to call the IBM Watson speech-to-text service. After sending an audio sample and receiving a response, the confidence values look odd: { "results": [ { "word_alternatives": [ { "start_time": 3.31, "alternatives": [ { "confidence": 0.7563, "word": "you" }, { "confidence": 0.0254, "word": "look" }, { "confidence": 0.0142, "word": "Lou" }, { "confidence": 0.0118, "word": "we" } ], "end_time": 3.43 }, ... and ... ], "alternatives": [ { "word_confidence": [ [ "you", 0
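One thing worth noting about the values above: `word_alternatives` entries come from Watson's word-alternatives feature (often described as a confusion-network output), so the confidences of the competing words in one time slot need not sum to 1 and can look quite different from the overall `word_confidence` scores. A minimal sketch of extracting the top alternative per slot, using the sample values from the question:

```python
# Minimal sketch: pick the highest-confidence word from one
# "word_alternatives" entry. The sample data is copied from the
# question; field names follow the Speech to Text JSON shown there.

sample = {
    "start_time": 3.31,
    "end_time": 3.43,
    "alternatives": [
        {"confidence": 0.7563, "word": "you"},
        {"confidence": 0.0254, "word": "look"},
        {"confidence": 0.0142, "word": "Lou"},
        {"confidence": 0.0118, "word": "we"},
    ],
}

def top_word(entry):
    """Return the (word, confidence) pair with the highest confidence."""
    best = max(entry["alternatives"], key=lambda a: a["confidence"])
    return best["word"], best["confidence"]

word, conf = top_word(sample)
print(word, conf)  # -> you 0.7563
```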

Are MFCC features required for speech recognition

Submitted by 浪子不回头ぞ on 2019-12-11 06:07:08
Question: I'm currently developing a speech recognition project and trying to select the most meaningful features. Most of the relevant papers suggest using zero crossing rate, F0, and MFCC features, so I'm using those. My question is: a training sample with a duration of 00:03 has 268 features. Considering I'm doing a multi-class classification project with 50+ training samples per class, may including all MFCC features expose the project to the curse of dimensionality or 'reduce the importance'
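To see why frame-level MFCCs get high-dimensional fast, here is a back-of-the-envelope sketch. The 25 ms window, 10 ms hop, and 13 coefficients are common defaults, not values taken from the question; pooling statistics over frames (a standard remedy) brings the dimensionality back down to something 50 samples per class can support.

```python
# Back-of-the-envelope MFCC dimensionality for a 3-second clip.
# Window/hop sizes and coefficient count are typical defaults
# (assumptions, not values from the question).

duration_s = 3.0
win_s, hop_s = 0.025, 0.010   # 25 ms analysis window, 10 ms hop
n_mfcc = 13                   # coefficients per frame

n_frames = int((duration_s - win_s) / hop_s) + 1
full_dim = n_frames * n_mfcc  # concatenating every frame: huge
pooled_dim = n_mfcc * 2       # mean + std per coefficient: compact

print(n_frames, full_dim, pooled_dim)  # -> 298 3874 26
```

With ~50 samples per class, a 3874-dimensional frame-level representation is a textbook curse-of-dimensionality setup, while the 26-dimensional pooled version is not.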

Speech recognition with punctuation

Submitted by 我是研究僧i on 2019-12-11 05:47:28
Question: After doing some tests with the Speech framework, I've realized that there is no punctuation in the result. Is there a way to trigger it? I've seen that Siri does recognize punctuation, so I think it should be doable. Answer 1: Usually punctuation is restored in a separate post-processing step. For English you can use Punctuator; for other languages you have to build your own post-processing models, which is a bit more complicated. Source: https://stackoverflow.com/questions/42728618/speech-recognition
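A toy illustration of what "post-processing" means here: the recognizer emits lowercase, unpunctuated text, and a second pass adds punctuation back. The rules and question-word list below are simplistic assumptions purely for illustration; a trained model like the Punctuator mentioned in the answer does this statistically instead.

```python
# Toy rule-based punctuation restoration on raw ASR output.
# Real systems learn punctuation placement from data; this sketch
# only restores sentence-final punctuation and capitalization.

QUESTION_STARTERS = {"what", "why", "how", "who", "where", "when",
                     "is", "are", "do", "does", "can"}

def punctuate(raw: str) -> str:
    """Capitalize the first word and append '?' or '.' heuristically."""
    words = raw.strip().split()
    if not words:
        return raw
    end = "?" if words[0].lower() in QUESTION_STARTERS else "."
    words[0] = words[0].capitalize()
    return " ".join(words) + end

print(punctuate("is there a way to trigger it"))  # -> Is there a way to trigger it?
```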

Pocketsphinx + Gstreamer Race Condition? Pocketsphinx can't listen to audio + record from it at the same time in Python script?

Submitted by 一世执手 on 2019-12-11 04:57:38
Question: Overview: This is a follow-up to my last problem (here); I will be posting a full answer to that one very soon. I'm able to get pocketsphinx to recognize audio input from my PS3 Eye in Python via Gstreamer by specifying the correct ALSA device (hw:1 in my case). ISSUE: My next issue seems to involve a tiny race condition: my microphone is already in use while I also need to be able to record with it. Imagine the following: I start up my Python daemon, and it's currently listening. I
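One generic way to avoid the recognizer and the recorder fighting over a single capture device is to serialize access explicitly. A minimal sketch, assuming both run in the same Python process; the lock here is a stand-in for pausing the Gstreamer pipeline (e.g. setting it to PAUSED) before the recorder opens the device, which is how you would actually release the ALSA handle.

```python
# Sketch: serialize microphone access between a listening loop and a
# recorder with a lock. The function bodies are placeholders for the
# real Gstreamer/pocketsphinx calls (assumed, not from the question).
import threading

mic_lock = threading.Lock()
events = []

def listen_step():
    with mic_lock:                 # recognizer holds the mic while decoding
        events.append("listen")    # placeholder for pipeline PLAYING state

def record_clip():
    with mic_lock:                 # recorder blocks until the mic is free
        events.append("record")    # placeholder for opening hw:1 to record

t = threading.Thread(target=listen_step)
t.start()
t.join()                           # listening step finishes, lock released
record_clip()
print(events)  # -> ['listen', 'record']
```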

How to set up thresholds to spot keywords from a list in pocketsphinx-android?

Submitted by 夙愿已清 on 2019-12-11 04:57:06
Question: I would like my Android application to do continuous keyword spotting. I'm modifying the pocketsphinx-android demo to test how I can do it. I wrote this list in a file named en-keywords.txt, picking words from cmudict-en-us.dict: rainbow /1e-50/ about /1e-50/ blood /1e-50/ energies /1e-50/ In the setupRecognizer method I removed every other search and added only this keyword search to the recognizer: File keywords = new File(assetsDir, "en-keywords.txt"); recognizer.addKeywordSearch(KWS_SEARCH,
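For reference, the keyword file uses one `word /threshold/` pair per line, and the thresholds normally need to be tuned per keyword rather than set to 1e-50 across the board (a single shared value tends to produce either constant false alarms or no detections). The specific values below are illustrative placeholders to be tuned against test recordings, not recommended settings:

```
rainbow /1e-30/
about /1e-10/
blood /1e-15/
energies /1e-40/
```

Tuning usually means lowering a keyword's threshold until false alarms appear, then backing off.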

Speech Recognition: “Network error” on startListening() Android 6/7

Submitted by 依然范特西╮ on 2019-12-11 04:53:49
Question: When I run the code below, I get "Network error" as soon as the beep sounds to start recording from the mic. I have no idea what's wrong here. I've added the Cordova plugin and npm module correctly, granted the microphone permission in the app, tried while connected to Wi-Fi, tried while connected to 4G, tried removing and re-adding the Android platform to the project, and tried on two different phones (Samsung S5 / Android 6 and Sony Xperia Z5C / Android 7). Here's my basic code, nothing special: setupSpeechRecognition

MonoTouch iOS FLAC File Speech Recognition

Submitted by 爷,独闯天下 on 2019-12-11 04:35:16
Question: I'm writing an iOS app with MonoTouch that should recognise the user's speech. I need to use the microphone and convert the recorded voice file into FLAC format in order to send it to the Google Speech API. Are there any libraries or code examples, and is it possible to build a speech-recognition app for iOS using MonoTouch and, for example, the Google Speech API? Or is it possible to convert the audio output (a .caf file) to a .flac file? Source: https://stackoverflow.com/questions/19119758

SpeechSynthesizer doesn't get all installed voices 2

Submitted by 独自空忆成欢 on 2019-12-11 04:27:31
Question: I have installed a new voice on my Windows 7 32-bit OS in order to use it from a .NET application I'm developing. But when I use the GetInstalledVoices() method to list all voices, only one (the default "Microsoft Anna") appears. Why might this happen? The voice does appear in Control Panel -> Speech, and other TTS applications can use it. Thanks. Answer 1: I found the answer to my question here. This is a bug in System.Speech.Synthesis, and .NET 4.5 solves the problem. Source:

MS SAPI sdk equivalent on OSX

Submitted by 偶尔善良 on 2019-12-11 04:25:18
Question: I'm looking for an SDK that would give me speech recognition in an OS X application. I already have working Windows code that uses SAPI to get speech-recognition info from an audio file, and I would like to see how to do this on OS X, since nothing like SAPI is available there. Thanks! Answer 1: The OS X equivalent is the Speech Recognition service: http://developer.apple.com/library/mac/#documentation/cocoa/conceptual/speech/Articles/RecognizeSpeech.html#//apple_ref/doc/uid/20002081

Sphinx4 speech recognition transcribe demo not working accurately for short wav file

Submitted by 喜你入骨 on 2019-12-11 04:22:03
Question: I have just implemented the transcriber demo to transcribe an audio file. My audio file is a .wav file which contains only names like "BHAVIK", "ANKIT", "SAGAR". My grammar file contains the following grammar: public = (JAY)|(SAGAR)|(BHAVIK)|(ANKIT)|(MIRAJ)|(YAGNESH); But the problem is that the transcriber demo does not produce the correct result: when I give it a .wav file of "JAY", it returns something else. Why is this happening? My .wav file is here u
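For reference, a complete JSGF grammar file needs a header line and a named public rule; the rule name in the question was likely stripped by the page as an HTML tag, so the `<name>` and `names` identifiers below are assumptions, not the original ones:

```
#JSGF V1.0;
grammar names;
public <name> = JAY | SAGAR | BHAVIK | ANKIT | MIRAJ | YAGNESH;
```

Beyond the grammar, accuracy on such short utterances of non-English names usually hinges on the dictionary: names like BHAVIK or YAGNESH are not in cmudict-en-us.dict, so phonetic pronunciations for each of them have to be added to the dictionary by hand for the acoustic model to score them sensibly.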