speech-recognition

Open Source Software For Transcribing Speech in Audio Files

只愿长相守 submitted on 2019-12-02 20:36:26
Can anyone recommend reliable open source software for transcribing English speech in WAV files? The two main programs I've researched are Sphinx and Julius, but I've never been able to get either one to work, and the documentation each provides on transcribing files is sketchy at best. I'm developing on 64-bit Ubuntu 10.04, whose repos include sphinx2 and julius, as well as VoxForge's Julius acoustic model for English. I'm focusing on transcribing files, instead of directly processing sound from a mic, because I've given up on expecting projects like these to work with Ubuntu's sound system. This…
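A minimal sketch of file transcription with the pocketsphinx Python bindings, which wrap the same CMU engine; the import path and model locations vary by version and are assumptions here:

    from pocketsphinx.pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')                # acoustic model dir
    config.set_string('-lm', 'model/en-us.lm.bin')          # language model
    config.set_string('-dict', 'model/cmudict-en-us.dict')  # pronunciation dict
    decoder = Decoder(config)

    decoder.start_utt()
    with open('speech.wav', 'rb') as f:     # 16 kHz, 16-bit mono WAV assumed
        f.read(44)                          # skip the RIFF/WAV header
        while True:
            buf = f.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()
    print(decoder.hyp().hypstr)             # best transcription hypothesis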

Integrate Google Voice Recognition in Android app

断了今生、忘了曾经 submitted on 2019-12-02 20:34:24
I want to introduce a new feature into my app: permanent voice recognition. First of all, I followed these posts: Voice recognition, Speech recognition in Android, Offline Speech Recognition In Android (JellyBean), and others, plus posts from other websites. Problem: what I'm actually trying to do is have permanent voice recognition without displaying Google's voice activity. For example: when I start the application, the voice recognition should start and listen. When the recognizer matches some words, my app will act accordingly. I do not want to press a…
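One common route to an always-listening recognizer without Google's dialog is CMUSphinx keyword spotting; pocketsphinx-android exposes the same mode on Android. A desktop-Python sketch of the idea, where the keyphrase, threshold, model paths, and mic_chunks() audio source are all assumptions:

    from pocketsphinx.pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')
    config.set_string('-dict', 'model/cmudict-en-us.dict')
    config.set_string('-keyphrase', 'open browser')   # phrase to listen for
    config.set_float('-kws_threshold', 1e-20)         # tune to trade misses vs. false alarms
    decoder = Decoder(config)

    decoder.start_utt()
    for buf in mic_chunks():                # hypothetical 16 kHz mono mic feed
        decoder.process_raw(buf, False, False)
        if decoder.hyp() is not None:       # keyphrase detected
            do_action()                     # hypothetical app callback
            decoder.end_utt()               # reset and keep listening
            decoder.start_utt()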

Speech-to-text conversion with PHP, JavaScript or Flash online

橙三吉。 submitted on 2019-12-02 19:03:27
Question: I know PHP well and I use JavaScript and jQuery, but I don't know how to do speech-to-text conversion with them. I know there are many Flash speech-recognition APIs around, but I would like something faster: a script that can accurately capture your voice and convert it into text. Thank you very much, Anonymous.

Answer 1: If your goal is to do speech recognition from an HTML page, you might want to look at some other alternatives. Chrome supports speech…
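Given the PHP background, one alternative to a browser-only solution is to upload the captured audio and transcribe it server-side. A sketch using the Python SpeechRecognition package (the file path is an assumption; recognize_google calls Google's free web endpoint and may raise UnknownValueError on unintelligible audio):

    import speech_recognition as sr

    def transcribe(wav_path):
        r = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = r.record(source)        # read the whole file
        return r.recognize_google(audio)    # returns the transcript string

    print(transcribe('clip.wav'))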

SAPI Symbol Usage for Speech Dictionary Input

风格不统一 submitted on 2019-12-02 18:58:14
Question: I've been doing some work to add words and pronunciations to the Windows speech dictionary via the SpLexicon interface of SAPI 5.4 (which I think is the only way to do it), using the AddPronunciation function; in my case:

    // Initialize SpLexicon instance
    SpLexicon lex = new SpLexicon();
    // Specify the word to add to the speech dictionary
    string myWord = "father";
    // Set the language ID (US English)
    int langid = new System.Globalization.CultureInfo("en-US").LCID;
    // Specify the word's part of speech and add the pronunciation;
    // the SAPI phone string here is illustrative
    lex.AddPronunciation(myWord, langid, SpeechPartOfSpeech.SPSNoun, "f aa dh er");

Synchronizing text and audio. Is there an NLP/speech-to-text library to do this?

佐手、 submitted on 2019-12-02 18:26:18
I would like to synchronize a spoken recording against a known text. Is there a speech-to-text / natural-language-processing library that would facilitate this? I imagine I'd want to detect word boundaries and compute candidate matches from a dictionary. Most of the questions I've found on SO concern written language. Desired, but not required:
- Open source
- Compatible with American English out-of-the-box
- Cross-platform
- Thoroughly documented
Edit: I realize this is a very broad, even naive, question, so thanks in advance for your guidance. What I've found so far: OpenEars (iOS Sphinx/Flite…
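What is being described is usually called forced alignment, and the open-source aeneas library does it out of the box, aligning fragments of a known transcript (lines of plain text, rather than strict word boundaries) against the audio. A minimal sketch, with the file paths as assumptions:

    from aeneas.executetask import ExecuteTask
    from aeneas.task import Task

    config = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
    task = Task(config_string=config)
    task.audio_file_path_absolute = u"recording.mp3"
    task.text_file_path_absolute = u"transcript.txt"
    task.sync_map_file_path_absolute = u"syncmap.json"
    ExecuteTask(task).execute()     # compute a time interval per text fragment
    task.output_sync_map_file()     # write the sync map as JSON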

How to track rate of speech

前提是你 submitted on 2019-12-02 18:13:47
Question: I am developing an iPhone app that tracks rate of speech, and I am hoping to use Nuance SpeechKit (https://developer.nuance.com/public/Help/DragonMobileSDKReference_iOS/SpeechKit_Guide/Basics.html). Is there a way to track rate of speech (e.g., updating WPM every few seconds) with the framework? Right now it seems to just do speech-to-text at the end of a long utterance, rather than every word or so (i.e., return partial results).

Answer 1: There are easier ways; for example, you can use CMUSphinx with…
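Unlike the SpeechKit flow described above, CMUSphinx can return a partial hypothesis after every audio chunk, which makes a rolling WPM straightforward. A sketch where the model paths and the audio_chunks() source are assumptions:

    import time
    from pocketsphinx.pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')
    config.set_string('-lm', 'model/en-us.lm.bin')
    config.set_string('-dict', 'model/cmudict-en-us.dict')
    decoder = Decoder(config)

    decoder.start_utt()
    start = time.time()
    for buf in audio_chunks():              # hypothetical 16 kHz mono feed
        decoder.process_raw(buf, False, False)
        hyp = decoder.hyp()                 # partial hypothesis so far
        if hyp is not None:
            words = len(hyp.hypstr.split())
            wpm = words / ((time.time() - start) / 60.0)
            print('%.0f WPM' % wpm)
    decoder.end_utt()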

Why isn't speech recognition advancing? [closed]

南笙酒味 submitted on 2019-12-02 17:42:55
What's so difficult about the subject that algorithm designers are having a hard time tackling it? Is it really that complex? I'm having a hard time grasping why this topic is so problematic. Can anyone give me an example as to why this is the case? Because if people find it hard to understand other people with a strong accent, why do you think computers will be any better at it? Auditory processing is a very complex task. Human evolution has produced a system so good that we don't realize how good it is. If three people are talking to you at the same time, you will be able to focus on one…

Open source code for voice detection and discrimination

99封情书 submitted on 2019-12-02 17:33:48
I have 15 audio tapes, one of which I believe contains an old recording of my grandmother and myself talking. A quick attempt to find the right place didn't turn it up. I don't want to listen to 20 hours of tape to find it, and the recording may not be at the start of one of the tapes. Most of the content seems to fall into three categories, in order of total length, longest first: silence, speech radio, and music. I plan to convert all of the tapes to digital format and then look again for the recording. The obvious way is to play them all in the background while I'm doing other things. That's…
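A voice-activity detector can at least separate speech from silence and flag the segments worth auditioning; distinguishing speech radio from a home recording would still need a second pass (e.g., speaker diarization), but it narrows 20 hours down considerably. A sketch using py-webrtcvad, with the filename as an assumption (webrtcvad requires 16-bit mono audio at 8, 16, 32, or 48 kHz):

    import wave
    import webrtcvad

    vad = webrtcvad.Vad(2)                  # aggressiveness 0 (lenient) to 3
    wf = wave.open('tape01.wav', 'rb')
    rate = wf.getframerate()
    frame_ms = 30                           # webrtcvad accepts 10/20/30 ms frames
    samples_per_frame = rate * frame_ms // 1000
    t = 0.0
    while True:
        frame = wf.readframes(samples_per_frame)
        if len(frame) < samples_per_frame * 2:   # 2 bytes per 16-bit sample
            break
        if vad.is_speech(frame, rate):
            print('speech at %.1f s' % t)
        t += frame_ms / 1000.0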

Using System.Speech to convert mp3 file to text

时光总嘲笑我的痴心妄想 submitted on 2019-12-02 17:20:53
I'm trying to use the speech recognition in .NET to recognize the speech of a podcast in an MP3 file and get the result as a string. All the examples I've seen relate to using a microphone, but I don't want to use the microphone; I want to provide a sample MP3 file as my audio source. Can anyone point me to a resource or post an example? EDIT: I converted the audio file to a WAV file and tried this code on it, but it only extracts the first 68 words:

    public class MyRecognizer {
        public string ReadAudio() {
            SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
            Grammar gr = new DictationGrammar(…
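The usual culprit when System.Speech stops after the first phrase is that a single Recognize() call returns only one result; looping until it returns null (end of the WAV stream) reads the whole file. A sketch of that fix, staying in the question's C#, with the converted file's name as an assumption:

    // using System.Speech.Recognition;
    SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
    sre.LoadGrammar(new DictationGrammar());
    sre.SetInputToWaveFile("podcast.wav");          // assumed path
    var sb = new System.Text.StringBuilder();
    RecognitionResult result;
    // Recognize() returns one phrase per call and null at end of stream
    while ((result = sre.Recognize()) != null)
        sb.Append(result.Text).Append(' ');
    string transcript = sb.ToString();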

Using Tensorflow's Connectionist Temporal Classification (CTC) implementation

牧云@^-^@ submitted on 2019-12-02 17:19:31
I'm trying to use TensorFlow's CTC implementation under the contrib package (tf.contrib.ctc.ctc_loss), without success. First of all, does anyone know where I can read a good step-by-step tutorial? TensorFlow's documentation is very poor on this topic. Do I have to provide ctc_loss with the labels with the blank label interleaved or not? I have not been able to overfit my network even using a training dataset of length 1 over 200 epochs. :( How can I calculate the label error rate using tf.edit_distance? Here is my code:

    with graph.as_default():
        max_length = X_train.shape[1]
        frame_size = X_train.shape[2]
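On the specific questions: ctc_loss inserts the blank label (index num_classes - 1) internally, so the targets should be plain label sequences with no blanks interleaved, and the label error rate is conventionally the normalized edit distance between the decoded output and the target SparseTensor. A sketch against the contrib-era API (the ops later moved to tf.nn), assuming logits of shape [max_time, batch_size, num_classes] and sparse int32 targets:

    import tensorflow as tf

    # Greedy-decode the network output into a SparseTensor of label ids
    decoded, log_prob = tf.contrib.ctc.ctc_greedy_decoder(logits, seq_lengths)

    # Mean normalized edit distance = label error rate
    label_error_rate = tf.reduce_mean(
        tf.edit_distance(tf.cast(decoded[0], tf.int32),
                         targets,           # sparse ground-truth labels
                         normalize=True))   # divide by target length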