speech-recognition | 易学教程

Creating ARPA language model file with 50,000 words

I want to create an ARPA language model file with nearly 50,000 words. I can't generate the language model by passing my text file to the CMU Language Tool. Is there another link where I can get a language model for this many words?

I thought I'd answer this one since it has a few votes, although, based on Christina's other questions, I don't think this will be a usable answer for her: a 50,000-word language model almost certainly won't have an acceptable word error rate or recognition speed (or, most likely, even function for long) with in-app recognition systems for iOS that use this
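To make the target file format concrete, here is a minimal sketch of writing a unigram-only model in ARPA layout from a plain-text corpus. It is an illustration under stated assumptions, not what the CMU tool does: the function name `build_unigram_arpa` and the two-sentence corpus are hypothetical, and the sketch applies no smoothing and no backoff weights, both of which a real toolkit would add for a vocabulary of this size.

```python
import math
from collections import Counter


def build_unigram_arpa(sentences):
    """Return an ARPA-format unigram model as a string (no smoothing, no backoff)."""
    counts = Counter()
    for sentence in sentences:
        # Sentence boundary markers are conventional in ARPA models.
        counts.update(["<s>"] + sentence.lower().split() + ["</s>"])
    total = sum(counts.values())

    lines = ["\\data\\", f"ngram 1={len(counts)}", "", "\\1-grams:"]
    for word in sorted(counts):
        # ARPA files store base-10 log probabilities.
        log_prob = math.log10(counts[word] / total)
        lines.append(f"{log_prob:.4f}\t{word}")
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"


corpus = ["open the door", "close the door"]  # hypothetical two-sentence corpus
arpa_text = build_unigram_arpa(corpus)
print(arpa_text)
```

The same structure extends to higher orders by adding `\2-grams:` and `\3-grams:` sections, at which point each lower-order line also carries a backoff weight.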

How To Use Amazon Skill Set Without Amazon Echo Device

I am trying to integrate the Amazon Alexa Skills Kit into my website without an Amazon Echo unit. I want to implement voice commands on my website using the laptop/PC microphone instead of an Echo unit. I have used this tutorial, but I didn't find anything about how to implement it on my side. I also tried the samples available on GitHub, but I think these also require an Amazon Echo device (https://github.com/amzn/alexa-skills-kit-js). I am using Windows with the development environment given below.

My development environment: I am able to configure a web server for Alexa skills, and it is working; ASP.NET; C#.

How do I search content, within audio files/streams? [closed]

I have always wondered how many different search techniques exist for searching text, images, and even videos. However, I have never come across a solution that searches for content within audio files. For example, let us assume that I have about 200 podcasts downloaded to my PC as mp3, wav and ogg files. They are all named generically, say podcast1.mp3, podcast2.mp3, etc., so it is not possible to know what the content is without actually listening to them. Let's say that I am interested in finding out which of the podcasts talk about 'game programming'. I want the

How Shazam or Sound Hound works? [closed]

I'm developing an iOS application with the SDK for iOS 5.0 and Xcode 4.2. I want to develop an application that recognizes sounds. I see there is an application called Sound Hound that recognizes music and tells you the artist and title. How can I do something similar? I want to compare a sound to an existing sound database. How can I do that? Maybe I can use a Fourier transform. I don't know how to process sounds. Or could it be similar to speech recognition?

I came across a paper which explains how audio search algorithms work. Here is the link. It was written by one of the developers of Shazam
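As a rough illustration of the Fourier transform step the question mentions, the sketch below finds the strongest frequency in a short signal with a naive discrete Fourier transform. It is only the first building block of sound matching, not Shazam's or Sound Hound's algorithm: the function name `dominant_frequency`, the 8000 Hz sample rate, and the synthetic 440 Hz test tone are all assumptions made for the example, and real code would use an FFT library rather than this O(n²) loop.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(samples)
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop before Nyquist
        coefficient = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    return best_bin * sample_rate / n


sample_rate = 8000  # Hz, assumed for this example
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
peak_hz = dominant_frequency(tone, sample_rate)
print(peak_hz)
```

Comparing a recording against a database would then mean computing such spectral peaks over many short windows and matching the resulting patterns, which is where the approaches differ from speech recognition.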

Any OpenCV-like C/C++ library for Audio processing? [closed]

Is there anything out there that resembles OpenCV (in spirit), but for processing audio and deriving some intelligence from it? Capabilities could range from:
- Multiplatform audio capture and audio playback
- DSP: audio filters
- Tone detection
- Tonal property analysis
- Tone synthesis (various standard waveforms)
- Recognition given some recognition corpus and model (e.g. determine musical instruments, beats, human speech, etc.); could potentially use other open-source projects for the actual recognition part (sphinx)
- Speech / music synthesis; could again be using some other open-source

Speech to text Conversion.?

For my iPhone application I need a speech-to-text library. Can anyone suggest a solution? After two days of digging, what I found is the Google speech-to-text API and the open-source OpenEars library. Can anyone recommend one of these? Which one is better?

Michael Levy: I don't think the Google APIs are intended for public use. They are services hosted by Google for Android and Chrome. People have reverse engineered the API and built some libraries to let people use it, but I wouldn't build a commercial application that relied on it (unless of course it was an Android or Chrome application). For

Why am I missing the an4-1-1.match file in this speech recognition code?

I'm having problems in the decoding part of speech recognition. I followed the steps here. When I type perl scripts_pl/decode/slave.pl, I get these errors:
MODULE: DECODE Decoding using models previously trained
Decoding 130 segments starting at 0 (part 1 of 1)
Could not find executable for /home/go/Documents/tutorial/an4/bin/sphinx3_decode at /home/go/Documents/tutorial/an4/scripts_pl/decode/../lib/SphinxTrain/Util.pm line 299.
Aligning results to find error rate
Can't open /home/go/Documents/tutorial/an4/result/an4-1-1.match
word_align.pl failed with error code 65280 at scripts_pl/decode

Speech to text conversion php,javascript or flash online

I know PHP well and I use JavaScript and jQuery, but I don't know how to do speech-to-text conversion with them. I do know that there are many Flash speech recognition APIs around, but I would like something faster: a script that can accurately take your voice and convert it into text. Thank you very much, Anonymous.

If your goal is to do speech recognition from an HTML page, you might want to look at some other alternatives. Chrome supports speech recognition for text input. See http://slides.html5rocks.com/#speech-input and http://www.filosophy.org/2011/03

How to track rate of speech

I am developing an iPhone app that tracks rate of speech, and I am hoping to use Nuance SpeechKit ( https://developer.nuance.com/public/Help/DragonMobileSDKReference_iOS/SpeechKit_Guide/Basics.html ). Is there a way to track rate of speech (e.g., updating WPM every few seconds) with the framework? Right now it seems to just do speech-to-text at the end of a long utterance, as opposed to every word or so (i.e., returning partial results).

There are easier ways. For example, you can use CMUSphinx with a phonetic recognizer to recognize just phonemes instead of words. It would work locally on the device and
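Whichever recognizer is used, the rate calculation itself is a sliding-window count. The sketch below assumes the recognizer can report a timestamp for each recognized unit as it arrives (words here, though phonemes would work the same way); that capability is exactly what the question says SpeechKit does not seem to offer. The class `SpeechRateTracker`, the 10-second window, and the timestamps are hypothetical and not part of any SDK.

```python
from collections import deque


class SpeechRateTracker:
    """Tracks words per minute over a sliding time window."""

    def __init__(self, window_seconds=10.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add_word(self, timestamp):
        """Record the time (in seconds) at which a word was recognized."""
        self._timestamps.append(timestamp)

    def words_per_minute(self, now):
        """Return WPM based on words seen in the last window_seconds."""
        # Drop words that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) * 60.0 / self.window_seconds


tracker = SpeechRateTracker(window_seconds=10.0)
for word_time in [0.5, 1.0, 1.6, 2.1, 2.9, 11.0, 11.4]:  # hypothetical timestamps
    tracker.add_word(word_time)
rate = tracker.words_per_minute(now=12.0)
print(rate)
```

Calling `words_per_minute` on a timer every few seconds gives the periodically updated figure the question asks for; a shorter window reacts faster but fluctuates more.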

SpeechSynthesizer - How do I play/save the wav file?

I have the following code snippet in an ASP.NET app (non-Silverlight):
string sText = "Test text";
SpeechSynthesizer ss = new SpeechSynthesizer();
MemoryStream ms = new MemoryStream();
ss.SetOutputToWaveStream(ms);
ss.Speak(sText);
// Need to send the ms MemoryStream to the user for listening/downloading
How do I:
- Play this file in the browser?
- Prompt the user to download a wav file?
Can anyone help with completing the code? EDIT: Any help is appreciated.

Here's the main bit of an IHttpHandler that does what you want. Plug the handler URL into a bgsound tag or pipe it to whatever to play in