speech-recognition | 易学教程

Speech to text from wav file

Question: Merged with Speech to text from wav file Java. Is it possible to input a wav file to the Java Speech API? Source: https://stackoverflow.com/questions/5382074/speech-to-text-from-wav-file

Webkit Speech - Javascript trigger mic listen

Question: <input x-webkit-speech /> gives a mic button in WebKit browsers. Is there a way to trigger the mic click with JavaScript?
Answer 1: Not at this time. You can visit http://www.webkit.org/ and leave a suggestion for them to add this in the future.
Answer 2: Old question, but this is now possible, ish. http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API
Source: https://stackoverflow.com/questions/9360822/webkit-speech-javascript-trigger-mic-listen

Use of SpeechRecognizer produces ERROR_NETWORK (Value 2)

Question: I am using the SpeechRecognizer class, calling startListening when a button is pressed, but I get an error. First the callback methods onReadyForSpeech, onBeginningOfSpeech, and onEndOfSpeech are called (immediately), and at the end onError is called with error code 2, "ERROR_NETWORK". When I call the intent directly using startActivityForResult, it works. But I want to get rid of the time-consuming popup dialog. I have the RECORD_AUDIO permission set. I am not sure, but perhaps

Google Speech Cloud error on Android: OUT_OF_RANGE: Exceeded maximum allowed stream duration of 65 seconds

Question: First: I already know there is a 65-second limit on continuous speech recognition streaming with this API. My goal is NOT to extend those 65 seconds. My app: It uses Google's streaming Speech Recognition; I based my code on this example: https://github.com/GoogleCloudPlatform/android-docs-samples/tree/master/speech The app works fairly well: I get ASR results and show them onscreen as the user speaks, Siri style. The problem: My problem comes after tapping the ASR button on my app several,

Please build and install the PortAudio Python bindings first

Question: I have already installed pyaudio, but the problem appears when I work with the microphone functions: import speech_recognition as sr r = sr.Recognizer() mic = sr.Microphone() The problem is in the third line, mic = sr.Microphone(). The terminal gives me this message: Please build and install the PortAudio Python bindings first. And if I try pip install PortAudio, it gives me the following message: Could not find a version that satisfies the requirement PortAudio (from versions: )No matching
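As far as I know, the "Please build and install the PortAudio Python bindings first" message is printed by PyAudio itself when its compiled `_portaudio` extension cannot be imported, and PortAudio is a C library rather than a pip package, which would explain the pip failure. A minimal diagnostic sketch follows; the helper name `missing_modules` is hypothetical, and the exact location of the extension module may differ between PyAudio versions, so treat the module names passed in as assumptions.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located for import.

    Hypothetical helper: it only checks whether Python can find each module,
    it does not prove that the module will load without errors.
    """
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package or a module without a spec lands here.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module names: "pyaudio" is the wrapper, "_portaudio" the
    # compiled extension used by older PyAudio releases.
    print(missing_modules(["pyaudio", "_portaudio"]))
```

If `pyaudio` is found but the extension is not, the interpreter running the script is probably not the one PyAudio was built for, or the build against PortAudio did not complete.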

Converting from one MFCC type to another - HTK

Question: I am working with the HTK toolkit on a word-spotting task and have a classic mismatch between training and testing data. The training data consisted of only "clean" data (recorded over a mic). The data was converted to MFCC_E_D_A parameters, which were then modelled by phone-level HMMs. My test data has been recorded over landline and mobile phone channels (inviting distortions and the like). Using the MFCC_E_D_A parameters with HVite results in incorrect output. I want to make use of cepstral mean
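The question is cut off at "cepstral mean"; assuming it refers to cepstral mean normalization (CMN), the idea is to subtract the average of each cepstral coefficient, taken over all frames of an utterance, so that a constant channel offset is removed. The sketch below illustrates only that arithmetic in plain Python. The function name and the list-of-lists frame layout are hypothetical, and this is not HTK's own implementation (in HTK the `_Z` qualifier, as in MFCC_E_D_A_Z, requests zero-mean static coefficients).

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames from every frame.

    `frames` is assumed to be a list of equal-length lists of floats,
    one inner list of cepstral coefficients per analysis frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient across the whole utterance.
    means = [sum(frame[i] for frame in frames) / n_frames for i in range(n_coeffs)]
    # Return new frames; the input is left unmodified.
    return [[frame[i] - means[i] for i in range(n_coeffs)] for frame in frames]


if __name__ == "__main__":
    print(cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]]))
```

Because the same mean is subtracted from every frame, delta and acceleration coefficients computed from the static ones are unaffected by this step.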

Muting SpeechRecognizer's beep sound

Question: I'm using the SpeechRecognizer API for my app, and every time it starts, it plays a "beep" sound. I'd like to know how to mute it, so I could implement one of my own. Thanks.
Answer 1: If you are using a button to activate and deactivate the recognizer, you can mute the sound on click. This doesn't work fantastically if you have it listening constantly, but for button clicks it should be fine :) private AudioManager manager; manager = (AudioManager) getSystemService(Context.AUDIO_SERVICE); if (isChecked) {

Speech recognition and intonation detection

Question: I want to make an iOS app to count interrogative sentences. I will look for WH questions and also questions in the "will I, am I?" format. I am not very experienced in the speech or audio technology world, but I did Google and found that there are a few speech recognition SDKs. I still have no idea how to detect and graph intonation, though. Are there any SDKs that support intonation or emotional speech recognition? Source: https://stackoverflow.com/questions/15527107/speech-recogition-and-intonation-detection

sphinx-4 aligner skips plain words like `you`, `in` and words with dashes - why?

Question: I'm trying to align simple text. Here are the links to the text and audio files: http://s000.tinyupload.com/?file_id=48044768133759453374 http://s000.tinyupload.com/?file_id=99891199139563396901 Here are the configuration settings: private static final String ACOUSTIC_MODEL_PATH = "resource:/edu/cmu/sphinx/models/en-us/en-us"; private static final String DICTIONARY_PATH = "resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict"; The output I get is the following (ellipses were added by me): - ï -

measuring rate of speech in realtime

Question: I'm looking for a quick and simple way to measure the rate at which I am speaking in real time. Coarse-grained approaches or approximations are sufficient. The idea is to write a simple app/widget that at least tells you to speed up or slow down while speaking. Measuring things like pitch and volume might also be nice. I assume this can be done simply with a variety of speech recognition libraries, but I am familiar with none of them and quick glances at the documentation do not give a simple
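One coarse way to approximate speaking rate, assuming some recognizer can report a timestamp for each recognized word, is to count the words that fall inside a sliding time window and scale the count to words per minute. The sketch below shows only that bookkeeping. The class name `SpeechRateMeter`, its methods, and the 10-second default window are hypothetical choices, not part of any speech recognition library.

```python
from collections import deque


class SpeechRateMeter:
    """Sliding-window words-per-minute estimate from word timestamps (seconds)."""

    def __init__(self, window_seconds=10.0):
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of words still inside the window

    def add_word(self, timestamp):
        """Record that a word was recognized at `timestamp`."""
        self._times.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now):
        # Drop words that are at least one full window older than `now`.
        while self._times and self._times[0] <= now - self.window_seconds:
            self._times.popleft()

    def words_per_minute(self, now):
        """Return the estimated rate for the window ending at `now`."""
        self._evict(now)
        return len(self._times) * 60.0 / self.window_seconds


if __name__ == "__main__":
    meter = SpeechRateMeter(window_seconds=10.0)
    for t in [0.0, 1.0, 2.0, 3.0, 4.0]:
        meter.add_word(t)
    print(meter.words_per_minute(now=5.0))
```

A widget could compare the returned value against two thresholds of its own choosing to show "speed up" or "slow down". A shorter window reacts faster but fluctuates more.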