speech-recognition | 易学教程

Applying neural network to MFCCs for variable-length speech segments

I'm currently trying to create and train a neural network to perform simple speech classification using MFCCs. At the moment, I'm using 26 coefficients for each sample and a total of 5 classes: five different words with varying numbers of syllables. Each sample is 2 seconds long, but I am unsure how to handle cases where the user pronounces a word either very slowly or very quickly. For example, the word 'television' spoken in 1 second yields different coefficients than the same word spoken in 2 seconds. Any advice on how I can solve this problem would be much
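
One common way to give a classifier a fixed-size input despite different speaking rates is to time-normalize each utterance: stretch or compress its MFCC frame sequence to a fixed number of frames before feeding it to the network. This is a suggested technique, not something from the question; dynamic time warping (DTW) or a sequence model (e.g. a recurrent network) are other standard options. The sketch below assumes each utterance is a list of frames, each frame a list of coefficients (26 in the question), and uses plain linear interpolation along the time axis; the function name `resample_frames` is hypothetical.

```python
from typing import List

# One MFCC vector (e.g. 26 coefficients) per analysis frame.
Frame = List[float]


def resample_frames(frames: List[Frame], target_len: int) -> List[Frame]:
    """Stretch or compress a sequence of MFCC frames to exactly target_len frames
    using linear interpolation along the time axis (hypothetical helper)."""
    if not frames:
        raise ValueError("need at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be >= 1")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between (or only one output slot): copy the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    # Map output index i in [0, target_len - 1] onto input position in [0, n - 1].
    step = (n - 1) / (target_len - 1)
    out: List[Frame] = []
    for i in range(target_len):
        pos = i * step
        lo = min(int(pos), n - 1)
        hi = min(lo + 1, n - 1)
        w = pos - lo  # weight given to the later of the two neighbouring frames
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


if __name__ == "__main__":
    # A "fast" utterance with 4 frames of 2 coefficients, stretched to 7 frames.
    fast = [[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0]]
    for frame in resample_frames(fast, 7):
        print(frame)
```

Linear stretching assumes the tempo change is roughly uniform across the word; if only one syllable is drawn out, DTW aligns frames more faithfully. With NumPy the same idea reduces to one `numpy.interp` call per coefficient.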

French speech recognition on iOS

I'm trying to develop an iOS app that uses speech recognition for French, but have been unsuccessful so far. I tried the OpenEars framework, which worked great for English but doesn't support French. I used some info from this link. If anyone knows a solution, it would be awesome. Thanks. Answer: OpenEars uses English acoustic and language models by default, so it works well with English but doesn't support French. You can download French acoustic and language models from the CMU Sphinx website. Some good French acoustic and language models are available here. Download & change

Running Android Speech Recognition as Service: will not start

Question: I'm using the solution here: Android Speech Recognition as a service on Android 4.1 & 4.2. The code below reaches the onStartCommand() method; however, speech recognition never seems to start, as evidenced by the fact that onReadyForSpeech() is never called. UPDATE: So I added and that allowed onReadyForSpeech() to be called, BUT onError() is called with error code 6 after onReadyForSpeech() completes (this goes into a continuous loop because the start
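
Error code 6 here, like the insufficient-permissions error 9 reported in the Glass question, is one of the `SpeechRecognizer.ERROR_*` constants that Android passes to `RecognitionListener.onError()`. For decoding such codes from logs, a small lookup table can help. The sketch below hard-codes the nine constants available since API level 8, with short descriptions paraphrased from the Android reference; later API levels add further codes that this table does not cover, and the helper name `describe_error` is hypothetical. Code 6 is `ERROR_SPEECH_TIMEOUT` (no speech input), which would explain a restart-on-error loop while nobody is speaking.

```python
# Integer values of android.speech.SpeechRecognizer.ERROR_* (API level 8+),
# with short descriptions paraphrased from the Android reference documentation.
SPEECH_RECOGNIZER_ERRORS = {
    1: ("ERROR_NETWORK_TIMEOUT", "Network operation timed out"),
    2: ("ERROR_NETWORK", "Other network related errors"),
    3: ("ERROR_AUDIO", "Audio recording error"),
    4: ("ERROR_SERVER", "Server sends error status"),
    5: ("ERROR_CLIENT", "Other client side errors"),
    6: ("ERROR_SPEECH_TIMEOUT", "No speech input"),
    7: ("ERROR_NO_MATCH", "No recognition result matched"),
    8: ("ERROR_RECOGNIZER_BUSY", "RecognitionService busy"),
    9: ("ERROR_INSUFFICIENT_PERMISSIONS", "Insufficient permissions"),
}


def describe_error(code: int) -> str:
    """Turn a numeric onError() code into a readable label (hypothetical helper)."""
    name, meaning = SPEECH_RECOGNIZER_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code}: {name} ({meaning})"


if __name__ == "__main__":
    print(describe_error(6))  # -> 6: ERROR_SPEECH_TIMEOUT (No speech input)
    print(describe_error(9))  # -> 9: ERROR_INSUFFICIENT_PERMISSIONS (Insufficient permissions)
```

For error 9, a frequent first check is whether the app's manifest declares `android.permission.RECORD_AUDIO`.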

Add a new word to Windows speech recognition using C#

Question: I know how to use speech recognition in C#, but the problem is how to add a special word or name to the Windows speech dictionary database. In Windows 7 and 8 you can do it easily: open Speech Dictionary > Add new word > enter the text of the word > record the pronunciation of the word with the microphone, and then it's done! The word is added to the database. We can also edit the word using Speech Dictionary. Does anyone know how to do these steps programmatically with .NET? EDIT: it's very

Spectrograms generated using Librosa don't look consistent with Kaldi?

Question: I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, with 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram, visualized with MATLAB's imagesc function, appears as below: I am experimenting with Librosa as an alternative to Kaldi. I set up my code as below, using the same number of bins, sampling rate, and window length/shift as above. time_series, sample_rate = librosa.core.load("7a.wav",sr=20000) spectrogram = librosa.feature
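
Matching "the same window length / shift" means converting milliseconds to samples: at 20 kHz, a 25 ms window is 500 samples and a 10 ms shift is 200 samples, which in librosa correspond to the `win_length` and `hop_length` parameters. The sketch below does that arithmetic and also counts frames under two default conventions that commonly make the outputs disagree: Kaldi's `snip_edges=true` keeps only frames that fit entirely inside the signal and its `round_to_power_of_two` option rounds the FFT size up (512 here), while librosa's `center=True` pads the signal so frames are centred on multiples of the hop. The helper names are hypothetical, and the sketch does not call librosa or Kaldi.

```python
def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(sample_rate * ms / 1000.0))


def next_pow2(n: int) -> int:
    """Smallest power of two >= n (what Kaldi's round_to_power_of_two does)."""
    p = 1
    while p < n:
        p *= 2
    return p


def frame_params(sample_rate: int, window_ms: float, shift_ms: float):
    """Return (win_length, hop_length, n_fft) in samples."""
    win = ms_to_samples(window_ms, sample_rate)
    hop = ms_to_samples(shift_ms, sample_rate)
    return win, hop, next_pow2(win)


def kaldi_num_frames(num_samples: int, win: int, hop: int) -> int:
    """Frame count with snip_edges=true: only frames lying fully inside the signal."""
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


def librosa_num_frames(num_samples: int, hop: int) -> int:
    """Frame count for a centred STFT (center=True) with an even n_fft:
    padding n_fft // 2 on each side cancels the frame length."""
    return 1 + num_samples // hop


if __name__ == "__main__":
    win, hop, n_fft = frame_params(20000, 25, 10)
    print(win, hop, n_fft)  # -> 500 200 512
    one_second = 20000  # samples in 1 s of audio at 20 kHz
    print(kaldi_num_frames(one_second, win, hop), librosa_num_frames(one_second, hop))  # -> 98 101
```

Other defaults differ as well and are worth aligning before comparing plots: Kaldi's feature extraction applies dithering, DC-offset removal, pre-emphasis (0.97) and a Povey window by default, whereas librosa uses a Hann window with no pre-emphasis, and its mel filters follow the Slaney formula unless `htk=True`.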

Raspberry Pi Asynchronous/Continuous Speech Recognition in Python

Question: I want to create a speech recognition script for the Raspberry Pi in Python and need an asynchronous/continuous speech recognition library. By asynchronous I mean that recognition must run endlessly, without any keyboard input, until what is spoken matches one of an array of words; the script should then display the spoken text in the terminal and restart recognition. I already looked at PocketSphinx, but after a few hours of Googling I didn't find anything about asynchronous recognition with it. Do
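
Independently of which engine is chosen, the behaviour described (listen endlessly, react to a word from a list, print it, restart) is a simple control loop around a blocking "recognize one utterance" call. The sketch below assumes such a call exists as a zero-argument function returning the decoded text or `None`; wrapping PocketSphinx or any other recognizer into that shape is not shown, and the names `keyword_loop` and `recognize_once` are hypothetical. The `max_utterances` limit exists only so the loop can be exercised in a test; leaving it as `None` makes it run forever.

```python
from typing import Callable, Iterable, List, Optional


def keyword_loop(
    recognize_once: Callable[[], Optional[str]],
    keywords: Iterable[str],
    on_match: Callable[[str], None] = print,
    max_utterances: Optional[int] = None,
) -> List[str]:
    """Keep calling recognize_once(); whenever an utterance contains one of the
    keywords, pass its text to on_match (print by default), then listen again."""
    wanted = {k.lower() for k in keywords}
    matched: List[str] = []
    heard = 0
    while max_utterances is None or heard < max_utterances:
        text = recognize_once()  # hypothetical: blocks until one utterance is decoded
        heard += 1
        if not text:
            continue  # silence or decoding failure: simply restart recognition
        if any(word in wanted for word in text.lower().split()):
            on_match(text)
            matched.append(text)
    return matched


if __name__ == "__main__":
    # Stand-in recognizer that replays canned transcripts instead of using a microphone.
    canned = iter(["hello there", None, "turn on the light", "lights off"])
    keyword_loop(lambda: next(canned), ["light", "off"], max_utterances=4)
    # prints "turn on the light", then "lights off"
```

If the rest of the program must keep running while listening, the same function can be started in the background with `threading.Thread(target=keyword_loop, args=(...), daemon=True).start()`.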

Text-to-speech not working on Android device

Question: Below is my code. I am unable to hear the voice on my KitKat device. The Toast appears, but the voice does not play. I am following this tutorial: https://www.tutorialspoint.com/android/android_text_to_speech.htm package com.example.insert; import android.os.Parcelable; import android.support.v7.app.AppCompatActivity; import android.os.Bundle; import android.app.Activity; import android.os.Bundle; import android.speech.tts.TextToSpeech; import android.view.View; import android.widget.Button; import

SpeechRecognizer insufficient permissions error with Glass

Question: I am building an application with the GDK Sneak Peek and am having trouble getting speech recognition working in an immersive app. This is my first Android project. I tried to follow this: How can I use speech recognition without the annoying dialog in android phones. After making initial progress, I hit a problem where the RecognitionListener receives error 9, insufficient permissions. I am using the GDK, which is Android-15. Initialization of the recognizer is in my onCreate()

How to input and process audio files to convert to text via pyspeech or dragonfly

Question: I have read the documentation of PySpeech and Dragonfly, but don't know how to input an audio file to be converted into text. I have tried speaking into a microphone, and the speech is converted into text, but I want to input a previously recorded audio file instead. Can anyone help with an example? Answer 1: Both PySpeech and Dragonfly are relatively thin wrappers over SAPI. Unfortunately, both of them use the shared recognizer, which doesn't support input selection. While I'm familiar with

Free-form text with a custom SRGS-based grammar

Question: I am trying to develop a voice-based application that accepts user input as speech and performs actions based on that input. This is my first venture into this technology, and I am learning while developing it. I am using Microsoft SAPI, shipped with .NET 4, to recognize speech. So far, I have learned about the two modes it supports. Speech recognition (SR) has two modes of operation: Dictation mode: an unconstrained, free-form speech interpretation mode that uses a
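
SRGS (the W3C Speech Recognition Grammar Specification) lets a fixed command phrase be followed by open-ended speech through the special rule reference `GARBAGE`. The sketch below generates such a grammar with Python's standard `xml.etree.ElementTree`: one public root rule that accepts one of several fixed prefixes followed by arbitrary words. The prefixes and the function name `build_grammar` are hypothetical. Note that `GARBAGE` only tolerates the extra speech; recognizers typically do not return a transcription for it, so capturing the free-form words themselves generally requires dictation (in System.Speech, for example, `GrammarBuilder.AppendDictation()`).

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

# Emit the SRGS namespace as the default namespace (no "ns0:" prefixes).
ET.register_namespace("", SRGS_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the SRGS namespace."""
    return f"{{{SRGS_NS}}}{tag}"


def build_grammar(prefixes, lang: str = "en-US") -> str:
    """Return an SRGS 1.0 XML grammar: one of `prefixes`, then free-form speech."""
    grammar = ET.Element(_q("grammar"), {"version": "1.0", XML_LANG: lang, "root": "command"})
    rule = ET.SubElement(grammar, _q("rule"), {"id": "command", "scope": "public"})
    choice = ET.SubElement(rule, _q("one-of"))
    for phrase in prefixes:
        ET.SubElement(choice, _q("item")).text = phrase
    # Special rule GARBAGE: lets any speech follow the fixed prefix.
    ET.SubElement(rule, _q("ruleref"), {"special": "GARBAGE"})
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_grammar(["search for", "play"]))
```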