speech-recognition

How to convert a mel spectrogram to log-scaled mel spectrogram

耗尽温柔 submitted on 2021-02-08 10:35:25
Question: I was reading this paper on environmental noise discrimination using Convolutional Neural Networks and wanted to reproduce their results. They convert WAV files into log-scaled mel spectrograms. How do you do this? I am able to convert a WAV file to a mel spectrogram:
    y, sr = librosa.load('audio/100263-2-0-117.wav', duration=3)
    ps = librosa.feature.melspectrogram(y=y, sr=sr)
    librosa.display.specshow(ps, y_axis='mel', x_axis='time')
I am also able to display it as a log-scaled spectrogram: librosa
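
The conversion asked about here is commonly done by expressing the mel power spectrogram in decibels, which is what librosa.power_to_db computes. The following is a minimal NumPy-only sketch of that computation, not a statement of what the paper itself did. The small `ps` array is a made-up stand-in for the output of librosa.feature.melspectrogram, and the function below is an illustrative re-implementation whose default values are assumed to match librosa's.

```python
import numpy as np

def power_to_db(power_spec, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert a power spectrogram to decibels: 10 * log10(S / ref)."""
    power_spec = np.asarray(power_spec, dtype=float)
    # Floor the input at amin so that log10 never sees zero.
    log_spec = 10.0 * np.log10(np.maximum(amin, power_spec))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Clip everything that lies more than top_db below the peak.
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

# Hypothetical stand-in for the `ps` array from the question (n_mels x frames).
ps = np.array([[1.0, 10.0], [100.0, 0.0]])
log_ps = power_to_db(ps)
print(log_ps)
```

With librosa installed, calling librosa.power_to_db(ps) on the real mel spectrogram should play the same role, and the result can be passed to librosa.display.specshow in place of `ps`.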

How to convert a mel spectrogram to log-scaled mel spectrogram

跟風遠走 submitted on 2021-02-08 10:31:01
Question: I was reading this paper on environmental noise discrimination using Convolutional Neural Networks and wanted to reproduce their results. They convert WAV files into log-scaled mel spectrograms. How do you do this? I am able to convert a WAV file to a mel spectrogram:
    y, sr = librosa.load('audio/100263-2-0-117.wav', duration=3)
    ps = librosa.feature.melspectrogram(y=y, sr=sr)
    librosa.display.specshow(ps, y_axis='mel', x_axis='time')
I am also able to display it as a log-scaled spectrogram: librosa

Change default language for Speech recognition in my app

北城以北 submitted on 2021-02-08 07:22:32
Question: I am building an app in English that uses speech recognition. If I install the app on a device with a different system language, French or Russian for example, speech recognition doesn't work. It works only for the system's default language. How can I make English the default speech-recognition language for my app? I found this method, but it doesn't work:
    Locale myLocale;
    myLocale = new Locale("English (US)", "en_US");
    Locale.setDefault(myLocale);
    android.content.res.Configuration

Speech-to-text large audio files [Microsoft Speech API]

时光怂恿深爱的人放手 submitted on 2021-02-07 18:42:46
Question: What is the best way to transcribe medium/large audio files, roughly 6-10 minutes each, using the Microsoft Speech API? Is there something like batch transcription of audio files? I have used the code provided in https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text-sample for continuously transcribing speech, but it stops transcribing at some point. Is there any restriction on the transcription? I am only using the free trial account at the moment. By the way, I assume there is no difference

How to create text-to-speech with neural network

走远了吗. submitted on 2021-02-07 10:59:33
Question: I am creating a text-to-speech system for a phonetic language called "Kannada", and I plan to train it with a neural network. The input is a word/phrase and the output is the corresponding audio. While implementing the network, I was thinking the input should be the segmented characters of the word/phrase, since the output pronunciation depends only on the characters that make up the word, unlike English, where we have silent words and part of speech to consider. However, I do not know how I

pocketsphinx - how to switch from keyword spotting to grammar mode

回眸只為那壹抹淺笑 submitted on 2021-02-07 09:01:02
Question: I'm using pocketsphinx on a Raspberry Pi for home automation. I've written a simple JSGF grammar file with the supported commands. Now I want to use an activation phrase such as "hey computer" before the commands, to avoid false detections and to perform speech recognition only once the activation phrase has been spoken. If I'm not getting this wrong, pocketsphinx supports two modes for speech recognition: keyword spotting mode, and language model / JSGF grammar mode. In pocketsphinx FAQ when

How can I use audio file as audio source in SpeechRecognition in Python?

有些话、适合烂在心里 submitted on 2021-02-04 16:36:12
Question: I used speech_recognition.AudioFile in Python 3.6, but got this error:
    AttributeError: module 'speech_recognition' has no attribute 'AudioFile'
This is my code:
    #!/usr/bin/env python3
    import speech_recognition as sr
    # obtain path to "english.wav" in the same folder as this script
    from os import path
    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
    # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
    # AUDIO_FILE = path.join(path
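
An AttributeError of this kind often means Python imported a different module than the one expected, for example a local file that happens to be named speech_recognition.py, or an older installed release that may predate the class. Neither cause is confirmed by the excerpt above, so the following is only a diagnostic sketch. The helper name `inspect_module` is made up for illustration, and it is demonstrated on the standard-library `json` module so that it runs without any third-party package.

```python
import importlib

def inspect_module(module_name, attribute):
    """Report where a module was loaded from and whether it has an attribute."""
    module = importlib.import_module(module_name)
    return {
        # Path of the file that was actually imported (None for some built-ins).
        "file": getattr(module, "__file__", None),
        # Version string, if the module exposes one.
        "version": getattr(module, "__version__", None),
        "has_attribute": hasattr(module, attribute),
    }

# Demonstration on a standard-library module.
report = inspect_module("json", "loads")
print(report)
```

Calling inspect_module("speech_recognition", "AudioFile") in the failing environment would show which file is being imported. If the reported path points into the project folder rather than site-packages, renaming the local file is the likely fix.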

HTK installing in windows10 / Not able to find VC98

こ雲淡風輕ζ submitted on 2021-01-29 15:49:51
Question: I am trying to install the HTK Toolkit on my Windows 10 machine. It has this prerequisite: ensure that your PATH contains C:\Program Files\Microsoft Visual Studio\VC98\bin. I installed Microsoft Visual Studio, but I cannot find the VC98 folder in the location where Visual Studio is installed. I have searched for it many times without success. Can someone please help me solve this? My final goal is to install HTK. Installing HTK on Microsoft Windows

Can I control the start & finish time when I use speech-recognition in python?

妖精的绣舞 submitted on 2021-01-29 05:18:34
Question: I wrote the code below, but I want to know whether there is a way to control the recording duration. What I actually want is a program with start and finish buttons, so that I can control when recording happens. I know this is an elementary question, but I really need to solve it. How should I approach this problem?
    import speech_recognition as sr
    r = sr.Recognizer()
    mic = sr.Microphone()
    show = input("enter text: ")
    print("Read text\a")
    with mic as source:
        audio = r.listen
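
For a fixed length, the speech_recognition library offers Recognizer.record(source, duration=...), and Recognizer.listen_in_background returns a function that stops the background listener. For explicit start and finish buttons, one general pattern is a background thread that keeps reading audio until a stop flag is set. The sketch below shows only that pattern: `StartStopRecorder` and `fake_read_chunk` are hypothetical names, and the fake chunk source stands in for a real microphone read so that the example runs anywhere.

```python
import threading
import time

class StartStopRecorder:
    """Collect audio chunks in a background thread between start() and stop()."""

    def __init__(self, read_chunk):
        self._read_chunk = read_chunk      # callable returning one chunk of bytes
        self._stop_event = threading.Event()
        self._thread = None
        self.chunks = []

    def start(self):
        self._stop_event.clear()
        self.chunks = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Read first, then check the flag, so that even a very short
        # recording contains at least one chunk.
        while True:
            self.chunks.append(self._read_chunk())
            if self._stop_event.is_set():
                break

    def stop(self):
        self._stop_event.set()
        self._thread.join()                # wait for the reader to finish
        return b"".join(self.chunks)

def fake_read_chunk():
    """Hypothetical stand-in for reading one buffer from a microphone."""
    time.sleep(0.01)
    return b"\x00\x00"

recorder = StartStopRecorder(fake_read_chunk)
recorder.start()                           # e.g. wired to a "start" button
time.sleep(0.05)
audio_bytes = recorder.stop()              # e.g. wired to a "finish" button
print(len(audio_bytes), "bytes recorded")
```

In a GUI, start() and stop() would be called from the two button handlers, and the collected bytes would then be handed to the recognizer.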

Python - TypeError: listen() missing 1 required positional argument: 'self'

ⅰ亾dé卋堺 submitted on 2021-01-28 11:42:10
Question: I have been working on an AI in PyCharm, and I seem to have encountered an error with speech_recognition when calling a method to get audio input:
    /Users/waynedeng/Desktop/AI/venv/bin/python /Users/waynedeng/Desktop/AI/dawg_2.0.py
    Listening...
    Traceback (most recent call last):
      File "/Users/waynedeng/Desktop/AI/dawg_2.0.py", line 37, in <module>
        input = read_input()
      File "/Users/waynedeng/Desktop/AI/dawg_2.0.py", line 20, in read_input
        audio = speech.listen(source=source,
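
A "missing 1 required positional argument: 'self'" error usually appears when a method is called on the class itself rather than on an instance, for example after writing `speech = sr.Recognizer` without the parentheses. The truncated excerpt does not show how `speech` was created, so that cause is an assumption. The sketch below reproduces the error with a stand-in class, not the real speech_recognition.Recognizer, and then shows the corrected call.

```python
class Recognizer:
    """Stand-in class for illustration; NOT the real speech_recognition API."""

    def listen(self, source):
        return f"audio from {source}"

# Bug: the class object is assigned, no instance is created.
speech = Recognizer
try:
    speech.listen(source="mic")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Fix: call the class to create an instance, then call the method on it.
speech = Recognizer()
audio = speech.listen(source="mic")
print(audio)
```

If the real code assigns the class without calling it, adding the parentheses where the recognizer is created should be enough.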