google-speech-api

Splitting an Ogg Opus File stream

最后都变了- submitted on 2021-02-10 05:38:30
Question: I am trying to send an OGG_OPUS-encoded stream to Google's Speech-to-Text streaming service. Because Google imposes a time limit on streaming requests, I have to route the audio stream to a new Speech-to-Text streaming session at a fixed interval. From what I've read, the pages in an Ogg stream cannot be read independently, since the data in each page is computed from the data of the previous and next pages. If that is the case, can we cut off the stream at
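Whatever the answer about decodability, the Ogg framing itself (RFC 3533) is page-based: every page starts with the capture pattern `OggS`, a fixed 27-byte header and a segment table whose entries sum to the body length. Page boundaries can therefore be found without decoding any audio. The sketch below walks those boundaries and groups whole pages into chunks, cutting only before a page whose "continued packet" flag (0x01) is clear. It is an illustration under assumptions: the helper names are hypothetical, CRCs are not verified, and whether a new Speech-to-Text session accepts a chunk that starts mid-stream (for example after re-sending the `OpusHead` and `OpusTags` header pages that RFC 7845 places at the start of an Opus stream) is not established here.

```python
import struct
from typing import Iterator, List, NamedTuple


class OggPage(NamedTuple):
    offset: int            # byte offset of the page within the buffer
    length: int            # header + segment table + body, in bytes
    header_type: int       # flags: 0x01 continued packet, 0x02 BOS, 0x04 EOS
    granule_position: int
    serial: int
    sequence: int


# Fixed 27-byte Ogg page header (RFC 3533): capture pattern, version,
# header type, granule position, serial number, page sequence, CRC, segment count.
_HEADER = struct.Struct("<4sBBqIIIB")


def iter_ogg_pages(data: bytes) -> Iterator[OggPage]:
    """Yield complete pages from an Ogg byte buffer. CRCs are NOT verified."""
    pos = 0
    while pos + _HEADER.size <= len(data):
        capture, version, htype, granule, serial, seq, _crc, nsegs = _HEADER.unpack_from(data, pos)
        if capture != b"OggS" or version != 0:
            raise ValueError(f"no Ogg page at offset {pos}")
        table_start = pos + _HEADER.size
        if table_start + nsegs > len(data):
            break                      # segment table incomplete: need more data
        body_len = sum(data[table_start:table_start + nsegs])
        total = _HEADER.size + nsegs + body_len
        if pos + total > len(data):
            break                      # body incomplete: need more data
        yield OggPage(pos, total, htype, granule, serial, seq)
        pos += total


def split_on_page_boundaries(data: bytes, max_chunk: int) -> List[bytes]:
    """Group whole pages into chunks of roughly max_chunk bytes.

    A cut is only made before a page that does not continue a packet from
    the previous page, so a chunk can exceed max_chunk. Trailing bytes that
    do not form a complete page are dropped.
    """
    chunks: List[bytes] = []
    start = end = 0
    for page in iter_ogg_pages(data):
        page_end = page.offset + page.length
        continued = page.header_type & 0x01
        if page_end - start > max_chunk and end > start and not continued:
            chunks.append(data[start:end])
            start = end
        end = page_end
    if end > start:
        chunks.append(data[start:end])
    return chunks
```

Because `iter_ogg_pages` stops quietly at an incomplete page, the same helper can be run repeatedly over a growing receive buffer in a streaming setting.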

How to use Google Speech-to-Text with a Blob sent from the browser to a Node.js server

我只是一个虾纸丫 submitted on 2021-02-07 06:24:27
Question: I am trying to set up a server that receives audio from a client browser using SocketIO, processes it through Google Speech-to-Text, and finally replies to the client with the text. Originally, and ideally, I wanted it to work somewhat like the tool on this page: https://cloud.google.com/speech-to-text/ I tried using getUserMedia and streaming it through SocketIO-Stream, but I couldn't figure out how to 'pipe' a MediaStream. Instead, I've now decided to use MediaRecorder on the
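When MediaRecorder is started with a timeslice, the Blobs it emits are consecutive pieces of one container file, and only the first carries the container header. On the server they therefore generally need to be forwarded in order into a single streaming session rather than recognized one by one. A minimal sketch of that server-side step, written in Python rather than Node.js, assumes the socket handler puts each received chunk (as `bytes`) on a `queue.Queue` and puts `None` when the client stops. The generator name `audio_chunks` is hypothetical, and the Speech-to-Text wiring is shown only as a comment because it depends on the client library version.

```python
import queue
from typing import Iterator, Optional


def audio_chunks(q: "queue.Queue[Optional[bytes]]") -> Iterator[bytes]:
    """Yield audio chunks in arrival order until a None sentinel is received.

    Chunks already waiting in the queue are merged so that fewer, larger
    requests are sent.
    """
    while True:
        chunk = q.get()              # block until the next chunk arrives
        if chunk is None:
            return
        data = [chunk]
        # Drain whatever else is already buffered, without blocking.
        while True:
            try:
                nxt = q.get_nowait()
            except queue.Empty:
                break
            if nxt is None:
                yield b"".join(data)
                return
            data.append(nxt)
        yield b"".join(data)


# Hypothetical wiring (google-cloud-speech 2.x style, not verified here):
#   requests = (speech.StreamingRecognizeRequest(audio_content=c)
#               for c in audio_chunks(q))
#   responses = client.streaming_recognize(config=streaming_config, requests=requests)
```

Merging buffered chunks keeps ordering intact while reducing request overhead. If the client sends WebM/Opus from MediaRecorder, the recognition config would also need an encoding that matches that container.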

What's the best way to use Google credentials for a production app?

依然范特西╮ submitted on 2021-01-27 17:36:56
Question: I'm building a C# .NET application for STT and I'm creating credentials manually. I find the documentation very confusing and I don't know how to add the credentials properly. I added a project, created a JSON credential, downloaded it, kept it in a folder and pointed to it manually with GoogleCredential for authorization, and everything works. But this can't be a solution for a shipped app. Current approach: GoogleCredential credentials = GoogleCredential.FromFile(Path
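One common alternative to a hard-coded key path is to read the location from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, the variable that Google's Application Default Credentials lookup honors. In C#, `GoogleCredential.GetApplicationDefault()` performs that lookup itself. The sketch below, in Python and with a hypothetical helper name, only illustrates the resolve-and-validate step so a misconfiguration fails with a clear message. It does not settle whether shipping a service-account key inside a distributed app is appropriate.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_credentials_file(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the key-file path named by GOOGLE_APPLICATION_CREDENTIALS.

    Raises a descriptive error instead of failing later inside the client library.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"credentials file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        info = json.load(fh)
    # Key files downloaded from the Cloud console carry a "type" field,
    # e.g. "service_account".
    if "type" not in info:
        raise ValueError(f"{path} does not look like a Google credentials file")
    return path
```

In Python the client libraries can do the lookup themselves via `google.auth.default()`. A helper like this mainly adds an earlier, clearer error.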

google cloud speech ImportError: cannot import name 'enums'

房东的猫 submitted on 2020-06-01 06:57:08
Question: I'm using the google-cloud-speech API for my project. I'm using pipenv for the virtual environment; I installed the google-cloud-speech API with pipenv install google-cloud-speech and pipenv update google-cloud-speech, following these docs: https://cloud.google.com/speech-to-text/docs/reference/libraries This is my code, google.py:
#!/usr/bin/env python
# coding: utf-8
import argparse
import io
import sys
import codecs
import datetime
import locale
import os
from google.cloud import speech_v1 as speech
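Suppose the installed google-cloud-speech is 2.0.0 or later. In that release the `enums` and `types` submodules were removed, and enum values moved onto the message classes (for example `speech.RecognitionConfig.AudioEncoding.LINEAR16`). A quick look at the installed version then shows which import style applies. The helper names below are hypothetical; `importlib.metadata` is in the standard library from Python 3.8.

```python
from importlib import metadata
from typing import Optional


def speech_major_version(dist: str = "google-cloud-speech") -> Optional[int]:
    """Return the installed major version of the package, or None if not installed."""
    try:
        return int(metadata.version(dist).split(".")[0])
    except metadata.PackageNotFoundError:
        return None


def has_enums_module(major: int) -> bool:
    """1.x exposes google.cloud.speech_v1.enums; 2.x and later do not."""
    return major < 2


# 1.x style (fails on 2.x with "cannot import name 'enums'"):
#   from google.cloud.speech_v1 import enums
#   encoding = enums.RecognitionConfig.AudioEncoding.LINEAR16
# 2.x style:
#   from google.cloud import speech
#   encoding = speech.RecognitionConfig.AudioEncoding.LINEAR16
```

The 2.x samples also pass `config=` and `audio=` to `recognize` as keyword arguments. Pinning `google-cloud-speech<2` in the Pipfile is the other way to keep the older sample code working.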

'Audio data must be audio data' error with google speech recognition in python

大城市里の小女人 submitted on 2020-05-29 10:14:02
Question: I am trying to load an audio file in Python and process it with Google speech recognition. The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so you can convert between data types by creating a new object and repacking the data. I don't understand how it's possible to convert from one data type to another in Python. The code in question is below:
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3
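The message appears to come from a type check in `speech_recognition`. `Recognizer.recognize_google` expects an `AudioData` object, while `librosa.load` returns a floating-point NumPy array plus a sample rate. One way to bridge them is to convert the float samples in [-1, 1] to signed 16-bit little-endian PCM and wrap the bytes as `spr.AudioData(pcm_bytes, sample_rate, 2)`. The conversion helper is a hypothetical name. It uses only the standard library so it runs without NumPy, and it assumes mono audio (librosa's default).

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to signed 16-bit little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))   # clip out-of-range values
        ints.append(int(round(s * 32767)))
    return struct.pack(f"<{len(ints)}h", *ints)


# Hypothetical wiring with the libraries from the question:
#   audio, rate = librosa.load("your_file.mp3")          # float array, mono by default
#   data = spr.AudioData(float_to_pcm16(audio), rate, 2)  # 2 bytes per sample
#   text = spr.Recognizer().recognize_google(data)
```

For long files the loop is slow; with NumPy the same conversion can be written as `(np.clip(audio, -1, 1) * 32767).astype("<i2").tobytes()`.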

Can the Google Speech API be configured to return only numbers / letters?

荒凉一梦 submitted on 2020-05-13 07:14:47
Question: Can the Google Speech API be configured to return only numbers and letters, as opposed to full words? The use case is transcribing Canadian postal codes, e.g. M 1 B 0 R 3, for which Google may return "Em 1 Be 0 Are 3". We have tried: Using speechContexts and feeding in the letters A - Z as individual phrases. This improved accuracy for us. We did not have much success passing in individual numbers (e.g. 1, 2, 3). Specifying the codec and sample rate of our WAV file using the encoding and sampleRateHertz
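Alongside `speechContexts`, one option is to post-process the transcript. Spoken-letter spellings can be mapped back to letters, and the result checked against the Canadian postal code shape (letter, digit, letter, digit, letter, digit). This is a client-side workaround, not a Speech API setting. The sketch assumes a hypothetical and deliberately incomplete `SPOKEN` table, which in practice would be built from transcripts actually observed.

```python
import re
from typing import Optional

# Hypothetical, incomplete map from transcribed words to the letter or digit meant.
SPOKEN = {
    "ay": "A", "be": "B", "bee": "B", "see": "C", "sea": "C",
    "em": "M", "en": "N", "are": "R", "ar": "R", "es": "S",
    "tee": "T", "tea": "T", "vee": "V", "ex": "X", "why": "Y", "zed": "Z",
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

_POSTAL = re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d")


def normalize_postal_code(transcript: str) -> Optional[str]:
    """Turn e.g. 'Em 1 Be 0 Are 3' into 'M1B 0R3', or return None if it doesn't fit."""
    chars = []
    for token in transcript.split():
        word = token.lower().strip(".,")
        if word in SPOKEN:
            chars.append(SPOKEN[word])
        elif re.fullmatch(r"[a-z0-9]+", word):
            chars.append(word.upper())      # already letters/digits, e.g. 'm1b'
        else:
            return None
    code = "".join(chars)
    if not _POSTAL.fullmatch(code):
        return None
    return f"{code[:3]} {code[3:]}"
```

The final shape check means an unmapped word makes the result fail with `None` rather than yield a wrong code, which the caller can treat as "ask again".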