speech | 易学教程

Can C# SAPI speak SSML string?

Question: I implemented TTS in my C# WPF project. Previously, I used the TTS in the System.Speech.Synthesis namespace to speak. The spoken content is in SSML format (Speech Synthesis Markup Language, which supports customizing the speaking rate, voice, and emphasis), like the following: <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><prosody rate="x-fast">hello world. This is a long sentence speaking very fast!</prosody></speak> But unfortunately the System.Speech.Synthesis TTS has a
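As a rough illustration of the SSML payload quoted in the question, the following Python sketch builds an equivalent `<speak>`/`<prosody>` document with the standard library. It only constructs the markup and does not call any speech engine; the helper name `build_ssml` and its parameters are hypothetical and not part of the original post.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: the SSML one is taken from the question; the second is the
# standard namespace behind the reserved "xml:" prefix.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, rate="x-fast", lang="en-US"):
    """Hypothetical helper: wrap `text` in <speak><prosody rate=...>."""
    # Serialize the SSML namespace as the default namespace (no ns0: prefix).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    prosody = ET.SubElement(speak, f"{{{SSML_NS}}}prosody", {"rate": rate})
    prosody.text = text  # special characters are escaped on serialization
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("hello world. This is a long sentence speaking very fast!"))
```

Building the markup through an XML library rather than string concatenation keeps the document well-formed even when the spoken text contains characters such as `<` or `&`.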

Microsoft Speech Recognition: Alternate results with confidence score?

Question: I'm new to working with the Microsoft.Speech recognizer (using Microsoft Speech Platform SDK Version 11), and I'm trying to have it output the n-best recognition matches from a simple grammar, along with the confidence score for each. According to the documentation (and as mentioned in the answer to this question), one should be able to use e.Result.Alternates to access the recognized words other than the top-scoring one. However, even after resetting the confidence rejection threshold to 0

Continuous Speech Recognition Android - Without Gaps

Question: I have an activity that implements RecognitionListener. To make recognition continuous, every time onEndOfSpeech() is called I start the listener again: speech.startListening(recognizerIntent); But it takes some time (around half a second) to start, so there is a half-second gap during which nothing is listening. As a result, I miss words that were spoken during that gap. On the other hand, when I use Google's Voice input to dictate messages instead of typing on the keyboard, this gap does not exist.

Google Speech Recognition API

Question: I'm trying to use the Google Speech API v2 (at https://www.google.com/speech-api/v2/recognize?... ). I need to use my own API key, but when I use it I get error 403 Forbidden. When I use the API key from the example project I downloaded, it works fine. I saw that in the Google Developers Console I can enable many API options, but I didn't find any Speech API option. Is there anything else I need to enable to get access to this API using my key? Thank you! Answer 1: Instructions are

pyspeech (python) - Transcribe mp3 files?

Question: I'd like to transcribe mp3 files (speech-to-text) using the pyspeech API. I don't know if this is possible, though. Is it? How? Answer 1: pyspeech seems to be merely a Python interface to the regular Windows speech APIs. Most likely you'd create some method of treating mp3 playback as an audio source for that speech API to listen to. Answer 2: I don't know about pyspeech, but if it is a Python wrapper around the Microsoft speech APIs, then some other posts may be helpful. Microsoft Speech engines do not

Control other applications using Java?

Question: How can I control other applications using Java? I'm using the Mary speech synthesizer (open source, Java). It synthesizes speech well, but it requires the text to be in a textbox in the application window itself and then a button to be clicked. For this project of mine, the text that needs to be realized will come from another Java application. I need to know how I can place the text in the textbox and send a click to one of the buttons in the application. I'm hoping to

How does a shell script read the data in a batch test folder

Question: I recently replicated a SEGAN experiment based on TensorFlow 0.12.1. The author provides a shell script for testing (clean_wav.sh), as shown in the figure below: This is the original version provided by the author. According to the path of my test data, the modified version is as follows: Noisy_testset_wav_16k is my test data folder, but when I run the script, the system reports an error: This folder is a directory, but when I change the path to: NOISY_WAVNAME='/home/zyf/SEGAN/ SEGAN/segan-master1
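The error described above typically appears when a directory is passed where a single file is expected. A minimal Python sketch of the general idea is to enumerate the `.wav` files in the folder and hand them to the processing step one at a time. The function name `list_wav_files` is hypothetical, and this is not the author's clean_wav.sh, whose contents are not shown in the excerpt.

```python
from pathlib import Path


def list_wav_files(folder):
    """Hypothetical helper: return the .wav files directly inside `folder`, sorted."""
    folder = Path(folder)
    if not folder.is_dir():
        # Fail early with a clear message instead of a confusing downstream error.
        raise NotADirectoryError(f"{folder} is not a directory")
    return sorted(
        p for p in folder.iterdir()
        # Skip subdirectories and match the extension case-insensitively.
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

Each returned path could then be passed individually to whatever command or function expects a single noisy WAV file.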

Speech recognition using Openears framework?

Question: OpenEars is a speech recognition (speech-to-text) framework for the iPhone (iOS devices). I have installed the OpenEars demo app on my iPhone, and it works well, but only for a list of words like GO, CHANGE, MODEL. Can we make the speech recognition more generic for real-time speech recognition, that is, not limited to a few words? It should be generic. OpenEars: http://www.politepix.com/openears/ Answer 1: You have to use a new language model instead of their default one. The language model is the vocabulary