phoneme

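Playing with AI from Scratch: Speech API, Part 03

核能气质少年 submitted on 2020-10-08 02:25:43
Still worried that your English pronunciation is not standard? Is hiring a foreign tutor to teach pronunciation too expensive? With the Speech cognitive service, what more could you ask for? Now that we have tried both playback and recording, let's attempt a harder experiment.

Pronunciation assessment

The speech-to-text service actually provides a pronunciation assessment parameter. With this parameter, the submitted speech can be scored for pronunciation. Interesting, right? Let's see how the Speech-to-Text REST API describes it. To enable pronunciation assessment, simply add the 'Pronunciation-Assessment' field to the request header when submitting the speech-to-text request. This field specifies the parameters used to show pronunciation scores in the recognition result; these parameters assess the pronunciation quality of the speech input and report accuracy, fluency, completeness, and so on. The parameter is base64-encoded JSON that contains several detailed parameters.

As in the previous parts, we start with some preparation and set up the code environment.

import requests
import pyaudio, wave
import os, json, base64
from xml.etree import ElementTree

# constants for WAV file
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 5

# speech service information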
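The excerpt says the 'Pronunciation-Assessment' header carries base64-encoded JSON but stops before showing it. The following is a minimal sketch of how such a header value could be built. The parameter names (`ReferenceText`, `GradingSystem`, `Granularity`, `Dimension`) and their values are recalled from the service documentation rather than taken from the article, so treat them as assumptions and check them against the current REST API reference. The helper name and the reference sentence are hypothetical.

```python
import base64
import json


def build_pronunciation_assessment_header(reference_text):
    """Return a base64-encoded JSON string for the 'Pronunciation-Assessment' header.

    The keys below are assumptions recalled from the service documentation;
    verify them before relying on this sketch.
    """
    params = {
        "ReferenceText": reference_text,   # the text the speaker is expected to read
        "GradingSystem": "HundredMark",    # assumed: score on a 0-100 scale
        "Granularity": "Phoneme",          # assumed: score down to the phoneme level
        "Dimension": "Comprehensive",      # assumed: accuracy, fluency, completeness
    }
    params_json = json.dumps(params)
    # HTTP header values must be text, so decode the base64 bytes back to str.
    return base64.b64encode(params_json.encode("utf-8")).decode("ascii")


header_value = build_pronunciation_assessment_header("Good morning.")

# The header dict that would accompany the speech-to-text request.
headers = {
    "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    "Pronunciation-Assessment": header_value,
}
```

Only the header construction is shown; the endpoint URL, subscription key, and the actual `requests.post` call are left out because the excerpt ends before giving them.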

Transfer Learning

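心已入冬 submitted on 2020-05-05 21:52:43
In this article, source data and target data refer, respectively, to a large dataset that does not belong to the current task and a small dataset that does.

2 examples: Both Labeled

1. Model Fine Tuning
When the amount of target data is very small, this is called one-shot learning.
Fine Tuning: take the model trained on the source data as the initial model for the target data and continue training it.
Overtraining: when the target data is really too scarce, continuing to train the source model leads to overfitting. There are two ways to address this:
Conservative Training: add constraints so that the two networks produce outputs that are as close as possible for the same input, or so that the two networks' parameters stay as close as possible.
Layer Transfer: train only a few layers of the network trained on the source data and keep the other layers fixed (copied directly). Which layers need to be trained, and which can be fixed? Different tasks call for different layer choices:
For speech, speakers differ in the physiology of their vocal apparatus but the words are the same, so the first few layers are trained and the last few layers are copied directly.
For images, basic contours can be shared but the final classification differs, so the last few layers are trained and the first few layers are copied directly.

2. Multitask Learning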
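Layer transfer and the parameter-closeness form of conservative training, as described under Model Fine Tuning, can be sketched with a single update rule. This is a toy sketch in plain Python, not the article's code: the layer names, gradients, and the L2 penalty weight `lam` are hypothetical, and a real network would use a deep learning framework.

```python
def sgd_step(params, grads, source_params, trainable, lr=0.1, lam=0.0):
    """One gradient step combining layer transfer and conservative training.

    params, grads, source_params: dict mapping layer name -> list of floats.
    trainable: set of layer names to update; all other layers stay frozen.
    lam: weight of the penalty lam * ||w - w_source||^2 that keeps the
         fine-tuned weights close to the source model's weights.
    """
    new_params = {}
    for name, weights in params.items():
        if name not in trainable:
            # Layer transfer: this layer is copied as-is and never updated.
            new_params[name] = list(weights)
            continue
        new_params[name] = [
            # Gradient of the penalty term is 2 * lam * (w - w_source).
            w - lr * (g + 2.0 * lam * (w - ws))
            for w, g, ws in zip(weights, grads[name], source_params[name])
        ]
    return new_params


# Hypothetical two-layer model trained on the source data.
source = {"layer1": [0.5, -0.2], "layer2": [1.0, 0.3]}
# Fine-tuning starts from a copy of the source model.
target = {name: list(w) for name, w in source.items()}
# Hypothetical gradients computed on the small target dataset.
grads = {"layer1": [0.1, 0.1], "layer2": [0.2, -0.4]}

# Image-style choice: keep the early layer fixed, train only the last layer.
target = sgd_step(target, grads, source, trainable={"layer2"}, lr=0.1, lam=0.5)
```

For the speech case described above, the same call would pass `trainable={"layer1"}` instead. On the first step the penalty contributes nothing because the target weights still equal the source weights; it only starts pulling once the weights drift.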

english-phoneme

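天大地大妈咪最大 submitted on 2020-05-05 01:12:19
1. Overview of sound
2. Phonemes and phonetic symbols
2.1 Phonemes and phonetic symbols
2.2 Phonemes and letters
2.3 Letter pronunciation: letter-to-phonics reference table
2.4 Table of phonetic symbols
2.5 Table of vowel letters and consonant letters
2.6 Mouth-shape trends for monophthongs
3. The concept of the syllable
3.1 Classification of syllables
3.2 Dividing syllables
3.3 How to sound out syllables
3.4 Stressed syllables
4. Stress in English
5. Summary: pronunciation and spelling have rules, and exceptions too
6. Learning resources for English word-stress techniques
7. More recommended reading

1. Overview of sound
In physics, sound is described from four angles:
Timbre (sound quality): the basic characteristic that distinguishes one sound from another, for example a human voice from a bird's call. For research on human voice recognition, timbre is therefore the main object of study.
Pitch (frequency): how high or low a sound is. It depends on the frequency of the sound wave and can loosely be taken to be the fundamental frequency. Compare male and female voices: male voices are generally deeper and female voices higher.
Intensity (amplitude): how strong or weak a sound is, determined by the vibration amplitude of the sound wave. In speech signal processing it can be understood intuitively as the signal amplitude (although the formula for intensity is not simply the amplitude).
Duration (length): how long the sound lasts, which is easy to understand.
For beginners, these properties of sound map fairly directly onto features of the speech signal.

2. Phonemes and phonetic symbols
2.1 Phonemes and phonetic symbols
A phoneme is the smallest unit of speech, defined from the standpoint of sound quality; phonetic symbols are the written symbols for phonemes. By articulation, phonemes fall into two classes:
Vowels: the airflow is not obstructed during articulation. The vowel is the core of the syllable.
Consonants: the airflow is obstructed to a greater or lesser degree during articulation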
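Section 1 above notes that intensity relates to signal amplitude but is not simply the amplitude. As a small sketch of that distinction, the code below measures duration, peak amplitude, and RMS level of a synthetic tone. Using RMS in decibels relative to a full scale of 1.0 is one common convention and an assumption here, not the article's formula. The function name and tone parameters are hypothetical.

```python
import math


def describe_signal(samples, sample_rate):
    """Return duration, peak amplitude, RMS, and RMS level in dB for a signal.

    samples: list of floats in the range [-1.0, 1.0].
    """
    duration = len(samples) / sample_rate                  # duration in seconds
    peak = max(abs(s) for s in samples)                    # raw amplitude
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Level relative to a full-scale amplitude of 1.0 (assumed convention).
    level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
    return {"duration_s": duration, "peak": peak, "rms": rms, "level_db": level_db}


RATE = 16000        # samples per second
FREQ = 200.0        # pitch of the test tone in Hz
AMPLITUDE = 0.5     # peak amplitude of the test tone

# One second of a pure sine tone.
tone = [AMPLITUDE * math.sin(2 * math.pi * FREQ * n / RATE) for n in range(RATE)]
info = describe_signal(tone, RATE)
```

For a pure sine the RMS equals the peak amplitude divided by the square root of 2, so the two measures differ even for the simplest signal. Timbre has no comparable single number; it is usually studied through the spectrum.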

How to get speech recognition to detect SAPI emphasis markers?

天大地大妈咪最大 submitted on 2019-12-11 14:59:52
Question: It is possible to extract the default phonemes for a given word via SAPI by:
1. Voicing the word with text-to-speech and storing the output in a .wav
2. Using the .wav as input for speech recognition
3. Upon recognition of the word, extracting the phonemes from the recognized phrase elements
However, I have not been able to capture (if available) emphasis markers ("1" and "2" per the American English Phoneme Table). Is there a way to do this? EDIT: Here is what I've attempted so far (not pretty, but functional). Sadly

CMU Sphinx4 phoneme dictation

无人久伴 submitted on 2019-12-10 20:51:44
Question: How can I configure sphinx4 to detect only phonemes in dictation? I've already read about partial results: "You can control how often the result listener is fired by setting the configuration variable 'featureBlockSize' in the decoder." But my problem is that a grammar is always needed, like hello.gram in the helloworld example. I need to be able to detect and recognize phonemes from continuous speech.
Answer 1: This is what Sphinx has to say about it: Phoneme Recognition

iOS / C: Algorithm to detect phonemes

随声附和 submitted on 2019-12-03 07:56:42
Question: I am searching for an algorithm to determine whether realtime audio input matches one of 144 given (and comfortably distinct) phoneme pairs. Preferably the lowest level that does the job. I'm developing radical / experimental musical training software for iPhone / iPad. My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc. in response to visual

SAPI Symbol Usage for Speech Dictionary Input

风格不统一 submitted on 2019-12-02 18:58:14
Question: I've been doing some work to add words and pronunciations to the Windows speech dictionary via the SpLexicon interface of SAPI 5.4 (which I think is the only way to do it), using the AddPronunciation function, or in my case:

// Initialize SpLexicon instance
SpLexicon lex = new SpLexicon();
// Specify the word to add to the speech dictionary
string myWord = "father";
// Set the language ID (US English)
int langid = new System.Globalization.CultureInfo("en-US").LCID;
// Specify the word's part of

Speech to Phoneme in .Net

吃可爱长大的小学妹 submitted on 2019-12-01 19:09:31
The problem is that I want to get the phonemes of spoken audio in C#. Say you have an audio file like "x.wav" that says "hello dear Shamim". I want to extract all the phonemes of the speech and their relative timings, something like the picture below: I used the System.Speech library (both the recognition and synthesis namespaces) but I didn't find what I wanted. Now don't be mistaken! I don't want the phonemes of the sentence "hello dear Shamim"; I want to extract the phonemes from an unknown audio input that speaks an English sentence. I tried System.Speech.Recognition but it tries to

Speech to Phoneme in .Net

橙三吉。 submitted on 2019-12-01 17:52:59
Question: The problem is that I want to get the phonemes of spoken audio in C#. Say you have an audio file like "x.wav" that says "hello dear Shamim". I want to extract all the phonemes of the speech and their relative timings, something like the picture below: I used the System.Speech library (both the recognition and synthesis namespaces) but I didn't find what I wanted. Now don't be mistaken! I don't want the phonemes of the sentence "hello dear Shamim"; I want to extract the phonemes from an unknown

Detect similar sounding words in Ruby

耗尽温柔 submitted on 2019-11-30 17:59:39
Question: I'm aware of SOUNDEX and (double) Metaphone, but these don't let me test for the similarity of words as a whole. For example, "Hi" sounds very similar to "Bye", but both of these methods will mark them as completely different. Are there any libraries in Ruby, or any methods you know of, that are capable of determining the similarity between two words? (Either a boolean is/isn't similar, or a numerical value such as 40% similar.) Edit: Extra bonus points if there is an easy method to 'drop in' a different
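One way to get the numerical whole-word similarity the question asks for is to compare phoneme sequences instead of phonetic hash codes. The sketch below is in Python rather than Ruby and is not taken from the original thread. The tiny `PHONEMES` table is hand-written in ARPAbet-style symbols purely for illustration; a real system would look words up in a pronunciation dictionary. `difflib.SequenceMatcher` is a standard-library class, and the 0.5 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hand-written ARPAbet-style transcriptions, assumed for illustration only.
PHONEMES = {
    "hi": ["HH", "AY"],
    "bye": ["B", "AY"],
    "cat": ["K", "AE", "T"],
}


def phoneme_similarity(word_a, word_b):
    """Return a similarity ratio in [0.0, 1.0] between two words' phoneme sequences."""
    a = PHONEMES[word_a.lower()]
    b = PHONEMES[word_b.lower()]
    # ratio() is 2 * matches / (len(a) + len(b)), computed over whole phonemes.
    return SequenceMatcher(None, a, b).ratio()


def sounds_similar(word_a, word_b, threshold=0.5):
    """Boolean form of the comparison; the threshold is an arbitrary choice."""
    return phoneme_similarity(word_a, word_b) >= threshold
```

With this table, "Hi" and "Bye" share their vowel and score 0.5, while "Hi" and "cat" share nothing and score 0.0. Swapping in a different phoneme table would let the same comparison run on another language.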