Microsoft SpeechSynthesizer crackles when outputting to files and streams

Posted by 末鹿安然 on 2019-12-07 15:46:27

I find it hard to believe this is a PowerShell issue. PowerShell is not doing the encoding when the audio is serialized to disk; it's the API/class being used.

'msdn.microsoft.com/en-us/library/system.speech.synthesis.speechsynthesizer(v=vs.110).aspx'

As per the MSDN, there is no option to control the encoding, bit rate, etc.
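
One caveat worth noting: SetOutputToWaveFile and SetOutputToAudioStream have overloads that take a SpeechAudioFormatInfo, so the PCM sample rate, bit depth and channel count can be chosen, even though there is still no compressed codec or bit-rate setting. A minimal sketch, assuming a Windows machine with a reference to System.Speech and at least one installed voice; SpeakToPcm and SynthFormatSketch are made-up names, not part of any library:

```csharp
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;

static class SynthFormatSketch
{
    // Hypothetical helper: returns raw PCM samples (no RIFF/WAV header).
    public static byte[] SpeakToPcm(string text)
    {
        // 22.05 kHz, 16-bit, mono PCM.
        var format = new SpeechAudioFormatInfo(
            22050, AudioBitsPerSample.Sixteen, AudioChannel.Mono);

        using (var synth = new SpeechSynthesizer())
        using (var pcm = new MemoryStream())
        {
            synth.SetOutputToAudioStream(pcm, format);
            synth.Speak(text);        // synchronous; returns when synthesis is done
            synth.SetOutputToNull();  // detach the stream before reading it
            return pcm.ToArray();
        }
    }
}
```

Whether choosing the format this way changes the crackling is something you would have to test on the affected machine.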

.wav has never been HQ stuff. So I wonder whether running that .wav through a converter to produce an .mp3 or .mp4 would address your quality concerns. But that also means getting the converter onto users' systems.

Secondly, since Win8, the default player does not play .wav correctly, or at all. Sure, you can still set the default player for .wav to Windows Media Player or open the file in VLC, but it's still a .wav file. That also means you have to set the Media Player association on every target system.

This is an issue with the SpeechSynthesizer API, which simply provides bad quality, crackling audio as seen in the samples above. The solution is to do what TextAloud does, which is to use the SpeechLib COM objects directly.

This is done by adding a COM reference to "Microsoft Speech Object Library (5.4)". Here is a snippet of the code I ended up with, which produces audio clips of the same quality as TextAloud:

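using System;
using System.IO;
using System.Runtime.InteropServices; // Marshal
using System.Threading;               // Timeout
using NAudio.Wave;                    // WaveFormat, RawSourceWaveStream, WaveFileWriter
using SpeechLib;                      // SpVoice, SpMemoryStream (COM interop)

public new static byte[] GetSound(Order o)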
{
    const SpeechVoiceSpeakFlags speechFlags = SpeechVoiceSpeakFlags.SVSFlagsAsync;
    var synth = new SpVoice();
    var wave = new SpMemoryStream();
    var voices = synth.GetVoices();
    try
    {
        // synth setup
        synth.Volume = Math.Max(1, Math.Min(100, o.Volume ?? 100));
        synth.Rate = Math.Max(-10, Math.Min(10, o.Rate ?? 0));
        foreach (SpObjectToken voice in voices)
        {
            if (voice.GetAttribute("Name") == o.Voice.Name)
            {
                synth.Voice = voice;
            }
        }
        wave.Format.Type = SpeechAudioFormatType.SAFT22kHz16BitMono;
        synth.AudioOutputStream = wave;
        synth.Speak(o.Text, speechFlags);
        synth.WaitUntilDone(Timeout.Infinite);

        var waveFormat = new WaveFormat(22050, 16, 1);
        using (var ms = new MemoryStream((byte[])wave.GetData()))
        using (var reader = new RawSourceWaveStream(ms, waveFormat))
        using (var outStream = new MemoryStream())
        using (var writer = new WaveFileWriter(outStream, waveFormat))
        {
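            reader.CopyTo(writer);
            // Flush so buffered data (and, in current NAudio versions, the header
            // lengths) reach outStream; ToArray() returns only the bytes written,
            // whereas GetBuffer() also returns unused buffer capacity.
            writer.Flush();
            return o.Mp3 ? ConvertToMp3(outStream) : outStream.ToArray();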
        }
    }
    finally
    {
        Marshal.ReleaseComObject(voices);
        Marshal.ReleaseComObject(wave);
        Marshal.ReleaseComObject(synth);
    }
}

This is the code to convert a wave file to MP3. It uses the NAudio.Lame package from NuGet (namespaces NAudio.Lame and NAudio.Wave).

internal static byte[] ConvertToMp3(Stream wave)
{
    wave.Position = 0;
    using (var mp3 = new MemoryStream())
    using (var reader = new WaveFileReader(wave))
    using (var writer = new LameMP3FileWriter(mp3, reader.WaveFormat, 128))
    {
        reader.CopyTo(writer);
        // Flush the encoder so the final MP3 frames reach the stream before copying it out.
        writer.Flush();
        return mp3.ToArray();
    }
}
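
A side note on returning bytes from a MemoryStream, as both methods here do: ToArray() copies only the bytes actually written, while GetBuffer() hands back the stream's internal buffer, which normally has spare capacity at the end. For audio, that spare capacity ends up as trailing junk in the clip. A small stand-alone sketch (BufferSketch is just an illustrative name):

```csharp
using System;
using System.IO;

static class BufferSketch
{
    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[] { 1, 2, 3 }, 0, 3);

            byte[] exact = ms.ToArray();   // copy of the 3 bytes written
            byte[] raw = ms.GetBuffer();   // internal buffer, usually larger

            Console.WriteLine(exact.Length);               // 3
            Console.WriteLine(raw.Length >= exact.Length); // True
        }
    }
}
```

If you do need GetBuffer() to avoid the copy, pair it with ms.Length and only read that many bytes.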