SAPI v5.1 SpeechRecognitionEngine always gives the same wrong result in C#

误落风尘 2020-12-02 00:49

I was playing around with this SAPI v5.1 library. So I was testing a sample WAV file I have. (Download it from here). Anyway, the sound in that file is clear and easy. It co

1 Answer
  • 2020-12-02 01:37

    How did you create your WAV file? It looks like it has a high bitrate, and the recognizer supports only certain audio formats. Try:

    • 8 bits per sample
    • single channel mono
    • 22,050 samples per second
    • PCM encoding

    You have about 3 seconds of audio and the file size is 520 KB. That is too big for the supported formats: 3 seconds of 8-bit mono PCM at 22,050 samples per second is only about 66,000 bytes.
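    The size check above is simple arithmetic, and a rough sketch of it follows. The class and method names (WavSize, DataBytes) are made up for illustration, and the result ignores the WAV header, which is typically a few dozen bytes.

```csharp
using System;

static class WavSize
{
    // Size in bytes of the raw PCM data for a clip, excluding the WAV header.
    // DataBytes is a hypothetical helper, not part of any library.
    public static long DataBytes(int samplesPerSecond, int channels, int bitsPerSample, double seconds)
    {
        int bytesPerFrame = channels * (bitsPerSample / 8);
        return (long)(samplesPerSecond * seconds) * bytesPerFrame;
    }

    static void Main()
    {
        // Smallest suggested format: 8-bit mono at 22,050 samples per second.
        Console.WriteLine(DataBytes(22050, 1, 8, 3.0));
        // 16-bit mono at 22,050 samples per second.
        Console.WriteLine(DataBytes(22050, 1, 16, 3.0));
    }
}
```

    For a 3-second clip these come to 66,150 and 132,300 bytes, so a 520 KB file of that length cannot be 8-bit or 16-bit mono at 22,050 samples per second.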

    You can use the RecognizerInfo class to find the audio formats your recognizer supports; see the RecognizerInfo.SupportedAudioFormats property.
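    Something along these lines should list the formats for every installed recognizer. It is a minimal sketch, assuming a Windows machine with a reference to the System.Speech assembly; the class name ListFormats is arbitrary.

```csharp
using System;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

static class ListFormats
{
    static void Main()
    {
        // Walk every recognizer installed on this machine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0} ({1})", info.Name, info.Culture);

            int index = 0;
            foreach (SpeechAudioFormatInfo format in info.SupportedAudioFormats)
            {
                // Print the fields that decide whether a WAV file is accepted.
                Console.WriteLine("  {0}:", index++);
                Console.WriteLine("  EncodingFormat = {0}", format.EncodingFormat);
                Console.WriteLine("  BitsPerSample = {0}", format.BitsPerSample);
                Console.WriteLine("  BlockAlign = {0}", format.BlockAlign);
                Console.WriteLine("  ChannelCount = {0}", format.ChannelCount);
                Console.WriteLine("  SamplesPerSecond = {0}", format.SamplesPerSecond);
                Console.WriteLine();
            }
        }
    }
}
```

    The list differs from machine to machine, so it is worth running this on the system where recognition is failing.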

    Update:

    Your audio file is kind of a mess. It is very noisy, and it is in an unsupported format: Audacity reports it as stereo, 44.1 kHz, 32-bit float. I silenced the noise at the beginning and end, resampled to 22,050 Hz, removed the stereo track, and then exported as uncompressed 8-bit unsigned PCM WAV. It then works fine.
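    If you would rather check the format in code than in Audacity, the fields live in the "fmt " chunk of the WAV file. The following is a rough sketch of reading them, assuming a plain RIFF/WAVE file; WavHeader and Read are hypothetical names, and the sketch throws EndOfStreamException if no "fmt " chunk is found.

```csharp
using System;
using System.IO;
using System.Text;

static class WavHeader
{
    // Reads the format fields from the "fmt " chunk of a RIFF/WAVE stream.
    // Format tags: 1 = PCM, 3 = IEEE float, 6 = A-law, 7 = mu-law.
    public static (int FormatTag, int Channels, int SamplesPerSecond, int BitsPerSample) Read(Stream stream)
    {
        var reader = new BinaryReader(stream);

        string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
        reader.ReadInt32(); // overall RIFF size, not needed here
        string wave = Encoding.ASCII.GetString(reader.ReadBytes(4));
        if (riff != "RIFF" || wave != "WAVE")
            throw new InvalidDataException("Not a RIFF/WAVE file.");

        while (true)
        {
            string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
            int chunkSize = reader.ReadInt32();
            if (chunkId == "fmt ")
            {
                int formatTag = reader.ReadUInt16();
                int channels = reader.ReadUInt16();
                int samplesPerSecond = reader.ReadInt32();
                reader.ReadInt32();  // average bytes per second
                reader.ReadUInt16(); // block align
                int bitsPerSample = reader.ReadUInt16();
                return (formatTag, channels, samplesPerSecond, bitsPerSample);
            }
            // Skip any other chunk; chunks are padded to an even length.
            stream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
        }
    }

    static void Main(string[] args)
    {
        // args[0] is the path to the WAV file to inspect.
        using (FileStream file = File.OpenRead(args[0]))
        {
            var format = Read(file);
            Console.WriteLine("tag={0} channels={1} rate={2} bits={3}",
                format.FormatTag, format.Channels, format.SamplesPerSecond, format.BitsPerSample);
        }
    }
}
```

    A file the recognizer can take should show one channel and a tag, rate, and bit depth that match one of its supported formats.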

    On my Windows 7 machine, my default recognizer supports only the following audio formats:

      0:
      Encodingformat = Pcm
      BitsPerSample = 8
      BlockAlign = 1
      ChannelCount = 1
      SamplesPerSecond  = 16000
    
      1:
      Encodingformat = Pcm
      BitsPerSample = 16
      BlockAlign = 2
      ChannelCount = 1
      SamplesPerSecond  = 16000
    
      2:
      Encodingformat = Pcm
      BitsPerSample = 8
      BlockAlign = 1
      ChannelCount = 1
      SamplesPerSecond  = 22050
    
      3:
      Encodingformat = Pcm
      BitsPerSample = 16
      BlockAlign = 2
      ChannelCount = 1
      SamplesPerSecond  = 22050
    
      4:
      Encodingformat = ALaw
      BitsPerSample = 8
      BlockAlign = 1
      ChannelCount = 1
      SamplesPerSecond  = 22050
    
      5:
      Encodingformat = ULaw
      BitsPerSample = 8
      BlockAlign = 1
      ChannelCount = 1
      SamplesPerSecond  = 22050
    

    You should also remove the numeric choices from the grammar. Right now the recognizer returns two alternates: "three" and "3". This probably isn't what you want. You could use a semantic result value in your grammar to return the number 3 for the word "three".
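    A grammar with semantic values might look roughly like this. It is only a sketch: the question's original grammar is not shown here, so the word list is an assumption, the class name NumberGrammarDemo is arbitrary, and the WAV path is taken from the command line. It needs Windows and a reference to System.Speech.

```csharp
using System;
using System.Speech.Recognition;

static class NumberGrammarDemo
{
    static void Main(string[] args)
    {
        // Spoken words only; each one carries its number as a semantic value.
        // The word list is illustrative, not the original poster's grammar.
        var numbers = new Choices();
        numbers.Add(new GrammarBuilder(new SemanticResultValue("one", 1)));
        numbers.Add(new GrammarBuilder(new SemanticResultValue("two", 2)));
        numbers.Add(new GrammarBuilder(new SemanticResultValue("three", 3)));

        using (var recognizer = new SpeechRecognitionEngine())
        {
            recognizer.LoadGrammar(new Grammar(new GrammarBuilder(numbers)));

            // args[0] is the path to a WAV file in a supported format.
            recognizer.SetInputToWaveFile(args[0]);

            RecognitionResult result = recognizer.Recognize();
            if (result == null)
            {
                Console.WriteLine("No match.");
            }
            else
            {
                // Text is the spoken word; Semantics.Value is the attached number.
                Console.WriteLine("Heard \"{0}\", value {1}", result.Text, result.Semantics.Value);
            }
        }
    }
}
```

    With the digits gone from the grammar there is only one phrase per number, and the numeric value comes from the result's semantics instead of from a second alternate.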
