C# Speech Recognition - Is this what the user said?

感动是毒 2020-11-28 19:49

I need to write an application that uses a speech recognition engine -- either the built-in Vista one or a third-party one -- that can display a word or phrase, and r

11 Answers
  • 2020-11-28 20:09

    Text-to-speech is available through the Speech API. Personally, I'd probably require Vista and use the managed interfaces in System.Speech.Recognition and System.Speech.Synthesis, but calling into the unmanaged APIs via P/Invoke should be possible if you really need XP support.

  • 2020-11-28 20:11

    A similar question was asked on Joel on Software a while back. You can use the System.Speech.Recognition namespace to do this...with some limitations. Add System.Speech (should be in the GAC) to your project. Here's some sample code for a WinForms app:

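    using System;
    using System.Speech.Recognition;  // add a reference to System.Speech
    using System.Windows.Forms;

    public partial class Form1 : Form
    {
      // Shared Windows recognizer (shows the system speech UI).
      SpeechRecognizer rec = new SpeechRecognizer();

      public Form1()
      {
        InitializeComponent();
        rec.SpeechRecognized += rec_SpeechRecognized;
      }

      // Show whatever phrase from the grammar was recognized.
      void rec_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
      {
        lblLetter.Text = e.Result.Text;
      }

      // Wire this handler to the form's Load event (e.g. in the designer).
      void Form1_Load(object sender, EventArgs e)
      {
        // Restrict recognition to the spoken numbers "0" through "100".
        var c = new Choices();
        for (var i = 0; i <= 100; i++)
          c.Add(i.ToString());
        var gb = new GrammarBuilder(c);
        var g = new Grammar(gb);
        rec.LoadGrammar(g);
        rec.Enabled = true;
      }
    }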

    This recognizes the numbers from 0 to 100 and displays the recognized number on the form. You'll need a form with a label called lblLetter on it.

    System.Speech only works with a pre-defined list of words or phrases; it's not exactly NaturallySpeaking, either in versatility or in recognition quality. But you don't have to train it to the user's voice, and if you only have a few different things the user can say, it works reasonably well. And it's free! (if you have Visual Studio)

    It won't work well if you use very short phrases; I made a program for my kid to say letters of the alphabet and see them on-screen, but it doesn't do that well since many of the letters sound alike (especially from the mouth of a four-year-old).
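    One possible way to soften the sound-alike problem is to accept a result only when it matches the phrase being shown and the engine reports reasonable confidence. The helper below is a minimal sketch of that idea, not part of System.Speech: the name SaidExpected and the 0.7 threshold are assumptions chosen for illustration, and a workable threshold would have to be found by experiment.

```csharp
using System;

static class PhraseCheck
{
    // Hypothetical helper: returns true when the recognized text equals the
    // expected phrase (ignoring case and surrounding whitespace) AND the
    // reported confidence is at least minConfidence.
    // The default of 0.7 is an arbitrary illustrative value.
    public static bool SaidExpected(string expected, string recognized,
                                    float confidence, float minConfidence = 0.7f)
    {
        if (expected == null || recognized == null)
            return false;

        return confidence >= minConfidence &&
               string.Equals(expected.Trim(), recognized.Trim(),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

    Inside a SpeechRecognized handler this could be called as PhraseCheck.SaidExpected(shownPhrase, e.Result.Text, e.Result.Confidence), where shownPhrase is a hypothetical variable holding whatever the form is currently displaying.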

    As for more flexible options...well, there's the aforementioned NaturallySpeaking, which has an SDK. But you have to contact sales to get any sort of access to it, and no pricing is listed, so it comes across as one of those "How much does it cost? Well, how much have you got?" kind of things. There doesn't seem to be a "download and play around with it" option. :(

    As for text-to-speech, System.Speech.Synthesis does this. It's even easier than the speech recognition. I wrote a small program that lets me type a line, hit Enter, and hear the text read aloud. My four-year-old gets mesmerized by it. :) ("Daddy, I wanna tawk to da wobot.")
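    A type-and-speak program of that kind might look roughly like the console sketch below. It is not the original program; it assumes a reference to the System.Speech assembly, an installed voice, and a working default audio output device.

```csharp
using System;
using System.Speech.Synthesis;  // requires a reference to System.Speech

static class ReadAloud
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Send speech to the default sound device.
            synth.SetOutputToDefaultAudioDevice();

            Console.WriteLine("Type a line and press Enter (empty line quits).");
            string line;
            while (!string.IsNullOrEmpty(line = Console.ReadLine()))
            {
                // Speak blocks until the whole line has been spoken.
                synth.Speak(line);
            }
        }
    }
}
```

    Speak is synchronous, which keeps the loop simple; SpeakAsync would be the alternative if the program should stay responsive while talking.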

  • 2020-11-28 20:16

    The Dragon NaturallySpeaking SDK might be worth looking at. This project looked interesting.

    I haven't had a chance to try either of them, though.

  • 2020-11-28 20:19

    This question already has many good answers, but I think it is worth updating the responses from Rob Segal and Philipp Schmid with a pointer to this code example from the 2016 documentation:

    https://msdn.microsoft.com/en-us/library/office/system.speech.recognition.speechrecognitionengine.aspx

    It does not use the shared Windows recognizer (the little Windows microphone that shows up in the middle of the screen); it uses an in-app SpeechRecognitionEngine that does not need any visual cue. The UI is completely under your control.
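    For a rough idea of what the in-app approach looks like, here is a minimal console sketch. It is not the code from the linked page; the three color phrases are made up for illustration, and it assumes a reference to System.Speech, an installed recognizer whose language matches the current culture, and a working microphone.

```csharp
using System;
using System.Speech.Recognition;  // requires a reference to System.Speech

static class InProcRecognition
{
    static void Main()
    {
        // In-process engine: no shared Windows speech UI is shown.
        using (var engine = new SpeechRecognitionEngine())
        {
            // Restrict recognition to a few phrases (illustrative choices).
            var phrases = new Choices("red", "green", "blue");
            engine.LoadGrammar(new Grammar(new GrammarBuilder(phrases)));
            engine.SetInputToDefaultAudioDevice();

            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine("Heard \"{0}\" (confidence {1:F2})",
                                  e.Result.Text, e.Result.Confidence);
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Heard speech, but it matched no phrase.");

            // Keep listening until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.WriteLine("Say a color; press Enter to quit.");
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```

    RecognizeMode.Multiple keeps the engine listening after each result; RecognizeMode.Single would stop after the first one.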

  • Be warned that you're not going to get good results if you don't require training first. Speech recognition is a statistical application of phonetics, a field which is pretty frank about the fact that there's so much variation in the signal that it's almost a miracle anyone can understand what anyone else says. An off-the-shelf speech recognition engine will most likely tend towards a more general accent of English, but will fail miserably for anything even slightly different.

    That's why training is so important. We can do well by overfitting with ease, especially if we reduce the problem space. But creating an extensible machine learning solution? Therein always lies the rub.

    That being said, consider Sphinx-4. It's an off-the-shelf solution written in Java, available at http://cmusphinx.sourceforge.net/sphinx4/
