Python having trouble accessing usb microphone using Gstreamer to perform speech recognition with Pocketsphinx on a Raspberry Pi

Submitted by 不打扰是莪最后的温柔 on 2019-12-08 17:11:32

Question


Python is acting like it can't hear ANYTHING from my microphone at all.

Here's the problem. I have a Python (2.7) script that is supposed to use GStreamer to access my microphone and do speech recognition for me via Pocketsphinx. I'm using PulseAudio, my device is a Raspberry Pi, and my microphone is a PlayStation 3 Eye.

To start with, I have already gotten pocketsphinx_continuous to run correctly and recognize the words I have defined in my .dic and .lm files. The accuracy is around 85-90% after a couple of trial runs. So I know my microphone is picking up sound normally via pocketsphinx + PulseAudio.

FYI I ran the following:

pocketsphinx_continuous -lm /home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm -dict /home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic -hmm /home/pi/dev/scarlettPi/config/speech/model/hmm/en_US/hub4wsj_sc_8k -silprob  0.1 -wip 1e-4 -bestpath 0

In my Python code I'm attempting to do the same thing, but I'm using GStreamer to access the microphone. (Note: I'm a bit new to Python.)

Here is my code (thanks to Josip Lisec for getting me this far):

import pi
from pi.becore import ScarlettConfig
from recorder import Recorder
from brain import Brain

import os
import json
import tempfile
#import sys

import pygtk
pygtk.require('2.0')
import gtk
import gobject
import pygst
pygst.require('0.10')
gobject.threads_init()
import gst

scarlett_config=ScarlettConfig()

class Listener:
  def __init__(self, gobject, gst):
    self.failed = 0

    self.pipeline = gst.parse_launch(' ! '.join(['pulsesrc',
                                               'audioconvert',
                                               'audioresample',
                                               'vader name=vader auto-threshold=true',
                                               'pocketsphinx lm=' + scarlett_config.get('LM') + ' dict=' + scarlett_config.get('DICT') + ' hmm=' + scarlett_config.get('HMM') + ' name=listener',
                                               'fakesink']))
    listener = self.pipeline.get_by_name('listener')
    listener.connect('result', self.__result__)
    listener.set_property('configured', True)
    print "KEYWORDS WE'RE LOOKING FOR: " + scarlett_config.get('ourkeywords')

    bus = self.pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect('message::application', self.__application_message__)
    self.pipeline.set_state(gst.STATE_PLAYING)

  def result(self, hyp, uttid):
    if hyp in scarlett_config.get('ourkeywords'):
      self.failed = 0
      self.listen()
    else:
      self.failed += 1
      if self.failed > 4:
        pi.speak("" + scarlett_config.get('scarlett_owner') + ", if you need me, just say my name.")
        self.failed = 0

  def listen(self):
    self.pipeline.set_state(gst.STATE_PAUSED)
    pi.play('pi-listening')
    Recorder(self)

  def cancel_listening(self):
    pi.play('pi-cancel')
    self.pipeline.set_state(gst.STATE_PLAYING)

  # question - sound recording
  def answer(self, question):
    pi.play('pi-cancel')

    print " * Contacting Google"
    destf = tempfile.mktemp(suffix='piresult')
    os.system('wget --post-file %s --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7" --header="Content-Type: audio/x-flac; rate=16000" -O %s -q "https://www.google.com/speech-api/v1/recognize?client=chromium&lang=en-US"' % (question, destf))
    #os.system("speech2text %s > %s" % (question, destf))
    b = open(destf)
    result = b.read()
    b.close()

    os.unlink(question)
    os.unlink(destf)

    if len(result) == 0:
      print " * nop"
      pi.play('pi-cancel')
    else:
      brain = Brain(json.loads(result))
      if brain.think() == False:
        print " * nop2"
        pi.play('pi-cancel')

    self.pipeline.set_state(gst.STATE_PLAYING)

  def __result__(self, listener, text, uttid):
    struct = gst.Structure('result')
    struct.set_value('hyp', text)
    struct.set_value('uttid', uttid)
    listener.post_message(gst.message_new_application(listener, struct))

  def __application_message__(self, bus, msg):
    msgtype =  msg.structure.get_name()
    if msgtype == 'result':
      self.result(msg.structure['hyp'], msg.structure['uttid'])

The application is supposed to match on the keyword "Scarlett" and then perform an action.

When I run my application, I get the following output:

pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $ ./pi 
/usr/lib/python2.7/dist-packages/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
  warnings.warn(str(e), _gtk.Warning)
INFO: cmd_ln.c(691): Parsing command line:
gst-pocketsphinx \
    -samprate 8000 \
    -cmn prior \
    -fwdflat no \
    -bestpath no \
    -maxhmmpf 2000 \
    -maxwpf 20 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ascale     20.0        2.000000e+01
-aw     1       1
-backtrace  no      no
-beam       1e-48       1.000000e-48
-bestpath   no      no
-bestpathlw 9.5     9.500000e+00
-bghist     no      no
-ceplen     13      13
-cmn        current     prior
-cmninit    8.0     8.0
-compallsen no      no
-debug              0
-dict               
-dictcase   no      no
-dither     no      no
-doublebw   no      no
-ds     1       1
-fdict              
-feat       1s_c_d_dd   1s_c_d_dd
-featparams         
-fillprob   1e-8        1.000000e-08
-frate      100     100
-fsg                
-fsgusealtpron  yes     yes
-fsgusefiller   yes     yes
-fwdflat    yes     no
-fwdflatbeam    1e-64       1.000000e-64
-fwdflatefwid   4       4
-fwdflatlw  8.5     8.500000e+00
-fwdflatsfwin   25      25
-fwdflatwbeam   7e-29       7.000000e-29
-fwdtree    yes     yes
-hmm                
-input_endian   little      little
-jsgf               
-kdmaxbbi   -1      -1
-kdmaxdepth 0       0
-kdtree             
-latsize    5000        5000
-lda                
-ldadim     0       0
-lextreedump    0       0
-lifter     0       0
-lm             
-lmctl              
-lmname     default     default
-logbase    1.0001      1.000100e+00
-logfn              
-logspec    no      no
-lowerf     133.33334   1.333333e+02
-lpbeam     1e-40       1.000000e-40
-lponlybeam 7e-29       7.000000e-29
-lw     6.5     6.500000e+00
-maxhmmpf   -1      2000
-maxnewoov  20      20
-maxwpf     -1      20
-mdef               
-mean               
-mfclogdir          
-min_endfr  0       0
-mixw               
-mixwfloor  0.0000001   1.000000e-07
-mllr               
-mmap       yes     yes
-ncep       13      13
-nfft       512     512
-nfilt      40      40
-nwpen      1.0     1.000000e+00
-pbeam      1e-48       1.000000e-48
-pip        1.0     1.000000e+00
-pl_beam    1e-10       1.000000e-10
-pl_pbeam   1e-5        1.000000e-05
-pl_window  0       0
-rawlogdir          
-remove_dc  no      no
-round_filters  yes     yes
-samprate   16000       8.000000e+03
-seed       -1      -1
-sendump            
-senlogdir          
-senmgau            
-silprob    0.1     1.000000e-01
-smoothspec no      no
-svspec             
-tmat               
-tmatfloor  0.0001      1.000000e-04
-topn       4       4
-topn_beam  0       0
-toprule            
-transform  legacy      legacy
-unit_area  yes     yes
-upperf     6855.4976   6.855498e+03
-usewdphones    no      no
-uw     1.0     1.000000e+00
-var                
-varfloor   0.0001      1.000000e-04
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wbeam      7e-29       7.000000e-29
-wip        1e-4        1.000000e-04
-wlen       0.025625    2.562500e-02

INFO: cmd_ln.c(691): Parsing command line:
\
    -nfilt 20 \
    -lowerf 1 \
    -upperf 4000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -svspec 0-12/13-25/26-38 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -cmninit 56,-3,1 \
    -varnorm no 

Current configuration:
[NAME]      [DEFLT]     [VALUE]
-agc        none        none
-agcthresh  2.0     2.000000e+00
-alpha      0.97        9.700000e-01
-ceplen     13      13
-cmn        current     current
-cmninit    8.0     56,-3,1
-dither     no      no
-doublebw   no      no
-feat       1s_c_d_dd   1s_c_d_dd
-frate      100     100
-input_endian   little      little
-lda                
-ldadim     0       0
-lifter     0       0
-logspec    no      no
-lowerf     133.33334   1.000000e+00
-ncep       13      13
-nfft       512     512
-nfilt      40      20
-remove_dc  no      yes
-round_filters  yes     no
-samprate   16000       8.000000e+03
-seed       -1      -1
-smoothspec no      no
-svspec             0-12/13-25/26-38
-transform  legacy      dct
-unit_area  yes     yes
-upperf     6855.4976   4.000000e+03
-varnorm    no      no
-verbose    no      no
-warp_params            
-warp_type  inverse_linear  inverse_linear
-wlen       0.025625    2.500000e-02

INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 4120 * 20 bytes (80 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(335): 13 words read
INFO: dict.c(341): Reading filler dictionary: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(477): ngrams 1=12, 2=18, 3=17
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516):       12 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
INFO: ngram_model_arpa.c(533):       18 = #bigrams created
INFO: ngram_model_arpa.c(534):        3 = #prob2 entries
INFO: ngram_model_arpa.c(542):        3 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
INFO: ngram_model_arpa.c(555):       17 = #trigrams created
INFO: ngram_model_arpa.c(556):        2 = #prob3 entries
INFO: ngram_search_fwdtree.c(99): 12 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 12 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 152
INFO: ngram_search_fwdtree.c(338): after: 12 root, 24 non-root channels, 11 single-phone words
KEYWORDS WE'RE LOOKING FOR: [ 'scarlett', 'SCARLETT' ]    

But it fails to match on anything. It almost seems as if Python cannot hear anything from the microphone; there aren't even any attempts to recognize anything. pocketsphinx_continuous usually prints a READY state when it's prepared to start listening. Should I expect the same in Python?

Here are my python packages:

pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $ dpkg -l | grep -i python
ii  idle                                  2.7.3-4                              all          IDE for Python using Tkinter (default version)
ii  idle-python2.7                        2.7.3-6                              all          IDE for Python (v2.7) using Tkinter
rc  idle3                                 3.2.3-6                              all          IDE for Python using Tkinter (default version)
ii  libpyside1.1:armhf                    1.1.1-3                              armhf        Python bindings for Qt 4 (base files)
ii  libpython2.6                          2.6.8-1.1                            armhf        Shared Python runtime library (version 2.6)
ii  libpython2.7                          2.7.3-6                              armhf        Shared Python runtime library (version 2.7)
ii  libshiboken1.1:armhf                  1.1.1-1                              armhf        CPython bindings generator for C++ libraries - shared library
ii  python                                2.7.3-4                              all          interactive high-level object-oriented language (default version)
ii  python-alsaaudio                      0.5+svn36-1                          armhf        Alsa bindings for Python
ii  python-cairo                          1.8.8-1                              armhf        Python bindings for the Cairo vector graphics library
ii  python-dbg                            2.7.3-4                              all          debug build of the Python Interpreter (version 2.7)
ii  python-dbus                           1.1.1-1                              armhf        simple interprocess messaging system (Python interface)
ii  python-dbus-dev                       1.1.1-1                              all          main loop integration development files for python-dbus
ii  python-dev                            2.7.3-4                              all          header files and a static library for Python (default)
ii  python-gi                             3.2.2-2                              armhf        Python 2.x bindings for gobject-introspection libraries
ii  python-gi-dbg                         3.2.2-2                              armhf        Python bindings for the GObject library (debug extension)
ii  python-gi-dev                         3.2.2-2                              all          development headers for GObject Python bindings
ii  python-gobject                        3.2.2-2                              all          Python 2.x bindings for GObject - transitional package
ii  python-gobject-2                      2.28.6-10                            armhf        deprecated static Python bindings for the GObject library
ii  python-gobject-2-dbg                  2.28.6-10                            armhf        deprecated static Python bindings for the GObject library (debug extension)
ii  python-gobject-2-dev                  2.28.6-10                            all          development headers for the static GObject Python bindings
ii  python-gobject-dbg                    3.2.2-2                              all          Python 2.x debugging modules for GObject - transitional package
ii  python-gobject-dev                    3.2.2-2                              all          Python 2.x development headers for GObject - transitional package
ii  python-gst0.10                        0.10.22-3                            armhf        generic media-playing framework (Python bindings)
ii  python-gst0.10-dbg                    0.10.22-3                            armhf        generic media-playing framework (Python debug bindings)
ii  python-gst0.10-dev                    0.10.22-3                            armhf        generic media-playing framework (Python bindings)
ii  python-gst0.10-rtsp                   0.10.8-3                             armhf        GStreamer RTSP server plugin (Python bindings)
ii  python-gtk2                           2.24.0-3                             armhf        Python bindings for the GTK+ widget set
ii  python-iplib                          1.1-3                                all          Python library to convert amongst many different IPv4 notations
ii  python-libxml2                        2.8.0+dfsg1-7+nmu1                   armhf        Python bindings for the GNOME XML library
ii  python-minimal                        2.7.3-4                              all          minimal subset of the Python language (default version)
ii  python-numpy                          1:1.6.2-1.2                          armhf        Numerical Python adds a fast array facility to the Python language
ii  python-pexpect                        2.4-1                                all          Python module for automating interactive applications
ii  python-pip                            1.1-3                                all          alternative Python package installer
ii  python-pkg-resources                  0.6.24-1                             all          Package Discovery and Resource Access using pkg_resources
ii  python-pyalsa                         1.0.25-1                             armhf        Official ALSA Python binding library
ii  python-pyside                         1.1.1-3                              all          Python bindings for Qt4 (big metapackage)
ii  python-pyside.phonon                  1.1.1-3                              armhf        Qt 4 Phonon module - Python bindings
ii  python-pyside.qtcore                  1.1.1-3                              armhf        Qt 4 core module - Python bindings
ii  python-pyside.qtdeclarative           1.1.1-3                              armhf        Qt 4 Declarative module - Python bindings
ii  python-pyside.qtgui                   1.1.1-3                              armhf        Qt 4 GUI module - Python bindings
ii  python-pyside.qthelp                  1.1.1-3                              armhf        Qt 4 help module - Python bindings
ii  python-pyside.qtnetwork               1.1.1-3                              armhf        Qt 4 network module - Python bindings
ii  python-pyside.qtopengl                1.1.1-3                              armhf        Qt 4 OpenGL module - Python bindings
ii  python-pyside.qtscript                1.1.1-3                              armhf        Qt 4 script module - Python bindings
ii  python-pyside.qtsql                   1.1.1-3                              armhf        Qt 4 SQL module - Python bindings
ii  python-pyside.qtsvg                   1.1.1-3                              armhf        Qt 4 SVG module - Python bindings
ii  python-pyside.qttest                  1.1.1-3                              armhf        Qt 4 test module - Python bindings
ii  python-pyside.qtuitools               1.1.1-3                              armhf        Qt 4 UI tools module - Python bindings
ii  python-pyside.qtwebkit                1.1.1-3                              armhf        Qt 4 WebKit module - Python bindings
ii  python-pyside.qtxml                   1.1.1-3                              armhf        Qt 4 XML module - Python bindings
ii  python-rpi.gpio                       0.5.3a-1                             armhf        Python GPIO module for Raspberry Pi
ii  python-setuptools                     0.6.24-1                             all          Python Distutils Enhancements (setuptools compatibility)
ii  python-simplejson                     2.5.2-1                              armhf        simple, fast, extensible JSON encoder/decoder for Python
ii  python-support                        1.0.15                               all          automated rebuilding support for Python modules
ii  python-tk                             2.7.3-1                              armhf        Tkinter - Writing Tk applications with Python
ii  python-yaml                           3.10-4                               armhf        YAML parser and emitter for Python
ii  python-yaml-dbg                       3.10-4                               armhf        YAML parser and emitter for Python (debug build)
ii  python2.6                             2.6.8-1.1                            armhf        Interactive high-level object-oriented language (version 2.6)
ii  python2.6-minimal                     2.6.8-1.1                            armhf        Minimal subset of the Python language (version 2.6)
ii  python2.7                             2.7.3-6                              armhf        Interactive high-level object-oriented language (version 2.7)
ii  python2.7-dbg                         2.7.3-6                              armhf        Debug Build of the Python Interpreter (version 2.7)
ii  python2.7-dev                         2.7.3-6                              armhf        Header files and a static library for Python (v2.7)
ii  python2.7-minimal                     2.7.3-6                              armhf        Minimal subset of the Python language (version 2.7)
pi@scarlettpi ~/dev/scarlettPi/scripts/pi/bin $

Also, just to confirm that pocketsphinx is compiled correctly against the right libraries:

pi@scarlettpi ~ $ ldd /usr/local/bin/pocketsphinx_continuous 
    /usr/lib/arm-linux-gnueabihf/libcofi_rpi.so (0xb6f9b000)
    libpocketsphinx.so.1 => /usr/local/lib/libpocketsphinx.so.1 (0xb6f5a000)
    libsphinxad.so.0 => /usr/local/lib/libsphinxad.so.0 (0xb6f4e000)
    libsphinxbase.so.1 => /usr/local/lib/libsphinxbase.so.1 (0xb6f07000)
    libpulse.so.0 => /usr/lib/arm-linux-gnueabihf/libpulse.so.0 (0xb6ea8000)
    libpulse-simple.so.0 => /usr/lib/arm-linux-gnueabihf/libpulse-simple.so.0 (0xb6e9c000)
    libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0xb6e7d000)
    libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0xb6e0c000)
    libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0xb6cdd000)
    libjson.so.0 => /lib/arm-linux-gnueabihf/libjson.so.0 (0xb6ccd000)
    libpulsecommon-2.0.so => /usr/lib/arm-linux-gnueabihf/pulseaudio/libpulsecommon-2.0.so (0xb6c6b000)
    libdbus-1.so.3 => /lib/arm-linux-gnueabihf/libdbus-1.so.3 (0xb6c29000)
    libcap.so.2 => /lib/arm-linux-gnueabihf/libcap.so.2 (0xb6c1e000)
    librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0xb6c0f000)
    libdl.so.2 => /lib/arm-linux-gnueabihf/libdl.so.2 (0xb6c04000)
    libgcc_s.so.1 => /lib/arm-linux-gnueabihf/libgcc_s.so.1 (0xb6bdb000)
    /lib/ld-linux-armhf.so.3 (0xb6fa8000)
    libX11-xcb.so.1 => /usr/lib/arm-linux-gnueabihf/libX11-xcb.so.1 (0xb6bd2000)
    libX11.so.6 => /usr/lib/arm-linux-gnueabihf/libX11.so.6 (0xb6abe000)
    libxcb.so.1 => /usr/lib/arm-linux-gnueabihf/libxcb.so.1 (0xb6a9f000)
    libICE.so.6 => /usr/lib/arm-linux-gnueabihf/libICE.so.6 (0xb6a82000)
    libSM.so.6 => /usr/lib/arm-linux-gnueabihf/libSM.so.6 (0xb6a73000)
    libXtst.so.6 => /usr/lib/arm-linux-gnueabihf/libXtst.so.6 (0xb6a67000)
    libwrap.so.0 => /lib/arm-linux-gnueabihf/libwrap.so.0 (0xb6a57000)
    libsndfile.so.1 => /usr/lib/arm-linux-gnueabihf/libsndfile.so.1 (0xb69ee000)
    libasyncns.so.0 => /usr/lib/arm-linux-gnueabihf/libasyncns.so.0 (0xb69e2000)
    libattr.so.1 => /lib/arm-linux-gnueabihf/libattr.so.1 (0xb69d4000)
    libXau.so.6 => /usr/lib/arm-linux-gnueabihf/libXau.so.6 (0xb69ca000)
    libXdmcp.so.6 => /usr/lib/arm-linux-gnueabihf/libXdmcp.so.6 (0xb69be000)
    libuuid.so.1 => /lib/arm-linux-gnueabihf/libuuid.so.1 (0xb69b1000)
    libXext.so.6 => /usr/lib/arm-linux-gnueabihf/libXext.so.6 (0xb699b000)
    libXi.so.6 => /usr/lib/arm-linux-gnueabihf/libXi.so.6 (0xb6986000)
    libnsl.so.1 => /lib/arm-linux-gnueabihf/libnsl.so.1 (0xb696a000)
    libFLAC.so.8 => /usr/lib/arm-linux-gnueabihf/libFLAC.so.8 (0xb691f000)
    libvorbisenc.so.2 => /usr/lib/arm-linux-gnueabihf/libvorbisenc.so.2 (0xb67b2000)
    libvorbis.so.0 => /usr/lib/arm-linux-gnueabihf/libvorbis.so.0 (0xb6782000)
    libogg.so.0 => /usr/lib/arm-linux-gnueabihf/libogg.so.0 (0xb6775000)
    libresolv.so.2 => /lib/arm-linux-gnueabihf/libresolv.so.2 (0xb6761000)
pi@scarlettpi ~ $

And if you need to see any information about my microphone (PS3 Eye), I had to put it on Pastebin because I ran out of room in this post:

http://pastebin.com/gSDZwRHc

Does anyone have any ideas why this isn't working? Please let me know if my question needs any clarification or if I can provide any more information to aid with debugging.

Thanks.


Answer 1:


So I finally got this guy working.

A couple of key things I needed to realize:

1. Even if you're using PulseAudio on your Raspberry Pi, ALSA remains usable as long as it is still installed. (This might seem like a no-brainer to others, but I honestly didn't realize I could use both at the same time.) Hint via syb0rg.

2. When it comes to sending large amounts of raw audio data (.wav format in my case) to Pocketsphinx via GStreamer, queues are your friend.

After messing around with gst-launch-0.10 on the command line for a while I came across something that actually worked:

gst-launch-0.10 alsasrc device=hw:1 ! queue ! audioconvert ! audioresample ! queue ! vader name=vader auto-threshold=true ! pocketsphinx lm=/home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm dict=/home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic hmm=/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k name=listener ! fakesink dump=1

So what's happening here?

  • GStreamer is listening to device hw:1 (which is my PS3 Eye USB device). This device might vary; you can determine it by running:
pi@scarlettpi ~ $ pacmd dump
Welcome to PulseAudio! Use "help" for usage information.

....

load-module module-alsa-card device_id="0" name="platform-bcm2835_AUD0.0" card_name="alsa_card.platform-bcm2835_AUD0.0" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

load-module module-udev-detect

load-module module-bluetooth-discover

load-module module-esound-protocol-unix

load-module module-native-protocol-unix

load-module module-gconf

load-module module-default-device-restore

load-module module-rescue-streams

load-module module-always-sink

load-module module-intended-roles

load-module module-console-kit

load-module module-systemd-login

load-module module-position-event-sounds

load-module module-role-cork

load-module module-filter-heuristics

load-module module-filter-apply

load-module module-dbus-protocol

load-module module-switch-on-port-available

load-module module-cli-protocol-unix

load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

....

The important line to notice is:

load-module module-alsa-card device_id="1" name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" card_name="alsa_card.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" namereg_fail=false tsched=yes fixed_latency_range=no ignore_dB=no deferred_volume=yes card_properties="module-udev-detect.discovered=1"

That's my PlayStation 3 Eye, and it's on device_id=1, hence hw:1.
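If you'd rather not scan that output by eye, here is a minimal sketch of the same lookup in Python. The function name alsa_device_from_pacmd and the SAMPLE text are my own for illustration; the sketch assumes the module-alsa-card lines look like the ones shown above and that device_id maps to hw:N the way it did on my Pi, which may not hold on every setup.

```python
import re

# Matches lines such as:
#   load-module module-alsa-card device_id="1" name="usb-..." ...
CARD_LINE = re.compile(
    r'load-module module-alsa-card device_id="(\d+)" name="([^"]*)"')


def alsa_device_from_pacmd(dump_text, name_fragment):
    """Return 'hw:N' for the first ALSA card whose name contains
    name_fragment (case-insensitive), or None if nothing matches."""
    for match in CARD_LINE.finditer(dump_text):
        device_id, name = match.groups()
        if name_fragment.lower() in name.lower():
            return 'hw:' + device_id
    return None


# Shortened sample of the `pacmd dump` output shown above.
SAMPLE = '\n'.join([
    'load-module module-alsa-card device_id="0" '
    'name="platform-bcm2835_AUD0.0" tsched=yes',
    'load-module module-udev-detect',
    'load-module module-alsa-card device_id="1" '
    'name="usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01-CameraB409241" '
    'tsched=yes',
])

print(alsa_device_from_pacmd(SAMPLE, 'OmniVision'))  # prints hw:1
```

On a real system the text would come from running pacmd dump yourself (for example through the subprocess module) instead of the hard-coded SAMPLE.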

  • The audio data coming in from the PS3 Eye gets resampled, added to a GStreamer queue, and passed through a vader element before moving on to pocketsphinx. With the auto-threshold=true flag set on the vader element, GStreamer can determine the background noise level, which can be important if you have a lousy sound card or a far-field microphone. This is how the pocketsphinx element knows when an utterance starts and ends.

  • Add the regular pocketsphinx arguments that we already determined to the pipeline.

  • Pass everything into a fakesink since we don't need to hear anything right now; we only need pocketsphinx to listen to everything. The dump=1 flag provides us with more debugging information to see what's being processed and whether audio is being accepted at all.
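To make the noise-threshold idea from the vader bullet more concrete, here is a rough sketch of energy-based utterance detection. This is NOT the vader element's actual algorithm, just a simplified illustration of the concept; the function find_utterances, its parameters, and the sample energies are all made up for this example.

```python
def find_utterances(frame_energies, calibration_frames=10, factor=3.0):
    """Return (start, end) frame index pairs, end exclusive, where the
    energy rises above a threshold derived from the background noise.

    The noise floor is estimated as the mean energy of the first
    calibration_frames frames (assumed to contain no speech).
    """
    calibration = frame_energies[:calibration_frames]
    if not calibration:
        return []
    noise_floor = sum(calibration) / float(len(calibration))
    threshold = noise_floor * factor

    utterances = []
    start = None
    for index, energy in enumerate(frame_energies):
        if energy > threshold and start is None:
            start = index                      # utterance begins
        elif energy <= threshold and start is not None:
            utterances.append((start, index))  # utterance ends
            start = None
    if start is not None:                      # still speaking at the end
        utterances.append((start, len(frame_energies)))
    return utterances


# Hypothetical per-frame energies: quiet, a loud burst, quiet, a second burst.
energies = [1.0] * 10 + [8.0, 9.0, 7.5] + [1.0] * 5 + [6.0, 6.5] + [1.0] * 3
print(find_utterances(energies))  # prints [(10, 13), (18, 20)]
```

With a noisy microphone the estimated floor is higher, so the threshold rises with it. That is roughly why letting the element pick its own threshold helped in my case.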

After getting that gst-launch-0.10 pipeline to run successfully, the new Python code looks like this:

self.pipeline = gst.parse_launch(' ! '.join(['alsasrc device=' + scarlett_config.gimmie('audio_input_device'),
                                           'queue',
                                           'audioconvert',
                                           'audioresample',
                                           'queue',
                                           'vader name=vader auto-threshold=true',
                                           'pocketsphinx lm=' + scarlett_config.gimmie('LM') + ' dict=' + scarlett_config.gimmie('DICT') + ' hmm=' + scarlett_config.gimmie('HMM') + ' name=listener',
                                           'fakesink dump=1']))
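If you want to check the pipeline string before handing it to gst.parse_launch, one option is to build it in a small helper that takes plain arguments instead of reading from my config object. This is only a sketch: build_pipeline_description and its parameters are hypothetical names, and the paths are the ones from my setup.

```python
def build_pipeline_description(device, lm, dic, hmm, dump=True):
    """Build the gst-launch style description used above as a plain string."""
    elements = [
        'alsasrc device=' + device,
        'queue',
        'audioconvert',
        'audioresample',
        'queue',
        'vader name=vader auto-threshold=true',
        'pocketsphinx lm=%s dict=%s hmm=%s name=listener' % (lm, dic, hmm),
        # dump=1 makes fakesink print the buffers it receives
        'fakesink dump=1' if dump else 'fakesink',
    ]
    return ' ! '.join(elements)


description = build_pipeline_description(
    'hw:1',
    '/home/pi/dev/scarlettPi/config/speech/lm/scarlett.lm',
    '/home/pi/dev/scarlettPi/config/speech/dict/scarlett.dic',
    '/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
print(description)
```

The printed string can be pasted after gst-launch-0.10 on the command line to test it, or passed to gst.parse_launch in the Listener class.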

Hope this helps someone.

NOTE: Please excuse me if my GStreamer pipeline is using excessive elements. I'm fairly new to GStreamer, and I'm open to more efficient ways of doing this.



Source: https://stackoverflow.com/questions/18087720/python-having-trouble-accessing-usb-microphone-using-gstreamer-to-perform-speech
