Sphinx 4: Failed to align audio to transcript

吃可爱长大的小学妹 submitted on 2020-01-05 04:18:07

Question


I am following the Acoustic Model Adaptation tutorial using Sphinx 4 with the following WAV files. Here is the output I get when running:

 bw -hmmdir wsj -moddeffn wsj/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn vn.dic -ctlfn lisp.fileids -lsnfn lisp.transcription -accumdir .

utt>     0                 lisp_0001   53INFO: cmn.c(175): CMN: 73.43  2.89 -0.34 -1.85 -0.98 -0.52  0.33  0.67 -0.77 -0.56  0.18 -0.50 -0.30
    0    28 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0001 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     1                 lisp_0002   41INFO: cmn.c(175): CMN: 74.39  2.48 -0.92 -2.09 -1.31 -0.52 -0.17  0.67  0.26 -0.62  0.34 -0.26 -0.04
    0    28 0 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0002 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     2                 lisp_0003   57INFO: cmn.c(175): CMN: 75.86  2.02 -0.53 -1.16 -0.79 -0.55 -0.77  0.92 -0.34 -0.82  0.63 -0.33 -0.60
    0    40 2 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0003 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     3                 lisp_0004   57INFO: cmn.c(175): CMN: 74.78  2.01 -0.36 -0.52 -1.04 -1.08 -0.08  0.88 -0.51 -0.65  0.56 -0.36 -0.54
    0    40 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0004 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     4                 lisp_0005   49INFO: cmn.c(175): CMN: 75.03  1.80 -1.76 -1.18 -1.56 -1.24  0.62  1.84 -0.58 -1.34  0.64 -0.26 -0.20
    0    28 2 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0005 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     5                 lisp_0006   41INFO: cmn.c(175): CMN: 76.75  0.51 -0.55 -0.89 -1.18 -1.16  0.64  1.67 -1.25 -1.30  0.57 -0.26 -0.54
    0    28 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0006 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     6                 lisp_0007   22INFO: cmn.c(175): CMN: 82.68 -5.14 -4.56 -1.20 -0.66 -0.34 -0.88 -0.05  1.29  1.60  0.97 -0.68 -1.65
    0    28 0 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0007 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     7                 lisp_0008   16INFO: cmn.c(175): CMN: 82.76 -6.31 -4.75 -1.98 -1.04 -1.06 -0.49  1.19  1.57  1.48  0.52 -1.17 -1.32
    0    28 0 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0008 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     8                 lisp_0009   47INFO: cmn.c(175): CMN: 78.49  1.93 -0.69 -0.95 -1.04 -0.06 -0.18  0.98 -0.98 -0.72  0.20  0.04 -0.54
    0    32 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0009 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>     9                 lisp_0010   47INFO: cmn.c(175): CMN: 77.21  1.23  0.18 -0.83 -0.89 -0.19 -0.39  0.80 -1.13 -0.86  0.38 -0.17 -0.47
    0    32 3 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0010 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    10                 lisp_0011   39INFO: cmn.c(175): CMN: 79.15  0.97  0.42 -0.53 -1.72 -1.64 -0.36  1.03  0.23 -0.49 -0.59 -0.21 -0.16
    0    32 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0011 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    11                 lisp_0012   41INFO: cmn.c(175): CMN: 77.22  1.29  0.45 -0.51 -2.12 -1.20 -0.52  1.09 -0.10 -0.56 -0.27 -0.60 -0.20
    0    36 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0012 ignored
 utt 0.038x 0.320e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    12                 lisp_0013   49INFO: cmn.c(175): CMN: 78.72  0.88 -0.83 -0.17 -0.09 -0.18 -1.40  0.71 -0.16 -1.00 -0.03  0.07 -0.35
    0    32 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0013 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    13                 lisp_0014   51INFO: cmn.c(175): CMN: 77.42  0.56 -0.88 -0.14 -0.05 -0.20 -1.31  0.90 -0.21 -1.39  0.07  0.01 -0.28
    0    32 3 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0014 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    14                 lisp_0015   57INFO: cmn.c(175): CMN: 74.21  1.50  0.08 -1.50 -1.63 -0.97  0.65  0.63 -0.30 -0.07 -0.25 -0.71 -0.21
    0    28 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0015 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    15                 lisp_0016   54INFO: cmn.c(175): CMN: 74.42  1.22  0.01 -1.77 -1.29 -1.20  0.30  0.83  0.39 -0.31 -0.32 -0.61 -0.11
    0    28 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0016 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    16                 lisp_0017   51INFO: cmn.c(175): CMN: 77.04  1.26 -1.04 -0.57 -0.62 -0.27 -0.04  0.25 -0.97 -0.66  0.42 -0.16 -0.32
    0    32 2 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0017 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    17                 lisp_0018   53INFO: cmn.c(175): CMN: 76.83  0.69 -1.35 -0.93 -0.46 -0.01 -0.53  0.61 -0.64 -1.03  0.85 -0.18 -0.15
    0    32 2 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0018 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    18                 lisp_0019   55INFO: cmn.c(175): CMN: 79.39  0.58 -0.51 -1.02 -1.71 -0.55  0.44  0.80  0.32 -0.67 -0.73 -0.09 -0.21
    0    36 2 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0019 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    19                 lisp_0020   53INFO: cmn.c(175): CMN: 77.16  1.12 -0.43 -1.27 -1.72 -1.32 -0.06  0.98  0.63 -0.42 -0.39 -0.03 -0.32
    0    32 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0020 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    20                 lisp_0023   43INFO: cmn.c(175): CMN: 78.04  1.22 -1.24 -1.15 -0.43 -0.20 -0.23  0.78 -0.33 -0.37  0.05 -0.60 -0.73
    0    24 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0023 ignored
 utt 0.036x 0.256e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    21                 lisp_0024   54INFO: cmn.c(175): CMN: 77.16  0.71 -1.27 -0.87 -0.57 -0.45  0.12  0.53  0.63 -0.43  0.26 -0.65 -0.38
    0    24 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0024 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    22                 lisp_0025   53INFO: cmn.c(175): CMN: 74.48  3.43  2.15 -0.18 -1.62 -0.61 -0.64 -0.19 -0.28  0.38  0.05 -0.40 -0.01
    0    32 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0025 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

utt>    23                 lisp_0026   33INFO: cmn.c(175): CMN: 69.14  5.30  2.19 -1.26 -2.55  0.65 -1.46  0.14 -0.22 -0.54  0.24 -0.34 -0.19
    0    32 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: lisp_0026 ignored
 utt 0.000x 0.000e upd 0.000x 0.000e fwd 0.000x 0.000e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e

overall> stats 0 (-0) 0.000000e+000 0.000000e+000 0.000x 4.576e
WARNING: "accum.c", line 617: Over 500 senones never occur in the input data. This is normal for context-dependent untied senone training or for adaptation, but could indicate a serious problem otherwise.
INFO: s3mixw_io.c(232): Wrote ./mixw_counts [4147x1x8 array]
INFO: s3tmat_io.c(174): Wrote ./tmat_counts [49x3x4 array]
INFO: s3gau_io.c(478): Wrote ./gauden_counts with means with vars [4147x1x8 vector arrays]
INFO: main.c(1014): Counts saved to .

I suspect that since my WAV files do not have any silence padding, they cause the errors above. Is that correct? If not, what is causing the error?

Note: because my recording device saves files slowly, I recorded one long audio file containing all of the words and then cut it into individual words. Does that hurt the quality of each of the smaller files?

Thanks in advance


Answer 1:


Your audio files are recorded at 44.1 kHz:

 file lisp_0009.wav
 lisp_0009.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 44100 Hz

SphinxTrain requires audio sampled at 16 kHz. You can resample your files with sox:

 for f in *.wav; do
     # Resample to 16 kHz into a temporary file, then replace the original
     sox "$f" -r 16000 "$f.new.wav" && mv "$f.new.wav" "$f"
 done

For more information on the required input audio format, see the CMUSphinx adaptation tutorial:

http://cmusphinx.sourceforge.net/wiki/tutorialadapt

I suspect that since my WAV files do not have any silence pad,

This is an issue too. Each file must have around 0.25 s of silence at the beginning and at the end.

then cut them into words.

You need to cut the recording into whole utterances, not individual words.



Source: https://stackoverflow.com/questions/20233480/sphinx-4-failed-to-align-audio-to-trancript
