sphinx-4 aligner skips plain words like `you`, `in` and words with dashes - why?

Submitted by 安稳与你 on 2019-12-13 00:52:42

Question


I'm trying to align simple text. Here are the links to text and audio files:
http://s000.tinyupload.com/?file_id=48044768133759453374
http://s000.tinyupload.com/?file_id=99891199139563396901

Here are the configuration settings:

private static final String ACOUSTIC_MODEL_PATH =
        "resource:/edu/cmu/sphinx/models/en-us/en-us";
private static final String DICTIONARY_PATH =
        "resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict";

The output I get is the following (the ellipses were added by me):

- ï
- ¿in
  a                         [11250:11330]
  standard                  [11330:11920]
  shopping                  [11920:12440]
  centre                    [12440:13020]
- you
  can                       [13380:13730]
  ...
  shops                     [15170:15790]
- you
  can                       [16620:16890]
  buy                       [16890:17140]
  ...
  and                       [26920:27230]
  suits                     [27190:27220]
- there’s
  a                         [29160:29210]
  sportswear                [29210:29980]
  ...
  clothes                   [33330:33360]
- t-shirts
  shorts                    [35560:36320]
  jumpers                   [36630:37410]
  ...
  for                       [41860:42010]

As you can see, for some reason it:

  • didn't recognize `in` before the first `a`
  • gave no timing for any of the instances of `you`
  • didn't recognize `there's`; it reported it as `there’s` instead
  • gave no timing for words with dashes, like `t-shirts`

Is there any way I can configure sphinx to provide timings for these occurrences?


Answer 1:


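Some comments:

didn't recognize `in` before the first `a`

Your text file starts with a byte order mark (BOM), which the aligner does not recognize. In your output it shows up as the stray `ï` and `¿` in front of `in`. It is better to remove the BOM before alignment.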
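As a minimal sketch of that cleanup step, assuming the transcript is a UTF-8 file that you read into a string yourself before handing it to the aligner: the class and method names below are hypothetical and not part of sphinx-4.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of sphinx-4.
final class TranscriptBom {

    // Drops a leading U+FEFF, which is what a UTF-8 BOM decodes to in Java.
    static String stripBom(String text) {
        if (text.startsWith("\uFEFF")) {
            return text.substring(1);
        }
        return text;
    }

    // Decodes raw file bytes as UTF-8 and removes the BOM if one is present.
    static String decodeUtf8WithoutBom(byte[] bytes) {
        return stripBom(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

You could call `decodeUtf8WithoutBom` on the bytes of the transcript file and pass the resulting string on as the alignment text. Re-saving the file as "UTF-8 without BOM" in a text editor should achieve the same thing without code.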

didn't recognize `there's`; it reported it as `there’s` instead

Your text uses typographic (curly) apostrophes (’, U+2019), which the aligner does not recognize. Convert them to the ASCII apostrophe (') before alignment.
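The conversion can be sketched as a simple character replacement. This assumes the only problematic characters are the left and right single quotation marks (U+2018 and U+2019); the class name is hypothetical, and other typographic characters in your text may need the same treatment.

```java
// Hypothetical helper, not part of sphinx-4.
final class TranscriptQuotes {

    // Replaces curly single quotes (U+2018, U+2019) with the ASCII apostrophe.
    static String asciiApostrophes(String text) {
        return text.replace('\u2019', '\'').replace('\u2018', '\'');
    }
}
```

With this, `there’s` becomes `there's`, which can then be matched against a dictionary entry spelled with the ASCII apostrophe.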

no timing for words with dashes, like `t-shirts`

Those words are missing from the dictionary. You can add them to the dictionary before alignment, or specify a G2P (grapheme-to-phoneme) model so that pronunciations can be generated for them.
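To find out which words need dictionary entries, a rough check along the following lines may help. It assumes the usual CMUdict-style layout (one entry per line, the word first, then its phones, with alternate pronunciations written as `word(2)`), and it uses a simplified tokenization (lowercase, split on whitespace, trim surrounding punctuation) that is an assumption here and may differ from what the aligner does internally. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of sphinx-4.
final class DictionaryCheck {

    // Collects the head word of every dictionary line, dropping "(2)"-style suffixes.
    static Set<String> dictionaryWords(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String head = trimmed.split("\\s+", 2)[0];
            int paren = head.indexOf('(');
            if (paren > 0) {
                head = head.substring(0, paren);
            }
            words.add(head.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Returns transcript words that have no dictionary entry, in order of first appearance.
    static Set<String> missingWords(String transcript, Set<String> known) {
        Set<String> missing = new LinkedHashSet<>();
        for (String token : transcript.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Trim punctuation at both ends but keep inner hyphens and apostrophes.
            String word = token.replaceAll("^[^a-z0-9']+|[^a-z0-9']+$", "");
            if (!word.isEmpty() && !known.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }
}
```

Running `missingWords` over the cleaned transcript gives the list of words, such as `t-shirts`, that would need either a hand-written dictionary entry or a G2P model.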



Source: https://stackoverflow.com/questions/29989840/sphinx-4-aligner-skips-plain-words-like-you-in-and-words-with-dashes-why
