Text to speech pronouncing numbers like “4th”, “8ths”, or “2nd”

Question

A while back I wrote some code that would convert a Double into a String, where the string was formatted as a readable fraction.

For example:

4.75 => "4 and 3 4ths"
1.5 => "1 and 1 half"
1.33 => "1 and 1 3rd"

The majority of numbers are pronounced as intended with a few notable exceptions. Instead of the text "4ths" being pronounced as "fourths" it is pronounced "four tee ache ess". Here is an example demonstrating this.

//this works
tts.speak("1 and 3 fourths", TextToSpeech.QUEUE_FLUSH, null);    
//this works
tts.speak("1 and 1 3rd", TextToSpeech.QUEUE_FLUSH, null); 
//this works
tts.speak("1 and 1 4th", TextToSpeech.QUEUE_FLUSH, null); 

//this does not work
tts.speak("1 and 3 4ths", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4thes", TextToSpeech.QUEUE_FLUSH, null);
//this does not work
tts.speak("1 and 3 4th-s", TextToSpeech.QUEUE_FLUSH, null);

The strangest thing is that this worked fine about a year ago when I first wrote the code: the "ths" suffix was pronounced as one might expect. Perhaps I am mistaken on that point...

Regardless, the issue seems to be that a number followed by 2 letters is read as a complete word, while a number followed by 3 or more letters is spelled out letter by letter instead. I could add to the complexity of the algorithm by replacing all the numbers with their word counterparts; however, the longer I work at this, the more I begin to think that I am reinventing the wheel. The API does not seem to offer a way of specifying pronunciation for the speak() method. Am I missing something?

Answer 1:

This sort of behavior is going to vary between TextToSpeech engines. The Google TTS engine, for example, will behave differently from, say, the SVOX PICO engine (emulator < API 24), so it's not your fault that each engine behaves slightly differently. If there are any pronunciation controls, the engine is responsible for supplying them directly to the end user via its settings.

You're probably just testing on a different engine than you were before, or on an updated version of the same engine.

You could just test some major engines like Samsung, Google, and PICO and try to find a common denominator of behavior. I suspect that you're right: spelling out the words is the best option in this case.

You can specify which engine you want to use as the last argument (a String) of the TextToSpeech constructor. You can see which engines are installed on any particular device by going to (home\settings\language&locale\TTS), or in code like this:

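import android.content.Context;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.speech.tts.TextToSpeech;
import android.util.Log;
import java.util.ArrayList;
import java.util.List;

private ArrayList<String> whatEnginesAreInstalled(Context context) {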
    final Intent ttsIntent = new Intent();
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    final PackageManager pm = context.getPackageManager();
    final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    ArrayList<String> installedEngineNames = new ArrayList<>();
    for (ResolveInfo r : list) {
        String engineName = r.activityInfo.applicationInfo.packageName;
        installedEngineNames.add(engineName);

        // just logging the version number out of interest
        String version = "null";
        try {
            version = pm.getPackageInfo(engineName,
                    PackageManager.GET_META_DATA).versionName;
        } catch (Exception e) {
            Log.i("XXX", "try catch error");
        }
        Log.i("XXX", "we found an engine: " + engineName);
        Log.i("XXX", "version: " + version);
    }
    return installedEngineNames;
}

Answer 2:

As Boober Bunz explained, this behavior varies from one engine to another, and it may change with newer versions of an engine as well. I would suggest that the best option is to convert everything to words, like "fourths", to make it consistent across engines. For a quick fix you can try "4th's", as it looks more like a valid word than the variants you mentioned that don't work.
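
For what it's worth, here is a rough sketch of the "convert everything to words" approach. It is not the asker's original code: the class name SpokenFractions, its method names, and the small table of supported denominators are all made up for illustration. The whole number and the numerator are left as digits, since the question shows that "1 and 3 fourths" is already spoken correctly.

```java
import java.util.Map;

/** Hypothetical helper: spells out the denominator so the engine never sees text like "4ths". */
final class SpokenFractions {

    // Singular denominator words. Only these denominators are handled in this sketch.
    private static final Map<Integer, String> DENOMINATORS = Map.of(
            2, "half",
            3, "third",
            4, "fourth",
            5, "fifth",
            6, "sixth",
            8, "eighth",
            10, "tenth",
            16, "sixteenth");

    private SpokenFractions() {}

    /** Returns the denominator as a word, pluralized when the numerator is not 1. */
    static String denominatorWord(int numerator, int denominator) {
        String word = DENOMINATORS.get(denominator);
        if (word == null) {
            throw new IllegalArgumentException("Unsupported denominator: " + denominator);
        }
        if (numerator == 1) {
            return word;
        }
        // "half" has an irregular plural; the other words just take an "s".
        return denominator == 2 ? "halves" : word + "s";
    }

    /** Builds the text to speak for a mixed number, with the denominator as a word. */
    static String mixedNumber(int whole, int numerator, int denominator) {
        return whole + " and " + numerator + " " + denominatorWord(numerator, denominator);
    }
}
```

The string returned by mixedNumber(1, 3, 4) could then be passed to tts.speak(...) exactly as in the question, instead of "1 and 3 4ths".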

Source: https://stackoverflow.com/questions/52392804/text-to-speech-pronouncing-numbers-like-4th-8ths-or-2nd

Tags

android

text-to-speech