How to use Google's Text-to-Speech service for Chinese characters on Android?

Submitted by 不打扰是莪最后的温柔 on 2019-12-22 05:01:19

Question


I'm trying to pull an audio file from Google's text-to-speech function. Basically, you take the link and append whatever you want spoken to the end of it. The code below works just fine for English, so I think the problem must be how the Chinese characters are encoded in the request. Here's what I've got:

String text = "text to be spoken";
public static final String AUDIO_CHINESE= "http://www.translate.google.com/translate_tts?tl=zh&q=";
public static final String AUDIO_ENGLISH = "http://www.translate.google.com/translate_tts?tl=en&q=";

URL url = new URL(AUDIO_ENGLISH + text);

urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
urlConnection.setRequestProperty("Accept-Charset", Variables.UTF_8);

if (urlConnection.getResponseCode() == 200) {
     //get byte array in response
     in = new DataInputStream(urlConnection.getInputStream());
} else {
     in = new DataInputStream(urlConnection.getErrorStream());
}
//use commons io
byte[] bytes = IOUtils.toByteArray(in);

in.close();
urlConnection.disconnect();

return bytes;

When I try this with Chinese characters, though, it returns something that I can't get to play in the MediaPlayer (I suspect it's not a proper audio file, as the vast majority of the bytes are '85'). So I've tried both

String chText = "你好";
URL url = new URL(AUDIO_CHINESE + URLEncoder.encode(chText, "UTF-8"));

and

URL url = new URL(AUDIO_CHINESE + Uri.encode(chText, "UTF-8"));

and then adding

urlConnection.setRequestProperty("content-type", "application/x-www-form-urlencoded; charset=UTF-8");

to the request header. This just made it worse, though, because now it doesn't even return a 200 code, instead stating "FileNotFound" in logcat.

So on a whim, I went back and tried the URL/Uri encoding with the English text, and now the English won't return a valid result either. Not sure what's going on here: the raw URL from the debugger works fine if I copy and paste it into Chrome, but for some reason the urlConnection just doesn't work. I feel like I'm missing something obvious.

EDIT

Fiddling with it some more has revealed no answer, just more confusion (and exasperation). For some reason, when the request is sent over HttpURLConnection, the Google TTS service reads the UTF-8 percent-encoded text as UTF-16, at least as far as I can tell. For example, the character "維" (wei2) is %E7%B6%AD, but if you pass it through the connection, you'll get a file that pronounces "see" ("ç", to be precise).

ç, as it turns out, is 0x00E7 in UTF-16 (its UTF-8 percent-encoded version is %C3%A7). I have no idea why this happens in Java, because putting the appropriate percent-encoded text at the end of the link works properly in any browser. Thus far, I have tried various ways of getting the TTS to read the entirety of %E7%B6%AD, without much success.
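The symptom described above can be reproduced locally without touching the network. The sketch below decodes the same percent-encoded bytes twice: once as UTF-8 and once as ISO-8859-1, a single-byte charset in which the byte 0xE7 is "ç". The choice of ISO-8859-1 is an assumption made for illustration; nothing here confirms which charset the server actually falls back to, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodeDemo {

    // Decodes a percent-encoded string using the named charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%E7%B6%AD"; // the three UTF-8 bytes of U+7DAD

        // Read as UTF-8, the three bytes form a single character.
        String asUtf8 = decode(encoded, "UTF-8");
        System.out.println(Integer.toHexString(asUtf8.charAt(0))); // 7dad

        // Read one byte per character (assumed fallback), the first
        // character is U+00E7, which matches the "ç" that gets spoken.
        String asLatin1 = decode(encoded, "ISO-8859-1");
        System.out.println(asLatin1.length());                       // 3
        System.out.println(Integer.toHexString(asLatin1.charAt(0))); // e7
    }
}
```

If the receiving side treats each byte as its own character, only the first byte's character (ç) is pronounceable, which would be consistent with the audio described.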

EDIT2

Solution to my problem found! See below for answer. The problem wasn't in the encoding, it was in the parsing on Google's end. Have edited the title accordingly. Cheers!


Answer 1:


So, as it turns out, the problem in the end wasn't the encoding at all; it was the processing on Google's end. To get the service to recognize UTF-8 correctly, you need to use this link http://www.translate.google.com/translate_tts?ie=utf-8&tl=zh-cn&q= instead of the one above. Note the ie=utf-8 added to the query parameters. So you can just call URLEncoder.encode("你好嗎", "UTF-8"), append the result to the link, and send it up as usual. Whew!
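As a minimal sketch of the fix, the helper below builds the request URL with ie=utf-8 in the query string and the text percent-encoded as UTF-8. The class name TtsUrlBuilder and the method buildTtsUrl are hypothetical, and the endpoint is simply the one quoted in this answer; whether it still responds the same way is not something this sketch can confirm. The sample text is written with Unicode escapes for 你好嗎 so the file compiles regardless of source encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class TtsUrlBuilder {

    // Endpoint as quoted in the answer; treat it as an assumption.
    static final String TTS_BASE = "http://www.translate.google.com/translate_tts";

    // Builds the request URL, declaring the input encoding via ie=utf-8
    // and percent-encoding the text as UTF-8.
    static String buildTtsUrl(String languageCode, String text) {
        try {
            return TTS_BASE
                    + "?ie=utf-8"
                    + "&tl=" + languageCode
                    + "&q=" + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4F60\u597D\u55CE" is the three-character greeting from the answer.
        System.out.println(buildTtsUrl("zh-cn", "\u4F60\u597D\u55CE"));
    }
}
```

The resulting string can be passed to new URL(...) and fetched with the same HttpURLConnection code as in the question. Note that URLEncoder encodes spaces as "+", which is the form encoding expected in a query string.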



Source: https://stackoverflow.com/questions/28166813/how-to-use-googles-text-to-speech-service-for-chinese-characters-on-android
