Google Translate TTS API blocked

难免孤独 2020-11-30 02:37

Google implemented a captcha to block people from accessing the TTS translate API https://translate.google.com/translate_tts?ie=UTF-8&q=test&tl=zh-TW. I was using it

5 answers
  •  佛祖请我去吃肉
    2020-11-30 03:06

    First, to avoid the CAPTCHA, you have to set a proper User-Agent, such as:
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"

    Then, to avoid being blocked, you must provide a valid token (the "tk" GET parameter) with every single request.
    On the web you can find many different scripts that try to calculate the token after a lot of reverse engineering... but every time the big G changes the algorithm you're stuck again. It's much easier to retrieve your token by closely observing similar requests made by the translate page (with your text in the URL).
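
Put together, a request needs both the browser-like User-Agent and a tk token. A minimal sketch in JavaScript (Node.js) of assembling such a request, assuming the parameter set used in the wget example further down (ie, tl, tk, q, client); `buildTtsRequest` is a hypothetical helper, not part of any Google API, and the token still has to be taken from a real page request:

```javascript
// Sketch: assemble a translate_tts request (URL + headers).
// Parameter names are copied from the wget example; buildTtsRequest is hypothetical.
const USER_AGENT =
  "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0";

function buildTtsRequest(text, lang, tk) {
  const url = new URL("https://translate.google.com/translate_tts");
  // URLSearchParams percent-encodes values, so spaces and non-ASCII text are safe.
  url.searchParams.set("ie", "UTF-8");
  url.searchParams.set("tl", lang);
  url.searchParams.set("tk", tk); // token observed in a real translate-page request
  url.searchParams.set("q", text);
  url.searchParams.set("client", "t");
  return {
    url: url.toString(),
    // Browser-like User-Agent, to avoid getting the CAPTCHA page.
    headers: { "User-Agent": USER_AGENT },
  };
}
```

Calling `buildTtsRequest("ciao", "it", "52269.458629")` gives the same query string as the wget example, over HTTPS. Because `URLSearchParams` handles the encoding (spaces become `+`), multi-word or non-ASCII text needs no manual escaping.
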
    You can read the token each time by grepping "tk=" in the output of this simple PhantomJS script:

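    "use strict";
    // Log every request the translate page makes; grep the output for "tk=".
    var page = require('webpage').create();
    var system = require('system');
    var args = system.args;
    if (args.length !== 2) {
        console.log("usage: " + args[0] + " text");
        phantom.exit(1);
    }
    // Forward the page's own console messages to stdout.
    page.onConsoleMessage = function(msg) { console.log(msg); };
    // Dump each outgoing request, including its URL (where the tk parameter appears).
    page.onResourceRequested = function(request) {
        console.log('Request ' + JSON.stringify(request, undefined, 4));
    };
    // The #fr/it/ fragment asks for French to Italian; the text is URL-encoded.
    page.open("https://translate.google.it/?hl=it&tab=wT#fr/it/" + encodeURIComponent(args[1]), function(status) {
        phantom.exit(status === "success" ? 0 : 1);
    });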
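
If you'd rather not pipe that output through grep, the same filtering can be done in JavaScript. A minimal sketch, assuming the tokens have the same digits.digits shape as the one in the wget example below; `extractTokens` is a hypothetical helper name:

```javascript
// Sketch: JavaScript stand-in for `grep "tk="` over the PhantomJS log.
// Assumes tokens look like the one in the wget example (digits.digits).
function extractTokens(logText) {
  const tokens = new Set(); // the same token may appear in several requests
  // Match tk=<digits>.<digits> when it follows "?" or "&" in a logged URL.
  const pattern = /[?&]tk=(\d+\.\d+)/g;
  for (const match of logText.matchAll(pattern)) {
    tokens.add(match[1]);
  }
  return [...tokens]; // in order of first appearance
}
```

For example, after redirecting the script's output to a file (say, a hypothetical `requests.log`), `extractTokens(require("fs").readFileSync("requests.log", "utf8"))` returns the distinct tokens found in it.
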
    

    So, in the end, you can get your speech with something like:
    wget -U "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0" "http://translate.google.com/translate_tts?ie=UTF-8&tl=it&tk=52269.458629&q=ciao&client=t" -O ciao.mp3
    (tokens are probably time-based, so this link may not work tomorrow)
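
The same download can be done from Node.js instead of wget. A minimal sketch, assuming Node 18+ (for the global `fetch`) and a URL whose token is still valid; `downloadMp3` is a hypothetical helper, and the `fetchImpl` parameter exists only so the function can be exercised without network access:

```javascript
// Sketch: Node.js counterpart of `wget -U "<UA>" "<url>" -O <file>`.
// Assumes Node 18+ (global fetch); downloadMp3 is a hypothetical helper name.
const { writeFile } = require("fs/promises");

const FIREFOX_UA =
  "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0";

async function downloadMp3(url, outPath, fetchImpl = globalThis.fetch) {
  // Equivalent of wget -U: send a browser-like User-Agent header.
  const res = await fetchImpl(url, { headers: { "User-Agent": FIREFOX_UA } });
  if (!res.ok) {
    // A rejected request (e.g. an expired token) may surface as a non-2xx status.
    throw new Error(`TTS request failed with HTTP ${res.status}`);
  }
  const audio = Buffer.from(await res.arrayBuffer());
  await writeFile(outPath, audio); // equivalent of wget -O ciao.mp3
  return audio.length; // number of bytes written
}
```

Usage: `downloadMp3("http://translate.google.com/translate_tts?ie=UTF-8&tl=it&tk=52269.458629&q=ciao&client=t", "ciao.mp3")`. Like the wget line, it stops working once the token expires, and a CAPTCHA page served with a 200 status would not be caught by the `res.ok` check.
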
