Can't download video captions using youtube API v3 in python

旧巷老猫 submitted on 2019-11-30 10:31:06

Your app seems overly-complex... it's structured to be able to do everything that can be done w/captions, not just download. That makes it harder to debug, so I wrote an abridged (Python 2 or 3) version that just downloads captions:

# Usage example: $ python captions-download.py Txvud7wPbv4

from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/youtube.force-ssl'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
YOUTUBE = discovery.build('youtube', 'v3', http=creds.authorize(Http()))

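def process(vid):
    # Look up the caption tracks available for this video.
    caption_info = YOUTUBE.captions().list(
            part='id', videoId=vid).execute().get('items', [])
    if not caption_info:
        print('No caption tracks found for video %s' % vid)
        return
    # Download the first track in SRT format.
    caption_str = YOUTUBE.captions().download(
            id=caption_info[0]['id'], tfmt='srt').execute()
    # Under Python 3 the payload may arrive as bytes; decode before splitting.
    if isinstance(caption_str, bytes) and not isinstance(caption_str, str):
        caption_str = caption_str.decode('utf-8')
    # Individual captions are separated by blank lines.
    caption_data = caption_str.split('\n\n')
    for line in caption_data:
        if line.count('\n') > 1:
            i, cap_time, caption = line.split('\n', 2)
            print('%02d) [%s] %s' % (
                    int(i), cap_time, ' '.join(caption.split())))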

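if __name__ == '__main__':
    import sys
    # Require exactly one argument; otherwise VID would be undefined.
    if len(sys.argv) == 2:
        process(sys.argv[1])
    else:
        print('Usage: python captions-download.py VIDEO_ID')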

The way it works is this:

  1. You pass in the video ID (VID) as the only argument (sys.argv[1])
  2. It uses that VID to look up the caption IDs with YOUTUBE.captions().list()
  3. Assuming the video has (at least) one caption track, I grab its ID (caption_info[0]['id'])
  4. Then it calls YOUTUBE.captions().download() with that caption ID requesting the srt track format
  5. All individual captions are delimited by double NEWLINEs, so split on 'em
  6. Loop through each caption; there's data if it contains at least 2 NEWLINEs, so only split() on the first two (giving the caption#, the timeline, and the caption text)
  7. Display the caption#, timeline of when it appears, then the caption itself, changing all remaining NEWLINEs to spaces
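
If you want to play with steps 5-7 without touching the API or auth at all, here's a minimal sketch of just the parsing part. The parse_srt() name and the SAMPLE string are mine (hypothetical, not part of the script above); the sample text is made up to look like an srt payload, including where I put the line break inside the first caption.

```python
from __future__ import print_function

def parse_srt(srt_text):
    """Return a list of (caption#, timeline, text) tuples from srt text."""
    cues = []
    # Step 5: captions are delimited by double NEWLINEs.
    for block in srt_text.split('\n\n'):
        # Step 6: a real caption has at least 2 NEWLINEs; split on the first two.
        if block.count('\n') > 1:
            idx, timeline, text = block.split('\n', 2)
            # Step 7: collapse any remaining NEWLINEs (and extra spaces) to spaces.
            cues.append((int(idx), timeline, ' '.join(text.split())))
    return cues

# Hypothetical sample payload, shaped like an srt download.
SAMPLE = (
    "1\n00:00:06,390 --> 00:00:09,280\niterator cool\nbut that's cool\n\n"
    "2\n00:00:09,280 --> 00:00:12,280\nyour the moment\n"
)

for idx, timeline, text in parse_srt(SAMPLE):
    print('%02d) [%s] %s' % (idx, timeline, text))
```

Keeping the parsing in its own function means you can test it on a saved string first, then only worry about the API call once that part works.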

When I run it, I get the expected result... here on a video I own:

$ python captions-download.py MY_VIDEO_ID
01) [00:00:06,390 --> 00:00:09,280] iterator cool but that's cool
02) [00:00:09,280 --> 00:00:12,280] your the moment
03) [00:00:13,380 --> 00:00:16,380] and sellers very thrilled
    :

Couple of things...

  1. I think you need to be the owner of the video you're trying to download the captions for.
    • I tried my script on your video, and I get a 403 HTTP Forbidden error
    • Here are other errors you may get from the API
  2. In your case, it looks like something is messing up the video ID you're passing in.
    • It thinks you're giving it <code> and </code> (notice the hex 0x3c & 0x3e values)... rich text?
    • Anyway, this is why I wrote my own, shorter version... so I have a more controlled environment to experiment.
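
On point 2, one way to catch a mangled ID before it ever reaches the API is to clean and sanity-check the argument first. This is only a sketch: clean_video_id(), TAG_RE and VIDEO_ID_RE are hypothetical names of mine, and the "11 characters from letters, digits, - and _" pattern is an assumption based on what video IDs usually look like, not something I know the API to guarantee.

```python
from __future__ import print_function
import re

# Assumption: video IDs are typically 11 characters drawn from letters,
# digits, '-' and '_'. Observed convention, not a documented guarantee.
VIDEO_ID_RE = re.compile(r'^[A-Za-z0-9_-]{11}$')
# Matches HTML-style tags such as <code> and </code>.
TAG_RE = re.compile(r'<[^>]*>')

def clean_video_id(raw):
    """Strip markup like <code>...</code> and reject anything odd-looking."""
    candidate = TAG_RE.sub('', raw).strip()
    if not VIDEO_ID_RE.match(candidate):
        raise ValueError('Suspicious video ID: %r' % raw)
    return candidate

print(clean_video_id('<code>Txvud7wPbv4</code>'))
```

If the ID fails the check, you get a ValueError pointing at the raw value instead of a confusing error back from the API.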

FWIW, since you're new to using Google APIs, I've made a couple of intro videos to get developers on-boarded with using Google APIs in this playlist. The auth code is the toughest part, so focus on videos 3 and 4 in that playlist to help get you acclimated.

I don't really have any videos that cover YouTube APIs (as I focus more on G Suite APIs) although I do have the one Google Apps Script example (video 22 in playlist); if you're new to Apps Script, you need to review your JavaScript then check out video 5 first. Hope this helps!
