Question
I am using the code below to build a list of video titles from a public YouTube playlist. It works well for playlists with fewer than 100 videos. For playlists with more than 100 videos, only the titles of the first 100 videos are added to the list. I think this happens because only the first 100 videos are loaded when the page is opened in a browser; the remaining videos are loaded as you scroll down the page. Is there any way to get the titles of all videos in a playlist?
from bs4 import BeautifulSoup as bs
import requests
url = "https://www.youtube.com/playlist?list=PLRdD1c6QbAqJn0606RlOR6T3yUqFWKwmX"
r = requests.get(url)
soup = bs(r.text,'html.parser')
res = soup.find_all('tr',{'class':'pl-video yt-uix-tile'})
titles = []
for video in res:
    titles.append(video.get('data-title'))
Answer 1:
As you correctly observed, only the first 100 videos are loaded. When the user scrolls down, AJAX calls are made to load the additional videos.
The easiest, but also the most heavyweight, way to reproduce those AJAX calls is to use Selenium WebDriver. You can find the official Python documentation here.
Answer 2:
I created the following script with the help of input from Abrogans.
This gist was also helpful.
from bs4 import BeautifulSoup as bs
from selenium import webdriver
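from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
driver = webdriver.Firefox()
url = "https://www.youtube.com/playlist?list=PLRdD1c6QbAqJn0606RlOR6T3yUqFWKwmX"
driver.get(url)
# find_element_by_tag_name was removed in Selenium 4; the By-based call works in both versions
elem = driver.find_element(By.TAG_NAME, 'html')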
elem.send_keys(Keys.END)
time.sleep(3)
elem.send_keys(Keys.END)
innerHTML = driver.execute_script("return document.body.innerHTML")
page_soup = bs(innerHTML, 'html.parser')
res = page_soup.find_all('span',{'class':'style-scope ytd-playlist-video-renderer'})
titles = []
for video in res:
    # Keep only the spans that actually carry a title attribute
    if video.get('title') is not None:
        titles.append(video.get('title'))
driver.close()
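The script above presses END a fixed number of times, so a very long playlist may still be only partly loaded. A minimal sketch of one way around this is to keep scrolling until the number of loaded items stops growing. The names `scroll_until_stable` and `load_full_playlist` are hypothetical helpers, not part of Selenium. The CSS selector is an assumption that mirrors the class used in the script above, and the pause and round limits are arbitrary defaults that may need tuning.

```python
import time


def scroll_until_stable(count_items, scroll_once, pause=2.0, max_rounds=50):
    """Scroll repeatedly until the item count stops growing.

    count_items: callable returning how many items are currently loaded.
    scroll_once: callable that triggers one scroll to the bottom of the page.
    Returns the last observed item count.
    """
    previous = count_items()
    for _ in range(max_rounds):
        scroll_once()
        time.sleep(pause)  # give the page time to fetch the next batch
        current = count_items()
        if current == previous:  # nothing new appeared, assume the end was reached
            break
        previous = current
    return previous


def load_full_playlist(driver, pause=2.0):
    """Hypothetical wiring for a Selenium driver that has already opened the playlist."""
    # Imported lazily so scroll_until_stable can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    page = driver.find_element(By.TAG_NAME, "html")
    # Assumed selector: spans with a title attribute, matching the class used above.
    selector = "span.ytd-playlist-video-renderer[title]"
    return scroll_until_stable(
        count_items=lambda: len(driver.find_elements(By.CSS_SELECTOR, selector)),
        scroll_once=lambda: page.send_keys(Keys.END),
        pause=pause,
    )
```

Under these assumptions, calling `load_full_playlist(driver)` right after `driver.get(url)` would replace the two manual `send_keys(Keys.END)` calls, and the rest of the script could parse the page as before. If the network is slow, a batch may arrive after the pause has elapsed and the loop would stop early, so a longer `pause` is the safer choice.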
Source: https://stackoverflow.com/questions/55992902/python-script-to-create-a-list-of-video-titles-of-a-youtube-playlist-containing