How to scrape view counts from YouTube pages

Question

My code works for the most part.

It scrolls to the bottom of a YouTube channel's Videos page and then collects all the video titles.

How would I also get the number of views for each video?

Would a CSS selector or XPath work?

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
# Selenium 4 takes the driver path through a Service object, not a positional argument
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
url = 'https://www.youtube.com/user/OakDice/videos'
driver.get(url)

SCROLL_PAUSE_TIME = 2
last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    # Scroll down to the current bottom of the page
    driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
    # Wait for the next batch of videos to load
    time.sleep(SCROLL_PAUSE_TIME)

    # Stop once the page height no longer grows
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

soup = BeautifulSoup(driver.page_source, 'lxml')
titles = soup.find_all('a', id='video-title')
for title in titles:
    print(title.get_text(strip=True))

Answer 1:

It would probably be more robust to use the YouTube Data API, which returns JSON, instead of scraping the page. You can get a list of all public videos uploaded by a given user, i.e. the channel's uploads playlist (see, for instance, the question "YouTube API to fetch all videos on a channel"). Then call the videos API to get the statistics for each video in that playlist and read the view count from statistics.viewCount.
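
A minimal sketch of that flow, using only the Python standard library. It assumes you have a YouTube Data API v3 key in a YOUTUBE_API_KEY environment variable, and that the legacy username from the question's URL (OakDice) still resolves through channels.list with forUsername. The helper names (api_get, uploads_playlist_id, batches, parse_view_counts, and so on) are hypothetical. It chains channels.list (to find the uploads playlist), playlistItems.list (paged, at most 50 items per page) and videos.list (statistics for up to 50 IDs per call).

```python
import json
import os
import urllib.parse
import urllib.request

API_BASE = "https://www.googleapis.com/youtube/v3"
# Assumption: an API key created in the Google Cloud console, supplied via this env var
API_KEY = os.environ.get("YOUTUBE_API_KEY")


def api_get(endpoint, **params):
    """Call a YouTube Data API v3 endpoint and return the decoded JSON."""
    params["key"] = API_KEY
    url = f"{API_BASE}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def uploads_playlist_id(username):
    """Look up the playlist holding every public upload of a legacy /user/ name."""
    data = api_get("channels", part="contentDetails", forUsername=username)
    return data["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]


def playlist_video_ids(playlist_id):
    """Page through playlistItems.list (50 per page) collecting video IDs."""
    ids, token = [], None
    while True:
        params = {"part": "contentDetails", "playlistId": playlist_id, "maxResults": 50}
        if token:
            params["pageToken"] = token
        data = api_get("playlistItems", **params)
        ids += [item["contentDetails"]["videoId"] for item in data["items"]]
        token = data.get("nextPageToken")
        if not token:
            return ids


def batches(seq, size=50):
    """Split IDs into chunks; videos.list accepts at most 50 IDs per request."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parse_view_counts(videos_response):
    """Return (title, views) pairs; the API sends viewCount as a string."""
    pairs = []
    for item in videos_response.get("items", []):
        views = item.get("statistics", {}).get("viewCount")
        pairs.append((item["snippet"]["title"], int(views) if views is not None else None))
    return pairs


def channel_view_counts(username):
    """Fetch (title, views) for every public upload of the given user."""
    pairs = []
    ids = playlist_video_ids(uploads_playlist_id(username))
    for batch in batches(ids):
        resp = api_get("videos", part="snippet,statistics", id=",".join(batch))
        pairs += parse_view_counts(resp)
    return pairs


# Only hits the network when a key is configured
if __name__ == "__main__" and API_KEY:
    for title, views in channel_view_counts("OakDice"):
        print(views, title)
```

Unlike the labels shown on the page, statistics.viewCount is an exact number, and the API does not depend on YouTube's HTML layout staying the same.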

Answer 2:

I would loop over each video's container element (the ytd-grid-video-renderer tag) and pull the title and view count out of each one.

Something like:

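from selenium.webdriver.common.by import By

# find_elements (plural) returns a list; each video sits in its own ytd-grid-video-renderer
allvideos = driver.find_elements(By.TAG_NAME, 'ytd-grid-video-renderer')
for video in allvideos:
    title = video.find_element(By.ID, 'video-title').text
    # The leading "." keeps the XPath relative to this video; the first span holds the views
    count = video.find_element(By.XPATH, ".//*[@id='metadata-line']/span[1]").text
    print(title, count)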
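
The count read in the loop above is the label YouTube displays, typically abbreviated like "1.2K views". A small helper such as the hypothetical parse_view_text below can turn it into a number. It assumes English-language labels of the forms shown in its comment; abbreviated labels are only approximate, which is one more reason the API's statistics.viewCount is more precise.

```python
import re

# Assumption: English labels such as "1.2K views", "850 views", "3M views", "1 view" or "No views"
_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_VIEW_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([KMB]?)\s+views?", re.IGNORECASE)


def parse_view_text(text):
    """Convert a displayed view label to an approximate integer, or None if unrecognized."""
    cleaned = text.replace(",", "").strip()
    if cleaned.lower().startswith("no views"):
        return 0
    match = _VIEW_RE.match(cleaned)
    if not match:
        return None
    number, suffix = match.groups()
    return round(float(number) * _MULTIPLIER[suffix.upper()])


print(parse_view_text("1.2K views"))
```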

I don't have a Beautiful Soup solution for you, since Selenium can do most of the work on its own.
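
If you do want to stay with Beautiful Soup, the same per-video loop can be written with CSS selectors, which also answers the CSS-versus-XPath question. This is a sketch assuming the same markup as the Selenium version (ytd-grid-video-renderer containers, #video-title, and a #metadata-line whose first span is the view label); YouTube renames these elements from time to time, so the selectors may need updating. videos_from_html is a hypothetical name, and you would call it as videos_from_html(driver.page_source).

```python
from bs4 import BeautifulSoup


def videos_from_html(html):
    """Return (title, view label) pairs, one per video container, via CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Assumption: same grid markup as the Selenium loop; adjust if YouTube changes it
    for video in soup.select("ytd-grid-video-renderer"):
        title = video.select_one("#video-title")
        # First span inside #metadata-line is the view count, the second the upload date
        views = video.select_one("#metadata-line span")
        if title and views:
            results.append((title.get_text(strip=True), views.get_text(strip=True)))
    return results
```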

A word of caution about driver.page_source: it doesn't necessarily return a full snapshot of the live DOM, so it probably isn't doing what you think it's doing.

Source: https://stackoverflow.com/questions/65008223/how-to-scrape-views-from-youtube-pages

Tags

python

selenium

xpath

css-selectors