How to scrape views from YouTube pages

Posted by 拥有回忆 on 2021-02-11 14:21:22

Question


My code works for the most part.

Currently I scroll a YouTube channel page to the bottom and collect all the video titles.

How would I get the number of views for each video?

Would a CSS selector or an XPath expression work?

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

SCROLL_PAUSE_TIME = 2  # seconds to wait for the next batch of videos

options = Options()
# Selenium 3 style call; Selenium 4 expects service=Service(path) instead.
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
url = 'https://www.youtube.com/user/OakDice/videos'
driver.get(url)

last_height = driver.execute_script("return document.documentElement.scrollHeight")
while True:
    # Give the page time to settle, then scroll down to the current bottom
    time.sleep(2)
    driver.execute_script("window.scrollTo(0, arguments[0]);", last_height)
    # Wait for newly loaded videos to appear
    time.sleep(SCROLL_PAUSE_TIME)

    # Stop once the page height no longer grows
    new_height = driver.execute_script("return document.documentElement.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Parse the scrolled page and print every video title
soup = BeautifulSoup(driver.page_source, 'lxml')
titles = soup.find_all('a', id='video-title')
for title in titles:
    print(title.text.strip())

Answer 1:


It would probably be more robust to use the YouTube API to get JSON data about the videos. You can get a list of all public videos uploaded by a given user (see, for instance, "YouTube API to fetch all videos on a channel"), then use the videos API to fetch the statistics for each video in the playlist and read the view count from statistics.viewCount.
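As a rough sketch of that approach, the snippet below chains three YouTube Data API v3 endpoints (channels, playlistItems, videos) using only the standard library. The helper names (`http_get_json`, `uploads_playlist_id`, `playlist_video_ids`, `view_counts`) are made up for illustration, and it assumes you have an API key and that the channel can still be looked up by its legacy username through `forUsername`. Videos with hidden statistics may lack `viewCount`, which this sketch does not handle.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://www.googleapis.com/youtube/v3"


def http_get_json(resource, params):
    """Fetch one YouTube Data API v3 resource and decode the JSON body."""
    url = f"{API_ROOT}/{resource}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def uploads_playlist_id(username, api_key, get_json=http_get_json):
    """Return the id of the playlist holding all uploads of a legacy username."""
    data = get_json("channels", {
        "part": "contentDetails", "forUsername": username, "key": api_key})
    return data["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]


def playlist_video_ids(playlist_id, api_key, get_json=http_get_json):
    """Collect every video id in a playlist, following nextPageToken."""
    video_ids, page_token = [], None
    while True:
        params = {"part": "contentDetails", "playlistId": playlist_id,
                  "maxResults": 50, "key": api_key}
        if page_token:
            params["pageToken"] = page_token
        data = get_json("playlistItems", params)
        video_ids += [item["contentDetails"]["videoId"] for item in data["items"]]
        page_token = data.get("nextPageToken")
        if not page_token:
            return video_ids


def view_counts(video_ids, api_key, get_json=http_get_json):
    """Return (title, views) pairs, requesting at most 50 ids per call."""
    results = []
    for start in range(0, len(video_ids), 50):
        batch = video_ids[start:start + 50]
        data = get_json("videos", {
            "part": "snippet,statistics", "id": ",".join(batch), "key": api_key})
        for item in data["items"]:
            # viewCount arrives as a string in the JSON response
            views = int(item["statistics"]["viewCount"])
            results.append((item["snippet"]["title"], views))
    return results
```

To use it, call `uploads_playlist_id("OakDice", api_key)`, pass the result to `playlist_video_ids`, and feed those ids to `view_counts`. The `get_json` parameter exists only so the network call can be swapped out, for example in tests.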




Answer 2:


I would loop through all the videos (parent tag ytd-grid-video-renderer) and then pluck out the titles & counts from there.

Something like:

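from selenium.webdriver.common.by import By

# find_elements (plural) returns a list that can be iterated
allvideos = driver.find_elements(By.TAG_NAME, 'ytd-grid-video-renderer')
for video in allvideos:
    title = video.find_element(By.ID, 'video-title')
    # The leading dot keeps the XPath relative to this video element
    count = video.find_element(By.XPATH, './/*[@id="metadata-line"]/span')
    print(title.text, count.text)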

I don't have a beautiful soup solution for you, as selenium will do most of the work for you.

A word of caution on driver.page_source: it doesn't really return a full snapshot of the DOM, so it probably isn't doing what you think it's doing.
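The count element located in the loop above yields display text rather than a number. As a minimal sketch, assuming the English interface and strings shaped like "1.2K views" or "1,234 views", a helper along these lines could turn that text into an integer. `parse_view_count` is a hypothetical name, and abbreviated counts are rounded by YouTube, so the result is only approximate.

```python
import re

# Multipliers for the abbreviations assumed to appear in the view text
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_VIEWS_RE = re.compile(
    r"^\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB]?)\s+views?\s*$", re.IGNORECASE)


def parse_view_count(text):
    """Convert text such as '1.2K views' into an int; None if unrecognized."""
    if text.strip().lower().startswith("no views"):
        return 0
    match = _VIEWS_RE.match(text)
    if match is None:
        return None
    number, suffix = match.groups()
    value = float(number.replace(",", ""))
    return round(value * _SUFFIXES[suffix.upper()])
```

Inside the loop this would be called as `parse_view_count(count.text)`. Returning None for unrecognized text makes it easy to skip other metadata strings, such as the upload date.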



Source: https://stackoverflow.com/questions/65008223/how-to-scrape-views-from-youtube-pages
