Scroll over website using PhantomJS and Selenium

旧城冷巷雨未停 submitted on 2019-12-11 05:46:23

Question


I need to scroll through a web page (for example, Twitter) and scrape the new elements that appear as I advance down the page. I am trying to do this with Python 3.x, Selenium and PhantomJS. This is my code:

import time
from selenium import webdriver
from bs4 import BeautifulSoup

user = 'ciroylospersas'
# Start web browser
#browser = webdriver.Firefox()
browser = webdriver.PhantomJS()
browser.set_window_size(1024, 768)
browser.get("https://twitter.com/")

# Fill username in login
element = browser.find_element_by_id("signin-email")
element.clear()
element.send_keys('your twitter user')
# Fill password in login
element = browser.find_element_by_id("signin-password")
element.clear()
element.send_keys('your twitter pass')

browser.save_screenshot('screen.png') # save a screenshot to disk

# Submit the login form and give the page time to load
element.submit()
time.sleep(5)

browser.save_screenshot('screen1.png') # save a screenshot to disk
# Move to the following url
browser.get("https://twitter.com/" + user + "/following")
browser.save_screenshot('screen2.png') # save a screenshot to disk

scroll_script = "var h = document.body.scrollHeight; window.scrollTo(0, h); return h;"
newHeight = browser.execute_script(scroll_script)
print(newHeight)
browser.save_screenshot('screen3.png') # save a screenshot to disk

The problem is that I can't scroll to the bottom: screen2.png and screen3.png are identical. But if I change the webdriver from PhantomJS to Firefox, the same code works fine. Why?


Answer 1:


I was able to get this to work in PhantomJS when trying to solve a similar problem:

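# Height of the page before the first scroll
check_height = browser.execute_script("return document.body.scrollHeight;")
while True:
    # Jump to the current bottom of the page
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Give the page time to load more content
    time.sleep(5)
    height = browser.execute_script("return document.body.scrollHeight;")
    # Stop when the page did not grow after the last scroll
    if height == check_height:
        break
    check_height = height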

It will scroll to the current "bottom", wait, see if the page loaded more, and bail if it did not (assuming everything got loaded if the heights match.)
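The fixed five-second wait in the loop above could also be replaced by polling the page height until it changes. This is only a sketch: the helper name wait_for_height_change and its timeout and poll parameters are made up for illustration, and it assumes browser is any object with Selenium's execute_script method.

```python
import time


def wait_for_height_change(browser, old_height, timeout=5.0, poll=0.25):
    """Poll document.body.scrollHeight until it differs from old_height.

    Returns the last height read; this equals old_height if the page
    did not grow before the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        height = browser.execute_script("return document.body.scrollHeight;")
        # Return as soon as the page has grown, or once we run out of time
        if height != old_height or time.monotonic() >= deadline:
            return height
        time.sleep(poll)
```

Inside the scrolling loop, height = wait_for_height_change(browser, check_height) would take the place of the time.sleep(5) call and the height read that follows it.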

In my original code I had a "max" value I checked alongside the matching heights because I was only interested in the first 10 or so "pages". If there were more I wanted it to stop loading and skip them.
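A capped version of the loop might look roughly like the following. It is a sketch, not my original code: the function name scroll_until_stable and the pause and max_scrolls parameters are made up for illustration, and browser is assumed to be any Selenium webdriver (or any object with execute_script).

```python
import time


def scroll_until_stable(browser, pause=5.0, max_scrolls=10):
    """Scroll to the bottom repeatedly; return how many scrolls were done.

    Stops when the page height no longer changes after a scroll,
    or after max_scrolls scrolls, whichever comes first.
    """
    check_height = browser.execute_script("return document.body.scrollHeight;")
    scrolls = 0
    while scrolls < max_scrolls:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the page time to load more content
        height = browser.execute_script("return document.body.scrollHeight;")
        if height == check_height:
            break  # nothing new was loaded
        check_height = height
    return scrolls
```

With max_scrolls=10 the function gives up after ten "pages" even if the page would keep growing.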

Also, this is the answer I used as an example



Source: https://stackoverflow.com/questions/40369932/scroll-over-website-using-phatomjs-and-selenium
