问题
I need to scroll over a web page (example twitter) an make a web scraping of the new elements that appear as one advances on the website. I try to make this using python 3.x
, selenium
and PhantomJS
. This is my code
import time
from selenium import webdriver
from bs4 import BeautifulSoup
user = 'ciroylospersas'
# Start web browser
#browser = webdriver.Firefox()
browser = webdriver.PhantomJS()
browser.set_window_size(1024, 768)
browser.get("https://twitter.com/")
# Fill username in login
element = browser.find_element_by_id("signin-email")
element.clear()
element.send_keys('your twitter user')
# Fill password in login
element = browser.find_element_by_id("signin-password")
element.clear()
element.send_keys('your twitter pass')
browser.save_screenshot('screen.png') # save a screenshot to disk
# Summit the login
element.submit()
time.sleep(5
browser.save_screenshot('screen1.png') # save a screenshot to disk
# Move to the following url
browser.get("https://twitter.com/" + user + "/following")
browser.save_screenshot('screen2.png') # save a screenshot to disk
scroll_script = "var h = document.body.scrollHeight; window.scrollTo(0, h); return h;"
newHeight = browser.execute_script(scroll_script)
print(newHeight)
browser.save_screenshot('screen3.png') # save a screenshot to disk
The problem is I can't scroll to the bottom. The screen2.png
and screen3.png
are the same. But if I change the webdriver
from PhantomJS
to Firefox
the same code work fine. Why?
回答1:
I was able to get this to work in phantomJS when trying to solve a similar problem:
check_height = driver.execute_script("return document.body.scrollHeight;")
while True:
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
height = driver.execute_script("return document.body.scrollHeight;")
if height == check_height:
break
check_height = height
It will scroll to the current "bottom", wait, see if the page loaded more, and bail if it did not (assuming everything got loaded if the heights match.)
In my original code I had a "max" value I checked alongside the matching heights because I was only interested in the first 10 or so "pages". If there were more I wanted it to stop loading and skip them.
Also, this is the answer I used as an example
来源:https://stackoverflow.com/questions/40369932/scroll-over-website-using-phatomjs-and-selenium