This is sort of a follow-up question to one I asked earlier.
I'm trying to scrape a web page that I have to log in to reach first. But after authentication, the page I need requires some JavaScript to be run before its content is available.
I don't think Splash alone would handle this particular case well.
Here is the working idea: let PhantomJS, driven through selenium, handle the login (and any JavaScript along the way), then hand the resulting session cookies over to Scrapy and continue the crawl from there.
The code:
import scrapy
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class BboSpider(scrapy.Spider):
    name = "bbo"
    allowed_domains = ["bridgebase.com"]
    login_page = "http://www.bridgebase.com/myhands/myhands_login.php?t=%2Fmyhands%2Findex.php%3F"

    def start_requests(self):
        # Log in with PhantomJS so the login form and its JavaScript are handled by a real browser.
        driver = webdriver.PhantomJS()
        driver.get(self.login_page)
        driver.find_element_by_id("username").send_keys("user")
        driver.find_element_by_id("password").send_keys("password")
        driver.find_element_by_name("submit").click()
        driver.save_screenshot("test.png")  # handy for debugging the login step

        # Wait until the post-login page has actually loaded.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located(
                (By.LINK_TEXT, "Click here for results of recent tournaments")
            )
        )

        # Hand the authenticated session cookies over to Scrapy.
        cookies = driver.get_cookies()
        driver.close()

        yield scrapy.Request("http://www.bridgebase.com/myhands/index.php", cookies=cookies)

    def parse(self, response):
        if "recent tournaments" in response.body:
            self.log("Login successful")
        else:
            self.log("Login failed")

        print(response.body)
This prints Login successful and the HTML of the "hands" page.
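From here, the natural next step is to let Scrapy follow the links on the hands page instead of just printing it. Below is only a rough sketch of how parse() could be extended: the CSS selector, the parse_results callback, and the yielded field names are placeholders I made up and would have to be adapted to the real markup. Scrapy's cookie middleware remembers the cookies passed to the first request, so the follow-up requests stay inside the authenticated session.

    # inside BboSpider, replacing the parse() method above
    def parse(self, response):
        if "recent tournaments" not in response.body:
            self.log("Login failed")
            return
        self.log("Login successful")

        # Placeholder selector: follow links that look like tournament results.
        for href in response.css('a[href*="tourney"]::attr(href)').extract():
            # The session cookies from the first request are re-sent
            # automatically by Scrapy's cookie middleware.
            yield scrapy.Request(response.urljoin(href), callback=self.parse_results)

    def parse_results(self, response):
        # Hypothetical extraction; the field names are placeholders.
        yield {
            "title": response.css("title::text").extract_first(),
            "url": response.url,
        }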