Change proxy in chromedriver for scraping purposes

再見小時候 2020-12-22 10:46

I'm scraping Bet365, probably one of the trickiest websites I've encountered, with Selenium and Chrome. The issue with this page is that, even though my scraper takes s…

1 Answer
  • 2020-12-22 11:16

    I don't see any significant issue either in your approach or your code block. However, another approach would be to make use of the proxies from the Free Proxy List, preferring the entries that were verified most recently according to the Last Checked column, which the Free Proxy List keeps updated.
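
    If you'd rather not launch a browser just to read the proxy table, a lighter-weight sketch is shown below. It is only an illustration of the same idea: it assumes the proxy list is the first table on https://sslproxies.org/ and still carries "IP Address", "Port" and "Last Checked" columns, and it uses requests and pandas, neither of which is part of the original code:

      import requests
      import pandas as pd

      # Fetch the page over plain HTTP instead of driving a browser
      html = requests.get("https://sslproxies.org/", timeout=10).text

      # read_html() parses every <table> on the page; the proxy list is assumed to be the first one
      table = pd.read_html(html)[0]

      # Keep only entries whose "Last Checked" value is in seconds or minutes,
      # i.e. the proxies that were verified most recently
      recent = table[table["Last Checked"].str.contains("sec|min", na=False)]

      proxies = ["{}:{}".format(row["IP Address"], int(row["Port"]))
                 for _, row in recent.iterrows()]
      print(proxies)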

    As a solution, you can write a script that grabs all the available proxies and builds the list dynamically every time you initialize your program. The following program tries the proxies from that list one by one until a proxied connection is established and verified by checking that the Page Title of https://www.bet365.es contains the text bet365. An exception may still arise if the free proxy your program grabbed is overloaded with other users routing their traffic through it.

    • Code Block:

      driver.get("https://sslproxies.org/")
      driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='table table-striped table-bordered dataTable']//th[contains(., 'IP Address')]"))))
      ips = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='table table-striped table-bordered dataTable']//tbody//tr[@role='row']/td[position() = 1]")))]
      ports = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='table table-striped table-bordered dataTable']//tbody//tr[@role='row']/td[position() = 2]")))]
      driver.quit()
      proxies = []
      for i in range(0, len(ips)):
          proxies.append(ips[i]+':'+ports[i])
      print(proxies)
      for i in range(0, len(proxies)):
          try:
              print("Proxy selected: {}".format(proxies[i]))
              options = webdriver.ChromeOptions()
              options.add_argument('--proxy-server={}'.format(proxies[i]))
              driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
              driver.get("https://www.bet365.es")
              if "Proxy Type" in WebDriverWait(driver, 20).until(EC.title_contains("bet365")):
                  # Do your scrapping here
                  break
          except Exception:
              driver.quit()
      print("Proxy was Invoked")
      
    • Console Output:

      ['190.7.158.58:39871', '175.139.179.65:54980', '186.225.45.146:45672', '185.41.99.100:41258', '43.230.157.153:52986', '182.23.32.66:30898', '36.37.160.253:31450', '93.170.15.214:56305', '36.67.223.67:43628', '78.26.172.44:52490', '36.83.135.183:3128', '34.74.180.144:3128', '206.189.122.177:3128', '103.194.192.42:55546', '70.102.86.204:8080', '117.254.216.97:23500', '171.100.221.137:8080', '125.166.176.153:8080', '185.146.112.24:8080', '35.237.104.97:3128']

      Proxy selected: 190.7.158.58:39871
      Proxy selected: 175.139.179.65:54980
      Proxy selected: 186.225.45.146:45672
      Proxy selected: 185.41.99.100:41258
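
    As an additional check, once a proxied session comes up you can confirm that traffic really leaves through the selected proxy before scraping. This is only a sketch, not part of the original answer: it assumes the driver and the proxy string from the loop above are still in scope, and it uses https://httpbin.org/ip to read back the IP address the request appears to originate from:

      import json

      # Ask httpbin which IP address the proxied request appears to come from
      driver.get("https://httpbin.org/ip")
      origin = json.loads(driver.find_element(By.TAG_NAME, "pre").text)["origin"]

      # The reported origin should match the IP part of the selected proxy
      if origin == proxy.split(":")[0]:
          print("Traffic is routed through {}".format(proxy))
      else:
          print("Proxy {} is not being used (reported origin: {})".format(proxy, origin))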
      