Question
The purpose of this code is to scrape a data table from some links and then turn it into a pandas DataFrame.
The problem is that this code only scrapes the first 7 rows, which are on the first page of the table, and I want to capture the whole table. When I tried to loop over the table pages, I got an error.
Here is the code:
from selenium import webdriver

urls = open(r"C:\Users\Sayed\Desktop\script\sample.txt").readlines()
for url in urls:
    driver = webdriver.Chrome(r"D:\Projects\Tutorial\Driver\chromedriver.exe")
    driver.get(url)
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
        driver.execute_script("arguments[0].click();", item)

    for table in driver.find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr'):
        data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
        print(data)
Here is the error:
Traceback (most recent call last):
  File "D:/Projects/Tutorial/ff.py", line 8, in <module>
    for item in driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a'):
TypeError: 'WebElement' object is not iterable
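The traceback comes down to the two finder variants: `find_element_by_xpath` (singular) returns a single `WebElement`, while `find_elements_by_xpath` (plural) returns a list of them. A single `WebElement` is not iterable, so the `for` loop on line 8 fails. The browser-free sketch below reproduces this with hypothetical `FakeDriver` and `FakeElement` classes that only imitate those return types; they are not part of Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (defines no __iter__)."""

    def __init__(self, text):
        self.text = text


class FakeDriver:
    """Hypothetical stand-in imitating the singular/plural finder return types."""

    def __init__(self, texts):
        self._elements = [FakeElement(t) for t in texts]

    def find_element_by_xpath(self, xpath):
        # Singular finder: returns ONE element (the xpath is ignored in this stand-in).
        return self._elements[0]

    def find_elements_by_xpath(self, xpath):
        # Plural finder: returns a list, which may be empty.
        return list(self._elements)


def iterate_singular(driver):
    """Reproduce the question's bug: looping over a single element."""
    try:
        for _item in driver.find_element_by_xpath("//a"):
            pass
    except TypeError as exc:
        return str(exc)
    return None


def texts_from_plural(driver):
    """One fix: loop over the list returned by the plural finder."""
    return [item.text for item in driver.find_elements_by_xpath("//a")]


driver = FakeDriver(["Show more"])
print(iterate_singular(driver))   # 'FakeElement' object is not iterable
print(texts_from_plural(driver))  # ['Show more']
```

Answer 1 below takes the other route: it keeps the singular call and passes the one element straight to `execute_script` without a loop.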
Answer 1:
Check out the script below to get the whole table from that webpage. I've used a hardcoded delay in the script, which is not good practice; you can define an explicit wait instead to make the code more robust:
import time
from selenium import webdriver

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'
driver = webdriver.Chrome()
driver.get(url)

# Click the "Show more" link once via JavaScript
item = driver.find_element_by_xpath('//*[contains(@id,"showMoreHistory")]/a')
driver.execute_script("arguments[0].click();", item)
time.sleep(2)  # fixed delay to let the extra rows load

# Print the text of every header/data cell, row by row
for table in driver.find_elements_by_xpath('//*[contains(@id,"eventHistoryTable")]//tr'):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
    print(data)
driver.quit()
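The fixed `time.sleep(2)` above either wastes time or is too short. An explicit wait instead re-checks a condition until it holds or a timeout expires, which is how `WebDriverWait(driver, timeout).until(condition)` is used in the later scripts. The following is a simplified, hypothetical `wait_until` helper in plain Python (not Selenium's implementation; unlike Selenium's version, its condition takes no arguments) that illustrates the polling idea. The example condition, "the row count has grown past its previous value", mirrors the `lambda` condition in Answer 2.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Hypothetical, simplified stand-in for WebDriverWait(...).until(...):
    returns the truthy value, or raises TimeoutError once `timeout`
    seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Toy stand-in for the table: hypothetical row counts seen on successive polls.
rendered = iter([7, 7, 12])
previous = 7


def more_rows_loaded():
    return next(rendered) > previous


print(wait_until(more_rows_loaded, timeout=2, poll_interval=0.01))  # True
```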
To get all the data by clicking the "Show more" button until it is exhausted, and to use an explicit wait along the way, you can try the script below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155'
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 10)

# Keep clicking "Show more" until the link is no longer visible within 10 s
while True:
    try:
        item = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
        driver.execute_script("arguments[0].click();", item)
    except Exception:
        break

# Print the text of every header/data cell, row by row
for table in wait.until(EC.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
    data = [item.text for item in table.find_elements_by_xpath(".//*[self::td or self::th]")]
    print(data)
driver.quit()
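The `while True` loop in the script above keeps clicking until the wait for a visible "Show more" link fails. The same control flow can be isolated in a plain function that takes two callables, so it can be tried without a browser. `find_button`, `click` and `max_clicks` are hypothetical names; returning `None` from `find_button` stands in for the wait timing out, and `max_clicks` is an added safety cap that is not in the answer.

```python
def click_until_exhausted(find_button, click, max_clicks=1000):
    """Click a "Show more" control until it can no longer be found.

    `find_button()` returns the control, or None once it is gone
    (standing in for the explicit wait timing out). `max_clicks` is a
    hypothetical safety cap so a button that never disappears cannot
    loop forever. Returns the number of clicks made.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break
        click(button)
        clicks += 1
    return clicks


# Toy page: 7 rows visible, each click reveals up to 5 more, 23 rows in total.
page = {"visible": 7, "total": 23}


def find_show_more():
    return "show-more" if page["visible"] < page["total"] else None


def click_show_more(_button):
    page["visible"] = min(page["visible"] + 5, page["total"])


print(click_until_exhausted(find_show_more, click_show_more), page["visible"])  # 4 23
```

Mapped back onto the script above, `find_button` would wrap the `wait.until(EC.visibility_of_element_located(...))` call and return `None` on timeout, and `click` would run `driver.execute_script("arguments[0].click();", button)`.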
Answer 2:
Based on your question and the URL https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155, you can use the following solution to scrape the whole table:
Code Block:
# -*- coding: UTF-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

table_rows = []
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://www.investing.com/economic-calendar/investing.com-eur-usd-index-1155")

# Scroll the history table's header row into view
show_more_button = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr>th.left.symbol")))
driver.execute_script("arguments[0].scrollIntoView(true);", show_more_button)

# Number of data rows currently rendered
myLength = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr[event_attr_id='1155']"))))

while True:
    try:
        # Click "Show more", then wait until more rows than before are present
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div#showMoreHistory1155>a"))).click()
        WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements_by_css_selector("table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr[event_attr_id='1155']")) > myLength)
        # Note: table_rows is only filled here, so it stays empty if the table never grows
        table_rows = driver.find_elements_by_css_selector("table.genTbl.openTbl.ecHistoryTbl#eventHistoryTable1155 tr[event_attr_id='1155']")
        myLength = len(table_rows)
    except TimeoutException:
        # No clickable button or no new rows within 20 s: assume the table is complete
        break

for row in table_rows:
    print(row.text)
driver.quit()
Console Output:
Sep 24, 2018 01:30
Sep 17, 2018 01:30 53.1% 55.3%
Sep 10, 2018 01:30 55.3% 49.0%
Sep 03, 2018 01:30 49.0% 43.3%
Aug 27, 2018 01:30 43.3% 49.7%
Aug 20, 2018 01:30 49.7% 52.5%
Aug 13, 2018 01:30 52.5% 59.9%
Aug 06, 2018 01:30 59.9% 62.6%
Jul 30, 2018 01:30 62.6% 52.8%
Jul 23, 2018 01:30 52.8% 52.7%
Jul 16, 2018 01:30 52.7% 46.2%
Jul 10, 2018 01:30 46.2% 55.3%
Jul 02, 2018 01:30 55.3% 53.1%
Jun 25, 2018 01:30 53.1% 66.2%
Jun 18, 2018 01:30 66.2% 65.2%
Jun 11, 2018 01:30 65.2% 61.2%
Jun 04, 2018 01:30 61.2% 63.9%
May 28, 2018 01:30 63.9% 67.0%
May 21, 2018 01:30 67.0% 63.2%
May 14, 2018 01:30 63.2% 61.3%
May 07, 2018 01:30 61.3% 57.6%
Apr 30, 2018 01:30 57.6% 64.8%
Apr 23, 2018 01:30 64.8% 65.2%
Apr 16, 2018 01:30 65.2% 60.4%
Apr 09, 2018 01:30 60.4% 63.3%
Apr 02, 2018 01:30 63.3% 62.1%
Mar 26, 2018 01:30 62.1% 65.7%
Mar 19, 2018 02:30 65.7% 56.0%
Mar 12, 2018 02:30 56.0% 62.3%
Mar 05, 2018 02:30 62.3% 59.1%
Feb 26, 2018 02:30 59.1% 52.8%
Feb 19, 2018 02:30 52.8% 55.8%
Feb 12, 2018 02:30 55.8% 51.7%
Feb 05, 2018 02:30 51.7% 56.8%
Jan 29, 2018 02:30 56.8% 52.2%
Jan 22, 2018 02:30 52.2% 56.1%
Jan 15, 2018 02:30 56.1% 60.2%
Jan 08, 2018 02:30 60.2% 54.6%
Jan 01, 2018 02:30 54.6% 48.4%
Dec 25, 2017 02:30 48.4% 66.4%
Dec 18, 2017 02:30 66.4% 58.9%
Dec 11, 2017 02:30 58.9% 53.8%
Dec 04, 2017 02:30 53.8% 55.9%
Nov 28, 2017 02:30 55.9% 53.7%
Nov 20, 2017 02:30 53.7% 58.6%
Nov 14, 2017 02:30 58.6% 52.8%
Nov 06, 2017 02:30 52.8% 57.6%
Oct 30, 2017 01:30 57.6% 54.7%
Oct 23, 2017 01:30 54.7% 58.9%
Oct 16, 2017 01:30 58.9% 57.3%
Oct 09, 2017 01:30 57.3% 64.0%
Oct 02, 2017 01:30 64.0% 47.5%
Sep 25, 2017 01:30 47.5% 52.2%
Sep 18, 2017 01:30 52.2% 55.5%
Sep 11, 2017 01:30 55.5% 54.3%
Sep 04, 2017 01:30 54.3% 54.2%
Aug 28, 2017 01:30 54.2% 51.4%
Aug 21, 2017 01:30 51.4% 57.4%
Aug 14, 2017 01:30 57.4% 51.2%
Aug 07, 2017 01:30 51.2% 51.3%
Jul 31, 2017 01:30 51.3% 52.8%
Jul 24, 2017 01:30 52.8% 53.3%
Jul 17, 2017 01:30 53.3% 54.1%
Jul 10, 2017 01:30 54.1% 51.9%
Jul 03, 2017 01:30 51.9% 40.6%
Jun 26, 2017 01:30 40.6% 52.6%
Jun 19, 2017 01:30 52.6% 51.0%
Jun 12, 2017 01:30 51.0% 52.1%
Jun 05, 2017 01:30 52.1% 59.1%
May 29, 2017 01:30 59.1% 46.9%
May 22, 2017 01:30 46.9% 53.0%
May 15, 2017 01:30 53.0% 44.9%
May 08, 2017 01:30 44.9% 37.0%
May 01, 2017 01:30 37.0% 43.0%
Apr 24, 2017 01:30 43.0% 52.4%
Apr 10, 2017 01:30 52.4% 55.1%
Apr 03, 2017 01:30 55.1% 43.5%
Mar 27, 2017 02:30 43.5% 36.0%
Mar 20, 2017 02:30 36.0% 32.3%
Mar 13, 2017 02:30 32.3% 42.8%
Mar 06, 2017 02:30 42.8% 39.1%
Feb 27, 2017 02:30 39.1% 41.7%
Feb 20, 2017 02:30 41.7% 43.2%
Feb 13, 2017 02:30 43.2% 36.6%
Feb 06, 2017 02:30 36.6% 39.7%
Jan 30, 2017 02:30 39.7% 33.5%
Jan 23, 2017 02:30 33.5% 36.8%
Jan 16, 2017 03:30 36.8% 37.0%
Jan 09, 2017 02:30 37.0% 41.6%
Jan 02, 2017 02:30 41.6% 35.8%
Dec 26, 2016 02:30 35.8% 42.3%
Dec 19, 2016 02:30 42.3% 39.7%
Dec 12, 2016 04:15 39.7% 33.8%
Dec 05, 2016 02:30 33.8% 37.1%
Nov 29, 2016 02:30 37.1% 41.9%
Nov 21, 2016 02:30 41.9% 39.1%
Nov 15, 2016 02:00 39.1% 20.5%
Nov 07, 2016 02:30 20.5% 27.4%
Oct 31, 2016 02:30 27.4% 33.4%
Oct 25, 2016 02:30 33.4% 30.8%
Oct 18, 2016 02:30 30.8% 26.6%
Oct 10, 2016 02:30 26.6% 28.6%
Oct 05, 2016 02:00 28.6% 26.2%
Sep 26, 2016 02:30 26.2% 34.8%
Sep 19, 2016 02:30 34.8% 21.2%
Sep 13, 2016 02:30 21.2% 27.0%
Sep 05, 2016 02:30 27.0% 32.7%
Aug 29, 2016 02:30 32.7% 23.9%
Aug 22, 2016 02:30 23.9% 28.8%
Aug 15, 2016 02:30 28.8% 30.8%
Aug 08, 2016 02:30 30.8% 20.3%
Aug 01, 2016 02:30 20.3% 30.2%
Jul 25, 2016 02:30 30.2% 29.5%
Jul 18, 2016 02:30 29.5% 26.2%
Jul 11, 2016 02:30 26.2% 27.5%
Jul 04, 2016 02:30 27.5% 26.8%
Jun 27, 2016 02:30 26.8% 35.1%
Jun 20, 2016 02:30 35.1% 22.8%
Jun 13, 2016 02:30 22.8% 32.5%
Jun 06, 2016 02:30 32.5% 35.6%
May 30, 2016 02:30 35.6% 39.5%
May 23, 2016 02:30 39.5% 37.8%
May 16, 2016 03:30 37.8% 39.5%
May 09, 2016 02:30 39.5% 30.3%
May 02, 2016 02:30 30.3% 32.9%
Apr 25, 2016 02:30 32.9% 29.6%
Apr 18, 2016 06:00 29.6% 30.5%
Apr 11, 2016 02:30 30.5% 22.7%
Apr 04, 2016 03:30 22.7% 32.1%
Mar 28, 2016 03:30 32.1% 23.2%
Mar 21, 2016 03:30 23.2% 26.7%
Mar 14, 2016 03:30 26.7% 22.6%
Mar 07, 2016 03:30 22.6% 33.7%
Feb 29, 2016 03:30 33.7% 34.8%
Feb 22, 2016 03:30 34.8% 33.3%
Feb 15, 2016 03:30 33.3% 33.3%
Feb 08, 2016 03:30 33.3% 34.3%
Feb 01, 2016 03:30 34.3% 33.2%
Jan 25, 2016 03:30 33.2% 27.0%
Jan 18, 2016 03:30 27.0% 27.2%
Jan 11, 2016 03:30 27.2% 30.0%
Jan 05, 2016 03:30 30.0% 24.0%
Dec 29, 2015 03:30 24.0% 33.3%
Dec 21, 2015 03:30 33.3% 31.2%
Dec 14, 2015 04:30 31.2% 27.1%
Dec 07, 2015 03:00 27.1% 29.8%
Dec 01, 2015 03:00 29.8% 27.5%
Nov 23, 2015 03:00 27.5% 33.1%
Nov 17, 2015 04:00 33.1% 26.8%
Nov 09, 2015 02:30 26.8% 24.3%
Nov 02, 2015 01:30 24.3% 36.4%
Oct 26, 2015 01:30 36.4% 28.6%
Oct 19, 2015 01:30 28.6% 25.5%
Oct 11, 2015 04:30 25.5% 29.6%
Oct 06, 2015 01:00 29.6% 28.5%
Sep 28, 2015 01:30 28.5% 29.1%
Sep 21, 2015 01:30 29.1% 21.2%
Sep 14, 2015 01:30 21.2% 29.8%
Sep 07, 2015 01:30 29.8% 36.3%
Aug 31, 2015 01:30 36.3% 35.6%
Aug 24, 2015 01:30 35.6% 26.4%
Aug 17, 2015 01:30 26.4% 24.8%
Aug 10, 2015 01:30 24.8% 29.7%
Aug 03, 2015 01:30 29.7% 24.8%
Jul 27, 2015 01:30 24.8% 30.7%
Jul 20, 2015 01:30 30.7% 27.9%
Jul 13, 2015 01:30 27.9% 27.4%
Jul 07, 2015 01:30 27.4% 26.8%
Jun 29, 2015 01:30 26.8% 33.1%
Jun 22, 2015 01:30 33.1% 33.6%
Jun 15, 2015 03:30 33.6% 28.9%
Jun 08, 2015 01:30 28.9% 23.0%
Jun 01, 2015 01:30 23.0% 34.0%
May 25, 2015 04:00 34.0% 28.9%
May 18, 2015 01:30 28.9% 28.8%
May 11, 2015 01:30 28.8% 28.3%
May 04, 2015 02:00 28.3% 23.7%
Apr 27, 2015 01:30 23.7% 27.2%
Apr 20, 2015 01:30 27.2% 33.7%
Apr 13, 2015 02:00 33.7% 23.2%
Apr 06, 2015 02:00 23.2% 19.8%
Mar 30, 2015 02:30 19.8% 24.1%
Mar 23, 2015 02:30 24.1% 27.2%
Mar 16, 2015 03:00 27.2% 35.6%
Mar 09, 2015 02:30 35.6% 34.4%
Mar 02, 2015 02:30 34.4% 30.2%
Feb 23, 2015 02:30 30.2% 26.6%
Feb 16, 2015 03:30 26.6% 23.8%
Feb 09, 2015 02:30 23.8% 26.4%
Feb 02, 2015 02:30 26.4% 23.9%
Jan 26, 2015 02:30 23.9% 28.9%
Jan 19, 2015 02:30 28.9% 35.5%
Jan 12, 2015 02:30 35.5% 38.1%
Jan 06, 2015 03:30 38.1% 40.6%
Jan 01, 2015 02:30 40.6% 45.2%
Dec 22, 2014 02:00 45.2% 39.8%
Dec 15, 2014 02:00 39.8% 41.7%
Dec 07, 2014 21:00 41.7% 33.8%
Dec 02, 2014 03:00 33.8% 38.6%
Nov 24, 2014 01:30 38.6% 39.2%
Nov 17, 2014 01:00 39.2% 33.1%
Nov 10, 2014 01:00 33.1% 35.4%
Nov 04, 2014 03:00 35.4% 37.3%
Oct 27, 2014 02:00 37.3% 33.7%
Oct 19, 2014 22:00 33.7% 36.2%
Oct 13, 2014 01:00 36.2% 44.5%
Oct 06, 2014 01:00 44.5% 41.3%
Sep 29, 2014 01:00 41.3% 50.3%
Sep 21, 2014 22:35 50.3% 39.5%
Sep 15, 2014 00:45 39.5% 39.9%
Sep 08, 2014 01:00 39.9% 42.8%
Sep 01, 2014 02:35 42.8% 41.9%
Aug 25, 2014 01:00 41.9% 38.9%
Aug 18, 2014 01:00 38.9% 34.0%
Aug 11, 2014 01:00 34.0% 38.2%
Aug 04, 2014 01:00 38.2% 38.4%
Jul 28, 2014 01:00 38.4% 42.3%
Jul 21, 2014 01:00 42.3% 37.2%
Jul 14, 2014 01:00 37.2% 39.6%
Jul 07, 2014 01:00 39.6% 39.8%
Jun 30, 2014 01:00 39.8% 36.1%
Jun 23, 2014 00:30 36.1% 37.6%
Jun 16, 2014 00:30 37.6% 36.5%
Jun 09, 2014 00:30 36.5% 44.1%
Jun 01, 2014 22:00 44.1% 49.4%
May 26, 2014 00:30 49.4% 41.0%
May 19, 2014 00:00 41.0% 55.0%
May 12, 2014 00:00 55.0% 41.1%
May 04, 2014 06:00 41.1% 43.5%
Apr 27, 2014 06:00 43.5% 40.3%
Apr 06, 2014 06:00 40.3%
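Each line of the console output above is one `row.text` string, so reaching the pandas DataFrame the question asks for still needs the strings split into fields. The sketch below assumes the layout seen in this output (a "Mon DD, YYYY HH:MM" timestamp followed by zero, one or two percentages) and further assumes that the percentages are the actual and previous values in that order, since empty cells such as the forecast do not show up in `row.text`. `parse_row` and the field names `released`, `actual` and `previous` are hypothetical, chosen for illustration.

```python
import re
from datetime import datetime

# Assumed row layout, e.g. "Sep 17, 2018 01:30 53.1% 55.3%":
# "Mon DD, YYYY HH:MM" followed by zero, one or two percentage values.
ROW_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} \d{2}, \d{4} \d{2}:\d{2})"
    r"(?P<values>(?:\s+-?\d+(?:\.\d+)?%){0,2})\s*$"
)


def parse_row(text):
    """Parse one row.text string into a dict (hypothetical helper).

    Percentages are assigned in order to "actual", then "previous";
    missing values become None. Raises ValueError on unexpected text.
    """
    match = ROW_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {text!r}")
    released = datetime.strptime(match.group("stamp"), "%b %d, %Y %H:%M")
    values = [float(v.rstrip("%")) for v in match.group("values").split()]
    values += [None] * (2 - len(values))
    return {"released": released, "actual": values[0], "previous": values[1]}


rows = [
    "Sep 24, 2018 01:30",
    "Sep 17, 2018 01:30 53.1% 55.3%",
    "Apr 06, 2014 06:00 40.3%",
]
records = [parse_row(r) for r in rows]
print(records[1]["actual"], records[1]["previous"])  # 53.1 55.3
```

With pandas installed, `pandas.DataFrame(records)` should give one column per key (`released`, `actual`, `previous`); in Answer 2's script the list could be built as `[parse_row(row.text) for row in table_rows]`.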
Source: https://stackoverflow.com/questions/52448137/python-selenium-scrape-the-whole-table