Web scraping with Selenium not capturing full text [closed]

北城以北 submitted on 2020-12-13 03:02:17

Question


I'm trying to mine quite a bit of text from a list of links using Selenium/Python.

In this example, I scrape only one of the pages and that successfully grabs the full text:

    from selenium import webdriver

    page = 'https://xxxxxx.net/xxxxx/September%202020/2020-09-24'
    driver = webdriver.Firefox()
    driver.get(page)
    elements = driver.find_element_by_class_name('text').text
    elements

Then, when I loop through the whole list of links (all the per-day links on this page: https://overrustlelogs.net/Destinygg%20chatlog/September%202020), using the same method that worked for a single page, it does not grab the full text:

    for i in tqdm(chat_links):
        driver.get(i)
        #driver.implicitly_wait(200)
        elements = driver.find_element_by_class_name('text').text
        #elements = driver.find_element_by_xpath('/html/body/main/div[1]/div[1]').text
        #elements = elements.text
        temp = {'elements': elements}
        chat_text.append(temp)

    driver.close()

    chat_text

My guess is that the page doesn't get a chance to load completely inside the loop, yet the same call works on a single page. Also, driver.get seems to be meant to load the whole given page before returning.

Any ideas? Thanks, much appreciated.


Answer 1:


The page is lazy-loaded, so you need to scroll it and add the data to a list as each new batch appears.

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://overrustlelogs.net/Destinygg%20chatlog/September%202020/2020-09-30")
# Wait until the first chat line is visible before scrolling.
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".text>span")))
height = driver.execute_script("return document.body.scrollHeight")
data = []
while True:
    # Scroll to the bottom so the page loads the next batch of lines.
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    time.sleep(1)
    for item in driver.find_elements(By.CSS_SELECTOR, ".text>span"):
        # Skip text that was already collected on an earlier pass.
        if item.text in data:
            continue
        else:
            data.append(item.text)

    # Stop once scrolling no longer increases the page height.
    lastheight = driver.execute_script("return document.body.scrollHeight")
    if height == lastheight:
        break
    height = lastheight

print(data)
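
To use this inside the loop from the question, the scroll-and-collect steps could be wrapped in a function that is called once per link. The sketch below is only an illustration: `scrape_lazy_page`, `pause`, `max_scrolls` and the `CSS_SELECTOR` constant are hypothetical names, not part of Selenium or of the original answer. It assumes the page has already rendered its first lines by the time the first pause ends; in real use the `WebDriverWait` call shown above would probably go right after `driver.get`.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR; spelled out here so the
# helper depends only on the driver object that is passed in.
CSS_SELECTOR = "css selector"


def scrape_lazy_page(driver, url, selector=".text>span", pause=1.0, max_scrolls=500):
    """Load url, scroll until the page height stops growing, and return
    the distinct element texts in the order they were first seen."""
    driver.get(url)
    height = driver.execute_script("return document.body.scrollHeight")
    lines = []
    seen = set()
    # max_scrolls is an assumed safety cap so a page that never stops
    # growing cannot keep the loop running forever.
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the lazy loader time to append more lines
        for item in driver.find_elements(CSS_SELECTOR, selector):
            text = item.text
            if text not in seen:  # text-based de-duplication, as in the answer
                seen.add(text)
                lines.append(text)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == height:
            break
        height = new_height
    return lines
```

With a real browser, the question's loop body would then reduce to something like `chat_text.append({'elements': scrape_lazy_page(driver, i)})`. Note that de-duplicating by text, as the answer does, also drops chat lines that are genuinely repeated word for word.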


Source: https://stackoverflow.com/questions/64432539/web-scraping-with-selenium-not-capturing-full-text
