Python Web Scraping (Beautiful Soup, Selenium and PhantomJS): Only scraping part of full page

一笑奈何 submitted on 2019-12-04 10:11:04
alecxe

It's not easy to answer since there is no way for us to reproduce the problem.

One possible problem is that lxml does not handle this specific HTML particularly well, so you may need to try a different parser:

soup = BeautifulSoup(html2, "html.parser")
soup = BeautifulSoup(html2, "html5lib")
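
To see whether the parser is really the culprit, it can help to count how many elements each parser finds for the same selector. This is only a minimal sketch: the helper name `count_matches_by_parser` and the sample HTML are made up for illustration, and it assumes `bs4` is installed (`lxml` and `html5lib` are optional and reported as `None` when missing).

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_matches_by_parser(html, css_selector,
                            parsers=("lxml", "html.parser", "html5lib")):
    """Return {parser_name: number of elements matching css_selector}.

    A parser that is not installed maps to None instead of raising.
    """
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # bs4 raises this when the requested parser is unavailable
            counts[parser] = None
            continue
        counts[parser] = len(soup.select(css_selector))
    return counts


# Hypothetical sample: the second <div> is deliberately left unclosed
sample_html = '<div class="row">a</div><div class="row">b'
print(count_matches_by_parser(sample_html, "div.row"))
```

If the counts differ noticeably between parsers on the real page source, the parser that finds the most elements is probably the one to use.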

Also, you might not need BeautifulSoup in the first place. Selenium itself can locate elements in many different ways. For example, in this case:

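for div in driver.find_elements_by_css_selector(".ag-pinned-cols-container"):
    # do something with 'div', e.g. read its visible text
    # (Selenium 4 spells this driver.find_elements(By.CSS_SELECTOR, ...))
    print(div.text)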

It may also be that the data is loaded dynamically as you scroll to the bottom of the page. In that case, you may need to keep scrolling to the bottom until you see the desired amount of data or until scrolling loads no more new data. Here is the relevant thread with sample solutions:
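
As a rough sketch of such a scroll loop (not taken from the linked thread), the function below keeps scrolling until the page height stops growing. The name `scroll_to_bottom` and the `pause` and `max_rounds` parameters are assumptions made for illustration. It also assumes that the window itself scrolls. If the data sits in an inner scrollable container, the JavaScript would need to target that element instead.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is any object with a Selenium-style execute_script method.
    Returns the number of scrolls that caused the page to grow.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    loaded = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more data
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
        loaded += 1
    return loaded
```

After it returns, `driver.page_source` (or the Selenium element lookups) should see whatever data the scrolling loaded. A fixed `pause` is crude, so slow pages may need a larger value.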
