Filtering out HTML elements which have 'display:none' either as a tag attribute or in their CSS

Posted by 拜拜、爱过 on 2020-01-11 06:27:04

Question


Let's say you have some HTML source that's been scraped with Selenium and parsed with BeautifulSoup:

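from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get(url)  # url is the address of the page being scraped
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(driver.page_source, "html.parser")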

Is there a way to remove, from the HTML code or the soup object, all elements which either:

1.) have the attribute style=display:none within the HTML tag source (e.g. <div style = 'display:none'>...</div>)

or

2.) have the display:none property applied through the page's CSS


Answer 1:


I think I remember dealing with a website like this: the IP address was internally represented by multiple HTML elements, some of them hidden via an inline display: none style, others via a CSS class that made them invisible. Getting the real IP address out of this mess with BeautifulSoup alone was quite difficult.
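For the inline-style half of the problem, a BeautifulSoup-only filter is feasible. Here is a minimal sketch, assuming the hidden elements carry display:none directly in their style attribute; the helper name strip_inline_hidden and the regular expression are my own illustration, not something from the question. It cannot see rules that come from stylesheets or CSS classes, which is the part that makes a pure BeautifulSoup approach hard.

```python
import re

from bs4 import BeautifulSoup

# Matches "display: none" in an inline style, ignoring case and extra whitespace.
DISPLAY_NONE = re.compile(r"display\s*:\s*none\b", re.IGNORECASE)


def strip_inline_hidden(html):
    """Parse html and remove every tag whose inline style sets display:none.

    Hypothetical helper: only inline styles are inspected, not stylesheets.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Search again after each removal, so tags nested inside an already
    # removed parent are never touched a second time.
    tag = soup.find(style=DISPLAY_NONE)
    while tag is not None:
        tag.decompose()  # drops the tag together with all of its children
        tag = soup.find(style=DISPLAY_NONE)
    return soup
```

Calling strip_inline_hidden(driver.page_source).get_text() would then give the text without the inline-hidden parts, but anything hidden through a class would still be there.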

The good news is that Selenium actually handles this use case: whenever you get the .text of a WebElement, it returns only the visible text of the element, which is exactly what is needed.

Demo:

In [1]: from selenium import webdriver

In [2]: from selenium.webdriver.common.by import By

In [3]: driver = webdriver.Firefox()

In [4]: driver.get("http://proxylist.hidemyass.com/")

In [5]: # Skip the header row, then print the visible text of the second cell in each row
   ...: for row in driver.find_elements(By.CSS_SELECTOR, "section.proxy-results table#listable tr")[1:]:
   ...:     cells = row.find_elements(By.TAG_NAME, "td")
   ...:     print(cells[1].text.strip())
   ...: 
101.26.38.162
120.198.236.10
213.85.92.10
...
216.161.239.51
212.200.111.198
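
If you need to filter whole elements rather than just read their text, WebElement also has an is_displayed() method that reports the rendered state, so it should cover both inline styles and stylesheet rules. A rough sketch, assuming you already have a list of elements from the driver; the helper name visible_texts is made up for illustration, and it only relies on the is_displayed() method and the .text attribute:

```python
def visible_texts(elements):
    """Return the stripped text of each element that is currently displayed.

    Hypothetical helper: `elements` is any iterable of objects that offer
    is_displayed() and .text, such as Selenium WebElement instances.
    """
    texts = []
    for element in elements:
        # is_displayed() asks the browser about the rendered state, so hidden
        # elements are skipped regardless of how they were hidden.
        if element.is_displayed():
            texts.append(element.text.strip())
    return texts
```

With a live driver this could be called as visible_texts(driver.find_elements(By.TAG_NAME, "td")), where By comes from selenium.webdriver.common.by.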


Source: https://stackoverflow.com/questions/33597616/filtering-out-html-elements-which-have-displaynone-either-as-a-tag-attribute
