Filtering out HTML elements which have 'display:none' either as a tag attribute or in their CSS

Let's say you have some HTML source that has been scraped with Selenium and parsed with BeautifulSoup:

from selenium import webdriver
from bs4 import BeautifulSoup
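
If the goal is only to drop tags hidden with an inline style="display:none" before extracting text, one rough approach on the BeautifulSoup side is sketched below. This is not from the question: the HTML snippet is invented for illustration, and it only catches inline styles, so elements hidden through stylesheets or CSS classes will slip through (which is the harder part of the problem).

import re
from bs4 import BeautifulSoup

html = '<td>101<span style="display:none">99</span>.26<div style="display: none">junk</div>.38.162</td>'
soup = BeautifulSoup(html, "html.parser")

# Drop every tag whose inline style sets display to none.
for hidden in soup.find_all(style=re.compile(r"display\s*:\s*none")):
    hidden.decompose()

print(soup.get_text())  # -> 101.26.38.162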


        
1 Answer

    I think I remember dealing with a website like this: the IP address was internally represented by multiple HTML elements, some of them hidden via a display: none style, others given a CSS class that made them invisible. Getting the real IP address out of this mess with BeautifulSoup was quite difficult.

    The good news is that Selenium handles this use case out of the box: whenever you read the .text of a WebElement, it returns only the element's visible text, which is exactly what is needed here.

    Demo:

    In [1]: from selenium import webdriver
    
    In [2]: driver = webdriver.Firefox()
    
    In [3]: driver.get("http://proxylist.hidemyass.com/")
    
    In [4]: for row in driver.find_elements_by_css_selector("section.proxy-results table#listable tr")[1:]: 
       ...:     cells = row.find_elements_by_tag_name("td")
       ...:     print(cells[1].text.strip())
       ...: 
    101.26.38.162
    120.198.236.10
    213.85.92.10
    ...
    216.161.239.51
    212.200.111.198
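
    Note: if you are on Selenium 4.3 or later, the find_elements_by_* helpers used in the demo have been removed. A minimal equivalent sketch using By locators (same site and selectors as above, assuming the page structure has not changed) would look like this:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("http://proxylist.hidemyass.com/")

    # Skip the header row, then read the second cell of each row.
    rows = driver.find_elements(By.CSS_SELECTOR, "section.proxy-results table#listable tr")[1:]
    for row in rows:
        cells = row.find_elements(By.TAG_NAME, "td")
        # .text still returns only the visible text of the cell
        print(cells[1].text.strip())

    driver.quit()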
    