Let's say you have some HTML source that's been scraped with Selenium and parsed with BeautifulSoup:
from selenium import webdriver
from bs4 import Beauti
I think I remember dealing with a website like this: the IP address was represented internally by multiple HTML elements, some of them hidden via a display: none style, others via a CSS class that made them invisible. Getting the real IP address out of this mess with BeautifulSoup was quite difficult.
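To illustrate why the manual route is painful, here is a minimal sketch of it using only the Python standard library (html.parser rather than BeautifulSoup). The markup in SAMPLE_CELL is made up for illustration and is not taken from the real site, and the names VisibleText and visible_text are hypothetical helpers. The sketch only handles inline display: none styles and class rules declared in a style block; a real page may hide elements in other ways (external stylesheets, JavaScript), which is where this approach tends to break down.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: decoy fragments hidden by an inline style or a CSS class.
SAMPLE_CELL = (
    '<td><style>.x9{display:none}</style>'
    '<span style="display: none">17</span><span>101</span>'
    '<span class="x9">.44</span>.26.'
    '<div style="display:none">9</div><span>38</span>.162</td>'
)

# Elements without a closing tag; they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class VisibleText(HTMLParser):
    """Collect text that is not inside a hidden element."""

    def __init__(self, hidden_classes):
        super().__init__()
        self.hidden_classes = set(hidden_classes)
        self.stack = []  # one flag per open element: True if it is hidden
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        style = (attr.get("style") or "").replace(" ", "").lower()
        classes = set((attr.get("class") or "").split())
        hidden = (
            tag == "style"
            or "display:none" in style
            or bool(classes & self.hidden_classes)
        )
        self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep the text only if no enclosing element is hidden.
        if not any(self.stack):
            self.parts.append(data)


def visible_text(html):
    # Find class names that a <style> block maps to display: none.
    css = " ".join(re.findall(r"<style[^>]*>(.*?)</style>", html, re.S | re.I))
    hidden = re.findall(r"\.([\w-]+)\s*\{[^}]*display\s*:\s*none", css)
    parser = VisibleText(hidden)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(visible_text(SAMPLE_CELL))  # 101.26.38.162
```

Even this toy version needs its own CSS handling and an element stack, and it still ignores everything a browser would compute at render time.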
The good news is that Selenium handles this use case: whenever you read the .text of a WebElement, it returns the element's visible text, which is exactly what is needed.
Demo:
In [1]: from selenium import webdriver
In [2]: from selenium.webdriver.common.by import By
In [3]: driver = webdriver.Firefox()
In [4]: driver.get("http://proxylist.hidemyass.com/")
In [5]: for row in driver.find_elements(By.CSS_SELECTOR, "section.proxy-results table#listable tr")[1:]:  # [1:] skips the header row
   ...:     cells = row.find_elements(By.TAG_NAME, "td")
   ...:     print(cells[1].text.strip())
   ...:
101.26.38.162
120.198.236.10
213.85.92.10
...
216.161.239.51
212.200.111.198