The page I'm looking at contains:
text 1
text 2
text 3 text 4
What about using jQuery?
Edit:
First you have to include the required jQuery .js files in the page; you can get them from www.jQuery.com.
Then all you need to do is call a simple jQuery selector:
alert($("div#1").html());
The following code will give you the HTML in the div element:
sel = selenium('localhost', 4444, browser, my_url)
html = sel.get_eval("this.browserbot.getCurrentWindow().document.getElementById('1').innerHTML")
Then you can use BeautifulSoup to parse it and extract what you really want.
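For illustration, here is a minimal sketch of that parsing step. To keep it runnable without extra installs it uses the standard library's `html.parser` rather than BeautifulSoup; the same idea carries over to BeautifulSoup. The `sample` markup is only a guess at what the div contains, and `TextOutsideTag` and `text_without` are made-up names.

```python
from html.parser import HTMLParser


class TextOutsideTag(HTMLParser):
    """Collect text nodes, ignoring anything nested inside `skip_tag`."""

    def __init__(self, skip_tag):
        super().__init__()
        self.skip_tag = skip_tag
        self.depth = 0    # greater than 0 while inside the skipped tag
        self.chunks = []  # stripped text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.skip_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.skip_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside the skipped tag.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_without(html, skip_tag="h1"):
    """Return the text of `html` with the contents of `skip_tag` left out."""
    parser = TextOutsideTag(skip_tag)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.chunks)


# Assumed innerHTML of the div, standing in for the `html` value above.
sample = "<h1>text 1</h1> text 2 <b>text 3</b> text 4"
print(text_without(sample))
```

In real use you would pass the `html` string returned by `get_eval` instead of `sample`.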
I hope it helps.
The selected answer does not work in Python 3 at the time of writing. Instead use this:
from selenium import webdriver
wd = webdriver.Firefox()
wd.get(url)
html = wd.execute_script("return window.document.getElementById('1').innerHTML")
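Nesting quotes inside the JavaScript string passed to `execute_script` is easy to get wrong. One way to sidestep that, as a sketch only, is to build the script with `json.dumps`, which produces a valid JavaScript string literal for the element id. `inner_html_script` is a hypothetical helper name, not part of Selenium.

```python
import json


def inner_html_script(element_id):
    """Build a JavaScript snippet that returns the innerHTML of an element.

    json.dumps renders the id as a quoted, escaped string literal,
    so quotes inside the id cannot break the surrounding script.
    """
    return "return document.getElementById({}).innerHTML;".format(
        json.dumps(element_id)
    )


print(inner_html_script("1"))
```

The resulting string could then be passed to the driver, e.g. `wd.execute_script(inner_html_script("1"))`.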
Use XPath. From selenium.py:
Without an explicit locator prefix, Selenium uses the following default strategies:
- **dom**, for locators starting with "document."
- **xpath**, for locators starting with "//"
- **identifier**, otherwise
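As a rough illustration of those rules (this is not Selenium's actual code, and `default_strategy` is a made-up name), the choice for a locator without an explicit prefix could be written like this:

```python
def default_strategy(locator):
    """Return the strategy the quoted rules pick for a prefix-less locator."""
    if locator.startswith("document."):
        return "dom"
    if locator.startswith("//"):
        return "xpath"
    return "identifier"


for loc in ("document.forms[0]", "//div[@id='1']", "1"):
    print(loc, "->", default_strategy(loc))
```

So a locator beginning with "//" is treated as XPath with no prefix needed.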
In your case, you could try
selenium.get_text("//div[@id='1']/descendant::*[not(self::h1)]")
You can learn more about xpath here.
P.S. I don't know if there's good HTML documentation available for python-selenium; I haven't found any. On the other hand, the docstrings in the selenium.py file seem to constitute comprehensive documentation, so I'd suggest reading the source to get a better understanding of how it works.