Question
I've written a CSS selector in Python to get some items and their values. I'm using the selector to scrape the items, not to style them. However, when I run my script, it only gets the item labels and can't reach the values, which are separated from the labels by a "br" tag. How can I grab them? I do not wish to use XPath in this case. Thanks in advance.
Here are the elements:
html = '''
<div class="elems"><br>
<ul>
<li><b>Item Name:</b><br>
titan
</li>
<li><b>Item No:</b><br>
23003400
</li>
<li><b>Item Sl:</b><br>
2760400
</li>
</ul>
</div>
'''
Here is my script with CSS selectors in it:
from lxml import html as e

root = e.fromstring(html)
for items in root.cssselect(".elems li"):
    item = items.cssselect("b")[0].text_content()
    print(item)
Upon execution, the result I'm getting:
Item Name:
Item No:
Item Sl:
The result I'm after:
Item Name: titan
Item No: 23003400
Item Sl: 2760400
Answer 1:
Generally I use the .itertext() method to extract text:
from lxml.html import fromstring

def extract_text(el, sep=' '):
    # Join every non-empty text fragment under el, including text that follows child tags
    return sep.join(s.strip() for s in el.itertext() if s.strip())

tree = fromstring(html)
for li in tree.cssselect('.elems li'):
    print(extract_text(li))
Answer 2:
The easiest solution: the values are inside the "li" tag, not the "b" tag, so take the text content of each "li" instead.
from lxml import html as e

root = e.fromstring(html)
for items in root.cssselect(".elems"):
    item = [item.text_content() for item in items.cssselect("li")]
    print(''.join(item))
Source: https://stackoverflow.com/questions/46028354/unable-to-get-the-full-content-using-selector