lxml/Python : get previous-sibling

Submitted by 半城伤御伤魂 on 2019-12-07 07:56:36

Question


I have the following HTML:

<div id = "big">
    <span>header 1</span>
    <ul id = "outer">
        <li id = "inner">aaa</li>
        <li id = "inner">bbb</li>
    </ul>

    <span>header 2</span>
    <ul id = "outer">
        <li id = "inner">ccc</li>
        <li id = "inner">ddd</li>
    </ul>
</div>

I want to loop over it in this order:

header 1
aaa
bbb
header 2
ccc
ddd

I have tried looping through each ul and then printing the header and the li values. However, I don't know how to get the span header associated with each ul.

sets = tree.xpath("//div[@id='big']//ul[@id='outer']")

for set in sets:

    # Print header. Not sure how to get it
    header = set.xpath(".//li/preceding-sibling::span")
    print header 

    # Print texts. This works.
    values = set.xpath(".//li//text()")
    for v in values:
        print v 

Just looping over all text nodes won't work, because I need to know whether each one is a header or an li value.


Answer 1:


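This worked, where ingred_set is the ul element (named set in the loop above). getprevious() returns the element's immediately preceding sibling element, which here is the span: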

header = ingred_set.getprevious().xpath(".//text()")[0]
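As a fuller illustration, here is a minimal sketch that applies getprevious() to the markup from the question. It assumes Python 3 (the question uses Python 2 print statements) and that lxml is installed; the names HTML, collect_lines and collect_lines_xpath are made up for this example. The second function shows an XPath variant: the attempt in the question fails because it looks for span siblings of the li elements, whereas the span is a sibling of the ul itself.

```python
from lxml import html

# Markup from the question (duplicate ids kept as in the original).
HTML = """
<div id="big">
    <span>header 1</span>
    <ul id="outer">
        <li id="inner">aaa</li>
        <li id="inner">bbb</li>
    </ul>

    <span>header 2</span>
    <ul id="outer">
        <li id="inner">ccc</li>
        <li id="inner">ddd</li>
    </ul>
</div>
"""


def collect_lines(markup):
    """Return header and li texts in document order, using getprevious()."""
    tree = html.fromstring(markup)
    lines = []
    for ul in tree.xpath("//div[@id='big']//ul[@id='outer']"):
        # getprevious() returns the preceding sibling element, or None.
        header = ul.getprevious()
        if header is not None and header.tag == "span":
            lines.append(header.text_content().strip())
        for text in ul.xpath("./li/text()"):
            lines.append(text.strip())
    return lines


def collect_lines_xpath(markup):
    """Same result, but locating the header with the preceding-sibling axis."""
    tree = html.fromstring(markup)
    lines = []
    for ul in tree.xpath("//div[@id='big']//ul[@id='outer']"):
        # [1] on a reverse axis selects the nearest preceding span.
        for header in ul.xpath("./preceding-sibling::span[1]"):
            lines.append(header.text_content().strip())
        for text in ul.xpath("./li/text()"):
            lines.append(text.strip())
    return lines


lines = collect_lines(HTML)
print("\n".join(lines))
```

The getprevious() version only works when the span sits directly before the ul. The XPath version would still find the nearest earlier span if other elements were placed in between.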



Answer 2:


For HTML use BeautifulSoup. It gives you access to previous and next siblings directly:

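from bs4 import BeautifulSoup

# Sample document from the BeautifulSoup documentation
sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", "html.parser")

sibling_soup.b.next_sibling
# <c>text2</c>

sibling_soup.c.previous_sibling
# <b>text1</b>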

You can also tell BeautifulSoup to use the lxml parser by passing it to the constructor. In my experience, lxml handles malformed input better than the default html.parser.
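Applied to the markup from the question, the BeautifulSoup approach might look like the sketch below. It assumes Python 3 and the bs4 package; HTML and collect_lines are names invented for this example. Note that on indented markup like this, previous_sibling would typically return the whitespace text node between the tags rather than the span, so the sketch uses find_previous_sibling("span") instead. The parser defaults to html.parser here; passing "lxml" should also work if lxml is installed.

```python
from bs4 import BeautifulSoup

# Markup from the question (duplicate ids kept as in the original).
HTML = """
<div id="big">
    <span>header 1</span>
    <ul id="outer">
        <li id="inner">aaa</li>
        <li id="inner">bbb</li>
    </ul>

    <span>header 2</span>
    <ul id="outer">
        <li id="inner">ccc</li>
        <li id="inner">ddd</li>
    </ul>
</div>
"""


def collect_lines(markup, parser="html.parser"):
    """Return header and li texts in document order."""
    soup = BeautifulSoup(markup, parser)
    lines = []
    for ul in soup.find("div", id="big").find_all("ul", id="outer"):
        # Skips whitespace text nodes and returns the nearest earlier <span>.
        header = ul.find_previous_sibling("span")
        if header is not None:
            lines.append(header.get_text(strip=True))
        for li in ul.find_all("li"):
            lines.append(li.get_text(strip=True))
    return lines


lines = collect_lines(HTML)
print("\n".join(lines))
```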



Source: https://stackoverflow.com/questions/16262532/lxml-python-get-previous-sibling
