lxml xpath doesn't ignore "&nbsp;"
Question

I have this HTML:

    <td class="0"> <b>Bold Text</b>&nbsp; <a href=""></a> </td>
    <td class="0"> Regular Text <a href=""></a> </td>

Which, when queried with XPath:

    new_html = tree.xpath('//td[@class="0"]/text() | //td[@class="0"]/b/text()')

Prints:

    ['Bold Text', '', 'Regular Text']

As you can see, the &nbsp; character hasn't been ignored and is actually read as an extra text entry in the first td. How can I get a cleaner output?

Answer 1:

Instead, I'd iterate over all the desired td elements and call .text_content() on each one.
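A minimal sketch of that approach, not necessarily the answerer's exact code. It assumes the markup is parsed with lxml.html, that the stray character is a non-breaking space (&nbsp;), and that the cells sit inside a table row (the table and tr wrapper and the SOURCE and cells names are made up for the example). Calling strip() on each result is an added step: with no arguments, Python's str.strip() treats U+00A0 as whitespace, so the non-breaking space is removed along with ordinary spaces.

```python
from lxml import html

# Sample markup reconstructed from the question. The <table><tr> wrapper and
# the &nbsp; entity are assumptions made for this illustration.
SOURCE = (
    '<table><tr>'
    '<td class="0"> <b>Bold Text</b>&nbsp; <a href=""></a> </td>'
    '<td class="0"> Regular Text <a href=""></a> </td>'
    '</tr></table>'
)

tree = html.fromstring(SOURCE)

# text_content() concatenates every text node under each td (including the
# text inside <b>), and strip() trims leading and trailing whitespace,
# which in Python includes the non-breaking space U+00A0.
cells = [td.text_content().strip() for td in tree.xpath('//td[@class="0"]')]

print(cells)  # ['Bold Text', 'Regular Text']
```

This yields one string per td rather than one per text node, so whitespace-only fragments between child elements never show up as separate list entries.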