Assume that I have some HTML code, like this (generated from Markdown or Textile or something):
<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->
How could I generate a table of contents for it using Python?
Use an HTML parser such as lxml or BeautifulSoup to find all header elements.
Here's an example using lxml and XPath.
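from lxml import etree
# etree.parse expects well-formed XML, so every tag in test.xml must be closed.
doc = etree.parse("test.xml")
# Print the tag name and text of every header, in document order.
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5'):
    print(node.tag, node.text)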
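To go from a flat list of headers to something that looks like a table of contents, one option is to indent each entry by its header level. The sketch below is not the lxml approach from the answer: it swaps in the standard library's html.parser, which needs no extra install and tolerates markup that is not well-formed XML, such as an unclosed tag. The names HeadingCollector, build_toc and sample are made up for illustration, and the two-space indent per level is an arbitrary choice.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Hypothetical helper: collect (level, text) pairs for h1-h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def _flush(self):
        # Store the open heading, if there is one, then reset the state.
        if self._level is not None:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            self.headings.append((self._level, text))
        self._level, self._parts = None, []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()              # a new heading also ends an unclosed one
            self._level = int(tag[1])  # "h2" -> 2

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._flush()

    def handle_data(self, data):
        # Only keep text that appears while a heading is open.
        if self._level is not None:
            self._parts.append(data)

    def close(self):
        super().close()
        self._flush()                  # keep a heading left open at end of input


def build_toc(html):
    """Return the table of contents as a list of indented plain-text lines."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return ["  " * (level - 1) + text for level, text in collector.headings]


# Sample input; the last <h1> is deliberately left unclosed to show the
# tolerant parsing.
sample = """<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header
<!-- and so on -->"""

for line in build_toc(sample):
    print(line)
```

The same (level, text) pairs could instead be turned into nested <ul> lists with anchor links. That would also require adding id attributes to the headers, which this sketch does not attempt.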
Source: https://stackoverflow.com/questions/2210265/how-do-i-generate-a-table-of-contents-for-html-text-in-python