Parse HTML/XML and find locations of elements in original document

Submitted by 雨燕双飞 on 2019-12-13 05:19:18

Question


Is there a way to get the original location of an element in a document, i.e. its start and end character indices, when parsing HTML/XML in Python?

I've looked through the lxml documentation and couldn't find anything.

For example:

<a>1</a><b>2</b>

...

print(tree.find('b').original_position)
# result: (9, 16)

Answer 1:


A Google search turned up a discussion of this, the gist of which is: it is hard for malformed documents, because the parser has to synthesize valid tokens that have no corresponding input and therefore no position in the source. It is possible for valid documents, but most parsing libraries don't expose it.
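For well-formed input, one way to recover positions is to use an event-based parser that reports where each token starts. The following is a minimal sketch using the standard library's `html.parser.HTMLParser`, whose `getpos()` returns the (line, column) of the current tag and whose `get_starttag_text()` returns the raw start tag. The names `SpanRecorder` and `element_spans` are made up for this illustration, and the offsets are 0-based with an exclusive end, so they differ from the 1-based numbers in the question. It assumes every start tag has a matching end tag; elements left unclosed are simply dropped.

```python
from html.parser import HTMLParser


class SpanRecorder(HTMLParser):
    """Record (tag, start, end) character offsets; end is exclusive."""

    def __init__(self, text):
        super().__init__(convert_charrefs=True)
        self.text = text
        # Offset of the first character of each line, so the
        # (line, column) pair from getpos() maps to an absolute index.
        self.line_starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.open_tags = []  # stack of (tag, start offset)
        self.spans = []

    def _here(self):
        line, col = self.getpos()  # line is 1-based, col is 0-based
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        self.open_tags.append((tag, self._here()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <br/>: the element is the tag itself.
        start = self._here()
        self.spans.append((tag, start, start + len(self.get_starttag_text())))

    def handle_endtag(self, tag):
        if tag not in [name for name, _ in self.open_tags]:
            return  # stray end tag with no matching start tag
        # getpos() points at the '<' of the end tag; the element ends
        # just after the '>' that closes it.
        end = self.text.index(">", self._here()) + 1
        # Pop back to the matching start tag; unclosed tags are dropped.
        while self.open_tags:
            name, start = self.open_tags.pop()
            if name == tag:
                self.spans.append((tag, start, end))
                break


def element_spans(text):
    recorder = SpanRecorder(text)
    recorder.feed(text)
    recorder.close()
    return recorder.spans


doc = "<a>1</a><b>2</b>"
for tag, start, end in element_spans(doc):
    print(tag, (start, end), doc[start:end])
```

Slicing the original string with a recorded span returns the element's source text, which is a quick way to check the offsets. This sketch only tracks tag boundaries; it does not build a tree, so mapping spans back to nodes of a library such as lxml would need extra bookkeeping.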



Source: https://stackoverflow.com/questions/8258529/parse-html-xml-and-find-locations-of-elements-in-original-document
