Get all text from an XML document?

闹比i 2020-12-11 07:17

How can I get all the text content of an XML document as a single string, like this Ruby/hpricot example, but using Python?

I'd like to replace XML tags with a single space.

5 answers
  •  轮回少年
    2020-12-11 07:27

    I really like BeautifulSoup and would rather not parse HTML with regex if we can avoid it.

    Adapted from: [this Stack Overflow answer], [BeautifulSoup documentation]

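    from bs4 import BeautifulSoup

    # txt is a string holding the contents of your XML file.
    # The "xml" parser needs lxml installed; "html.parser" works without it.
    soup = BeautifulSoup(txt, "xml")
    # Collect every text node in the document as a list of strings.
    page_text = soup.find_all(string=True)
    print(" ".join(page_text))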
    

    That said, you can (and should) use BeautifulSoup to navigate the document to the parts you are actually looking for.
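    If installing BeautifulSoup isn't an option, a similar result can be sketched with the standard library's `xml.etree.ElementTree` instead (a swapped-in technique, not the approach above). `Element.itertext()` yields every text chunk of an element and its descendants in document order. The helper `all_text` and the sample document are hypothetical, and the input is assumed to be well-formed XML. Passing a subtree (here the result of `root.find("body")`) mirrors the advice to navigate to the part you actually need.

```python
import xml.etree.ElementTree as ET


def all_text(element: ET.Element) -> str:
    """Hypothetical helper: join every text chunk under `element` with single spaces.

    Assumes the element came from well-formed XML parsed by ElementTree.
    """
    # itertext() yields the element's .text, then each descendant's .text and
    # .tail, in document order. Whitespace-only chunks (indentation between
    # tags) are dropped; the rest are stripped so each tag boundary becomes
    # exactly one space.
    return " ".join(chunk.strip() for chunk in element.itertext() if chunk.strip())


# Hypothetical sample document for illustration.
doc = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <body>Don't forget <b>me</b> this weekend!</body>
</note>"""

root = ET.fromstring(doc)
print(all_text(root))               # whole document
print(all_text(root.find("body")))  # only the <body> subtree
```

    Because every chunk is stripped and rejoined with one space, a space is also inserted where the markup had none: `<a>x<b>y</b>z</a>` becomes `x y z`, which matches the "replace tags with a single space" goal.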
