BeautifulSoup innerhtml?

猫巷女王i 2020-11-27 13:28

Let's say I have a page with a div. I can easily get that div with soup.find().

Now that I have the result, I'd like to print the WHOLE <

6 answers
  •  暖寄归人
    2020-11-27 13:59

    TL;DR

    With BeautifulSoup 4, use element.encode_contents() if you want a UTF-8 encoded bytestring, or element.decode_contents() if you want a Python Unicode string. For example, an equivalent of the DOM's innerHTML property might look something like this:

    def innerHTML(element):
        """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
        return element.encode_contents()
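
    As a quick illustration of how the two calls differ, here is a minimal sketch. The sample markup, the id "box", and the choice of the built-in "html.parser" are assumptions made for this example and are not part of the original question.

```python
from bs4 import BeautifulSoup

# Sample markup invented for this sketch
html = '<div id="box"><p>Hello <b>world</b></p><span>caf\u00e9</span></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="box")

inner_text = div.decode_contents()   # str: markup inside the div, without the div tag itself
inner_bytes = div.encode_contents()  # bytes: the same markup, UTF-8 encoded by default
outer_text = str(div)                # str: includes the enclosing <div> tag

print(inner_text)
print(inner_bytes)
print(outer_text)
```

    str(div) is the rough counterpart of outerHTML, since it keeps the enclosing tag.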
    

    These functions aren't currently in the online documentation so I'll quote the current function definitions and the doc string from the code.

    encode_contents - since 4.0.4

    def encode_contents(
        self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
        formatter="minimal"):
        """Renders the contents of this tag as a bytestring.
    
        :param indent_level: Each line of the rendering will be
           indented this many spaces.
    
        :param encoding: The bytestring will be in this encoding.
    
        :param formatter: The output formatter responsible for converting
           entities to Unicode characters.
        """
    

    See also the documentation on formatters; you'll most likely either use formatter="minimal" (the default) or formatter="html" (for html entities) unless you want to manually process the text in some way.
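
    To make the difference between the two formatters concrete, here is a small sketch. The sample paragraph and the "html.parser" backend are assumptions chosen for illustration only.

```python
from bs4 import BeautifulSoup

# Sample text containing non-ASCII characters and characters that need escaping
soup = BeautifulSoup("<p>caf\u00e9 &amp; cr\u00e8me &lt;3</p>", "html.parser")
p = soup.p

# "minimal" escapes only what is needed for valid markup (&, <, >)
minimal = p.decode_contents(formatter="minimal")

# "html" additionally replaces characters with named HTML entities where one exists
html_entities = p.decode_contents(formatter="html")

print(minimal)
print(html_entities)
```

    With "minimal" the accented characters stay as Unicode, while with "html" they come back as entities such as &eacute;.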

    encode_contents returns an encoded bytestring. If you want a Python Unicode string then use decode_contents instead.


    decode_contents - since 4.0.1

    decode_contents does the same thing as encode_contents but returns a Python Unicode string instead of an encoded bytestring.

    def decode_contents(self, indent_level=None,
                       eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                       formatter="minimal"):
        """Renders the contents of this tag as a Unicode string.
    
        :param indent_level: Each line of the rendering will be
           indented this many spaces.
    
        :param eventual_encoding: The tag is destined to be
           encoded into this encoding. This method is _not_
           responsible for performing that encoding. This information
           is passed in so that it can be substituted in if the
           document contains a <META> tag that mentions the document's
           encoding.
    
        :param formatter: The output formatter responsible for converting
           entities to Unicode characters.
        """
    

    BeautifulSoup 3

    BeautifulSoup 3 doesn't have the above functions. Instead, it has renderContents:

    def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                       prettyPrint=False, indentLevel=0):
        """Renders the contents of this tag as a string in the given
        encoding. If encoding is None, returns a Unicode string."""
    

    This function was added back to BeautifulSoup 4 (in 4.0.4) for compatibility with BS3.
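
    A minimal sketch of the compatibility method under BeautifulSoup 4, assuming a small sample document and the "html.parser" backend (both chosen for illustration). The warnings filter is a precaution, on the assumption that newer releases may flag the BS3-style name as deprecated.

```python
import warnings
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>Hello</p></div>", "html.parser")
div = soup.div

with warnings.catch_warnings():
    # Silence a possible deprecation warning for the BS3-style name
    warnings.simplefilter("ignore")
    legacy = div.renderContents()   # BS3-style call, kept in BS4 for compatibility

modern = div.encode_contents()      # BS4-style call

print(legacy == modern)
```

    In new code, encode_contents() or decode_contents() is the safer choice, since renderContents exists only for backward compatibility.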
