selfClosingTags in BeautifulSoup

Happy的楠姐 2021-01-13 15:57

Using BeautifulSoup to parse my XML

import BeautifulSoup

soup = BeautifulSoup.BeautifulStoneSoup( """hello


        
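For comparison, a conforming XML parser needs no declaration at all: the /> syntax alone marks an empty element. Below is a minimal sketch using the Python 3 standard library's xml.etree.ElementTree, not BeautifulSoup. Only the tag name alan comes from this thread; the root and child tag names are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <alan/> uses XML empty-element syntax and is
# followed by a sibling <child>. ElementTree needs a single root element.
doc = ET.fromstring("<root><alan/><child>hello</child></root>")

# No declaration is needed: <alan/> has no children, and <child> is
# its sibling directly under <root>.
print([el.tag for el in doc])   # ['alan', 'child']
print(len(doc.find("alan")))    # 0
print(doc.find("child").text)   # hello
```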
2 answers
  •  [愿得一人]
    2021-01-13 16:29

    You are asking what was in the mind of an author, after having noted that he gives names like Beautiful[Stone]Soup to classes/modules :-)

    Here are two more examples of the behaviour of BeautifulStoneSoup:

    >>> soup = BeautifulSoup.BeautifulStoneSoup(
        """hello"""
        )
    >>> print soup.prettify()
    
     
      hello
     
    
    
    >>> soup = BeautifulSoup.BeautifulStoneSoup(
        """hello""",
        selfClosingTags=['alan'])
    >>> print soup.prettify()
    
    
     hello
    
    >>>
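A toy model can make the difference between the two sessions concrete. The sketch below is a hypothetical tree builder on top of Python 3's html.parser.HTMLParser; it is not BeautifulSoup's implementation. It ignores the slash in <tag/> (choice (1) in the answer's terms) unless the tag name is declared, loosely modelled on selfClosingTags. The names Node, NaiveTreeBuilder, outline, self_closing_tags and the child tag are all made up for illustration.

```python
from html.parser import HTMLParser


class Node:
    """A minimal element: a tag name plus a list of child Nodes and text strings."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class NaiveTreeBuilder(HTMLParser):
    """Hypothetical builder: a tag is empty only if its name is declared.

    Whether or not the tag is written with "/>" makes no difference here.
    A toy model of the behaviour discussed above, not BeautifulSoup's code.
    """

    def __init__(self, self_closing_tags=()):
        super().__init__()
        self.self_closing = set(self_closing_tags)
        self.root = Node("[document]")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in self.self_closing:
            self.stack.append(node)  # stays open until a matching end tag

    def handle_startendtag(self, tag, attrs):
        # Choice (1): treat "<tag/>" as a plain start tag, as if the "/" were a mistake.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def outline(node, depth=0):
    """Return the tree as indented lines, two spaces per nesting level."""
    lines = []
    for child in node.children:
        if isinstance(child, Node):
            lines.append("  " * depth + child.tag)
            lines.extend(outline(child, depth + 1))
        else:
            lines.append("  " * depth + child)
    return lines


markup = "<alan/><child>hello</child>"  # "child" is a hypothetical tag name

for declared in ((), ("alan",)):
    builder = NaiveTreeBuilder(self_closing_tags=declared)
    builder.feed(markup)
    builder.close()
    print("declared:", list(declared))
    print("\n".join(outline(builder.root)))
```

Without a declaration, child ends up nested inside alan; declaring alan makes the two siblings, which appears to be the kind of difference the two sessions show.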
    

    My take: a self-closing tag is not legal if it is not declared to the parser. So when deciding how to handle an illegal fragment like ..., the author had three choices:

    (1) assume that the trailing / was a mistake;
    (2) treat alan as a self-closing tag quite independently of how it might be used elsewhere in the input;
    (3) make two passes over the input, working out in the first pass how each tag was used.

    Which choice do you prefer?
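Choice (3) can be sketched the same way. The answer does not say how the first pass should decide, so this assumes one possible rule: a tag name counts as self-closing if it is ever written as <tag/> and never closed with an explicit </tag>. UsageScanner, TreeBuilder and parse_two_pass are hypothetical names built on Python 3's html.parser.HTMLParser, not part of BeautifulSoup.

```python
from html.parser import HTMLParser


class UsageScanner(HTMLParser):
    """First pass: record, per tag name, how the tag is written in the input."""

    def __init__(self):
        super().__init__()
        self.empty_form = set()  # names seen as <tag/>
        self.closed = set()      # names seen with an explicit </tag>

    def handle_startendtag(self, tag, attrs):
        self.empty_form.add(tag)

    def handle_endtag(self, tag):
        self.closed.add(tag)


class TreeBuilder(HTMLParser):
    """Second pass: build nested (tag, children) tuples using the inferred set."""

    def __init__(self, self_closing):
        super().__init__()
        self.self_closing = self_closing
        self.root = ("[document]", [])
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in self.self_closing:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # The decision was made per name in the first pass, not per occurrence.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def parse_two_pass(markup):
    scanner = UsageScanner()
    scanner.feed(markup)
    scanner.close()
    # Assumed rule: written as <tag/> somewhere and never closed explicitly.
    inferred = scanner.empty_form - scanner.closed
    builder = TreeBuilder(inferred)
    builder.feed(markup)
    builder.close()
    return inferred, builder.root


inferred, tree = parse_two_pass("<alan/><child>hello</child>")  # "child" is hypothetical
print(sorted(inferred))  # ['alan']
print(tree)              # ('[document]', [('alan', []), ('child', ['hello'])])
```

The per-name decision is also this approach's weak spot: if the same name appears both as <tag/> and as <tag>...</tag>, the assumed rule treats it as an ordinary tag everywhere, so the <tag/> occurrence swallows whatever follows it. Choice (2) avoids that by deciding per occurrence.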
