How to use Beautiful Soup to extract a string in a <script> tag?

梦如初夏 2020-11-28 14:45

In a given .html page, I have a script tag like so:

     

        
4 answers
  •  我在风中等你
    2020-11-28 14:50

    I ran into a similar problem. The issue seems to be that script_tag.text returns an empty string; you have to use script_tag.string instead. Maybe this changed in some version of BeautifulSoup?

    Anyway, @alecxe's answer didn't work for me, so I modified their solution:

    import re
    
    from bs4 import BeautifulSoup
    
    data = """
    
        
    
    """
    soup = BeautifulSoup(data, "html.parser")
    
    script_tag = soup.find("script")
    if script_tag and script_tag.string:
        # .string holds the raw script source, e.g. "jQuery(window)..."
        script_tag_contents = script_tag.string
    
        # Search the source for .val("...") and capture the quoted argument.
        # re.search returns None when nothing matches, so check before .group().
        match = re.search(r'\.val\("(.+?)"\);', script_tag_contents)
        if match:
            email = match.group(1)
            print(email)
    

    This prints name@email.com.
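
For reference, here is a minimal self-contained sketch of the same approach. The `<script>` body from the question is not shown on this page, so the markup in `HTML` is an assumption, chosen only to fit the `jQuery(window)...` comment, the `.val("...")` regex, and the printed result above. The helper name `extract_val_argument` is likewise made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page: the original <script> body is not available, so this
# markup is an assumption that merely fits the regex and the expected output.
HTML = """
<html><body>
<script>
jQuery(window).load(function () {
    jQuery("input[name=Email]").val("name@email.com");
});
</script>
</body></html>
"""

# Matches .val("...") and captures the quoted argument (non-greedy).
VAL_CALL = re.compile(r'\.val\("(.+?)"\)')


def extract_val_argument(html):
    """Return the first .val("...") argument in the first <script>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    script_tag = soup.find("script")
    if script_tag is None or script_tag.string is None:
        # No <script> tag, or the tag has no single string child (e.g. empty).
        return None
    match = VAL_CALL.search(script_tag.string)
    return match.group(1) if match else None


print(extract_val_argument(HTML))
```

Returning None instead of raising keeps the caller in control when a page has no `<script>` tag or the script does not contain a `.val("...")` call.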
