Extracting text from script tag using BeautifulSoup in Python

自闭症患者 2020-11-27 22:17

Could you please help me with this little thing? I am looking to extract the email, phone and name values from the code below, which sits in a SCRIPT tag (not in the body), using BeautifulSoup (Python).

2 answers
  •  隐瞒了意图╮
    2020-11-27 22:41

    As an alternative to the regex-based approach, you can parse the JavaScript code with the slimit module, which builds an Abstract Syntax Tree and gives you a way of collecting all assignments into a dictionary:

    from bs4 import BeautifulSoup
    from slimit import ast
    from slimit.parser import Parser
    from slimit.visitors import nodevisitor
    
    
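    data = """
    <html>
        <head>
            <title>My Sample Page</title>
            <script>
            $.ajax({
                type: "POST",
                url: 'http://www.example.com',
                data: {
                    email: 'abc@g.com',
                    phone: '9999999999',
                    name: 'XYZ'
                }
            });
            </script>
        </head>
        <body>
            <h1>What a wonderful world</h1>
        </body>
    </html>
    """

    # get the script tag contents from the html
    soup = BeautifulSoup(data, 'html.parser')
    script = soup.find('script')

    # parse js
    parser = Parser()
    tree = parser.parse(script.text)

    # collect every "key: value" assignment in the syntax tree into a dict;
    # nodes without a plain value (such as the nested object) map to ''
    fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
              for node in nodevisitor.visit(tree)
              if isinstance(node, ast.Assign)}

    print(fields)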

    Prints:

    {u'name': u"'XYZ'", u'url': u"'http://www.example.com'", u'type': u'"POST"', u'phone': u"'9999999999'", u'data': '', u'email': u"'abc@g.com'"}
    

    Among other fields, there are email, name and phone that you are interested in.
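
Note that the printed values still carry their JavaScript quote characters (for example `'XYZ'` including the quotes). A minimal sketch of one way to clean them up, assuming every value is a simple quoted string literal; `strip_js_quotes`, `wanted` and `contact` are hypothetical names used only for illustration, and the dict is copied from the output above so the snippet runs on its own:

```python
# Values as printed above: each one still wraps the JavaScript quote characters.
fields = {
    "name": "'XYZ'",
    "url": "'http://www.example.com'",
    "type": '"POST"',
    "phone": "'9999999999'",
    "data": "",
    "email": "'abc@g.com'",
}


def strip_js_quotes(value):
    """Drop one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


# Keep only the three fields the question asks for.
wanted = ("email", "phone", "name")
contact = {key: strip_js_quotes(fields[key]) for key in wanted}
print(contact)
```

This only removes the outer quotes; it does not decode JavaScript escape sequences inside the string, which would need extra handling if the real data contains them.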

    Hope that helps.
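
For comparison, here is a rough sketch of what the regex-based approach mentioned at the top could look like. It is not taken from the other answer: the `script_text` sample and the `PAIR` pattern are assumptions, and the pattern only handles flat `key: 'value'` pairs with no escaped quotes inside the values.

```python
import re

# Script body as it might look once pulled out of the page, e.g. with
# soup.find('script').text (assumed sample; values taken from the output above).
script_text = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

# Matches key: 'value' or key: "value". The backreference \2 forces the
# closing quote to be the same character as the opening one.
PAIR = re.compile(r"""(\w+)\s*:\s*(['"])(.*?)\2""")

# findall yields (key, quote, value) tuples; the quote itself is discarded.
pairs = {key: value for key, _quote, value in PAIR.findall(script_text)}
contact = {key: pairs[key] for key in ("email", "phone", "name")}
print(contact)
```

The regex is shorter and needs no extra dependency, but it is blind to JavaScript structure, so the AST approach is likely the safer choice once the script gets more complex.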
