BeautifulSoup - extract json from JS

太阳男子 2020-12-16 06:21

I'm toying around with BeautifulSoup and I'm looking for a way to get a specific JSON string from inside a JS (`<script>`) element.

Here's the JS:



        
1 answer
  • 2020-12-16 07:12

    The idea is to use a regular expression pattern with a capturing group. First, use that regular expression to locate the script element by its text; then use it again to extract the substring from the script itself. Finally, use json.loads() to load the JSON string into a Python object:

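    import json
    import re
    
    from bs4 import BeautifulSoup
    
    html = """
    your HTML here"""
    
    soup = BeautifulSoup(html, "html.parser")
    
    # Capture the object literal assigned to window.Rent.data.
    # Dots are escaped so they match literally; the JSON must sit on one line
    # because "." does not match newlines without re.DOTALL.
    pattern = re.compile(r"window\.Rent\.data\s+=\s+(\{.*?\});\n")
    
    # Find the <script> whose text matches the pattern
    # (string= is the current name of the older text= argument).
    script = soup.find("script", string=pattern)
    if script is None:
        raise ValueError("no <script> containing window.Rent.data found")
    
    data = json.loads(pattern.search(script.string).group(1))
    print(data)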
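
Because the question's original JS was not preserved, here is a self-contained sketch of the same regex-plus-json.loads() approach run against a made-up page. The `window.Rent.data` contents, the sample HTML, and the helper name `extract_rent_data` are all hypothetical, chosen only to illustrate the technique. The first, unrelated `<script>` shows that the tag is located by its text rather than its position.

```python
import json
import re

from bs4 import BeautifulSoup

# The answer's window.Rent.data pattern, with the dots escaped.
PATTERN = re.compile(r"window\.Rent\.data\s+=\s+(\{.*?\});\n")


def extract_rent_data(html):
    """Hypothetical helper: return window.Rent.data parsed as a Python object."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the <script> tag whose text contains the assignment.
    script = soup.find("script", string=PATTERN)
    if script is None:
        return None
    # Pull the JSON substring out of the script and decode it.
    return json.loads(PATTERN.search(script.string).group(1))


# Made-up page for illustration only; not the question's real markup.
SAMPLE_HTML = """
<html><body>
<script>var unrelated = 1;</script>
<script>
var foo = 1;
window.Rent.data = {"city": "Berlin", "rooms": 3, "tags": ["balcony", "garden"]};
console.log("loaded");
</script>
</body></html>
"""

data = extract_rent_data(SAMPLE_HTML)
print(data["city"], data["rooms"])  # Berlin 3
```

Returning None instead of raising is just a design choice for the sketch; raising, as in the answer's snippet, is equally reasonable when the data must be present.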
    

    There is also another way: use a JavaScript parser. I've experimented with slimit on StackOverflow a couple of times; check it out.
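
A full JavaScript parser is the most robust option. A lighter standard-library middle ground, which is neither slimit nor part of the original answer, is sketched below. It uses a regex only to find where the assignment starts, then lets `json.JSONDecoder.raw_decode` read exactly one JSON value from that point. This avoids relying on the `};` plus newline terminator and copes with objects spread over several lines. It still assumes the assigned value is strict JSON (double-quoted keys, no trailing commas), which a JS object literal need not be. The helper name `extract_assigned_json` is hypothetical.

```python
import json
import re

# Hypothetical helper, not from the original answer: match up to the "=" and
# any following whitespace so the decoder starts exactly at the value.
ASSIGNMENT = re.compile(r"window\.Rent\.data\s*=\s*")


def extract_assigned_json(script_text):
    match = ASSIGNMENT.search(script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and returns
    # (obj, end_index); anything after the value (e.g. ";") is ignored.
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


script_text = """
window.Rent.data = {
    "city": "Berlin",
    "rooms": 3
};
"""
print(extract_assigned_json(script_text))  # {'city': 'Berlin', 'rooms': 3}
```

In practice you would pass it `script.string` from the `<script>` tag found with BeautifulSoup.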
