I'm toying around with BeautifulSoup and I'm looking for a way to get a specific JSON string from within a JS element.
Here's the JS:
The idea is to use a regular expression pattern with a capturing group. First, use the pattern to locate the script
element by its text; then use the same pattern to extract the substring from the script itself. Finally, pass that
substring to json.loads() to load the JSON string into a Python object:
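import json
import re
from bs4 import BeautifulSoup
html = """
your HTML here"""
soup = BeautifulSoup(html, "html.parser")
# Capture the object literal assigned to window.Rent.data; the dots are escaped so they match literally.
pattern = re.compile(r"window\.Rent\.data\s+=\s+(\{.*?\});\n")
# Find the <script> whose text matches the pattern (string= is the newer name for text=, bs4 4.4.0+).
script = soup.find("script", string=pattern)
# Extract the captured JSON text and parse it into a Python object.
json_text = pattern.search(script.text).group(1)
data = json.loads(json_text)
print(data)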
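If the object spans several lines or contains nested braces, a non-greedy capturing group can stop too early or fail to match. One possible variant (not the method above, just a sketch) is to let the regular expression find only the start of the assignment and hand the rest to json.JSONDecoder.raw_decode(), which parses a single JSON value and ignores what follows. The script text below is hypothetical, since the original JS is not shown here, and the helper name extract_rent_data is made up for illustration. It assumes the assigned value is strict JSON and that the script text has already been pulled out of the page, for example with soup.find() as above.

```python
import json
import re

# Hypothetical script text; the real page's JavaScript is not shown in the question.
SCRIPT_TEXT = """
window.Rent = window.Rent || {};
window.Rent.data = {"id": 42, "tags": ["a", "b"], "geo": {"lat": 1.5, "lng": -2.0}};
window.Rent.ready = true;
"""

# Match only up to the start of the object literal, including the whitespace after "=".
ASSIGNMENT = re.compile(r"window\.Rent\.data\s*=\s*")


def extract_rent_data(script_text):
    """Return the value assigned to window.Rent.data, or None if the assignment is absent."""
    match = ASSIGNMENT.search(script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # returns it together with the index where parsing stopped.
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


rent_data = extract_rent_data(SCRIPT_TEXT)
print(rent_data["geo"]["lat"])
```

This only works when the assigned value is valid JSON; a JavaScript literal with unquoted keys or single-quoted strings would make raw_decode() raise json.JSONDecodeError.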
There is also another way: use a JavaScript parser. I've experimented with slimit in a couple of StackOverflow answers; check it out.