I am downloading HTML pages that have data defined in them in the following way:
...
I had a similar issue and ended up using Selenium with PhantomJS. It's a little hacky, and I couldn't quite figure out the right condition for an explicit wait (`WebDriverWait(...).until(...)`), but the implicit wait has worked fine for me so far.
```python
from selenium import webdriver
import json
import re

url = "http..."

# headless browser; skip image loading to speed things up
driver = webdriver.PhantomJS(service_args=['--load-images=no'])
driver.set_window_size(1120, 550)
driver.get(url)
driver.implicitly_wait(1)

# grab everything from the assignment up to the closing script tag
script_text = re.search(r'window\.blog\.data\s*=.*<\/script>', driver.page_source).group(0)

# split text based on first equal sign and remove trailing script tag and semicolon
json_text = script_text.split('=', 1)[1]
json_text = re.sub(r'</script>\s*$', '', json_text).strip().rstrip(';').strip()

# only care about first piece of json
json_text = json_text.split("};")[0] + "}"

data = json.loads(json_text)
driver.quit()
```
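Splitting on `};` breaks if that sequence ever appears inside a string value. A possible alternative, sketched here on the assumption that the assigned value is strict JSON (quoted keys, no trailing commas), is to let `json.JSONDecoder.raw_decode` parse one value and ignore whatever follows it. The helper name `extract_assigned_json` and the sample HTML are made up for illustration; in practice you would pass `driver.page_source`.

```python
import json
import re


def extract_assigned_json(html, name="window.blog.data"):
    """Return the first JSON value assigned to `name` in `html` (hypothetical helper)."""
    # Find "<name> =" and skip any whitespace after the equal sign.
    match = re.search(re.escape(name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError("assignment to %s not found" % name)
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores the rest (";", more script code, "</script>", ...).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


# Made-up sample; note the "};" inside a string value.
sample = (
    '<script>window.blog.data = {"posts": [{"id": 1, "title": "a};b"}]}; '
    'window.other = {};</script>'
)
print(extract_assigned_json(sample))
# {'posts': [{'id': 1, 'title': 'a};b'}]}
```

If the page assigns a JavaScript object literal that is not valid JSON, this raises `json.JSONDecodeError` and you would need a different parser.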