When screen-scraping a website, I extract data from tags. The data I get is not in standard JSON format. I cannot use j
Not including objects
json.loads() doesn't accept undefined; you have to change it to null.
json.loads() only accepts double quotes:
{"foo": 1, "bar": null}
Use this if you are sure that your JavaScript code only has double quotes on key names.
import json
import re

json_text = """{"foo": 1, "bar": undefined}"""
# Replace undefined values that follow a double-quoted key with null
json_text = re.sub(r'("\s*:\s*)undefined(\s*[,}])', '\\1null\\2', json_text)
py_obj = json.loads(json_text)
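
To make the double-quote restriction concrete, here is a minimal sketch that wraps the same substitution in a function and then feeds it a single-quoted object. The name `loads_js_undefined` is hypothetical and only used for illustration; the regex is the one from the snippet above.

```python
import json
import re


def loads_js_undefined(text):
    """Hypothetical helper: replace undefined values with null, then parse as JSON."""
    fixed = re.sub(r'("\s*:\s*)undefined(\s*[,}])', r'\1null\2', text)
    return json.loads(fixed)


# Double-quoted keys: the substitution applies and the text parses.
parsed = loads_js_undefined('{"foo": 1, "bar": undefined}')
print(parsed)

# Single-quoted keys are not valid JSON, so json.loads() rejects the text.
try:
    loads_js_undefined("{'foo': 1, 'bar': undefined}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```

If the scraped data may contain single quotes, the ast.literal_eval() route is probably the better fit.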
ast.literal_eval() doesn't accept undefined; you have to change it to None.
ast.literal_eval() doesn't accept null; you have to change it to None.
ast.literal_eval() doesn't accept true; you have to change it to True.
ast.literal_eval() doesn't accept false; you have to change it to False.
ast.literal_eval() accepts single and double quotes:
{"foo": 1, "bar": None}
or {'foo': 1, 'bar': None}
import ast
import re

js_obj = """{'foo': 1, 'bar': undefined}"""
# Map JavaScript literals that follow a quoted key to their Python equivalents
js_obj = re.sub(r'([\'\"]\s*:\s*)undefined(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)null(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)NaN(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)true(\s*[,}])', '\\1True\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)false(\s*[,}])', '\\1False\\2', js_obj)
py_obj = ast.literal_eval(js_obj)
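
The five substitutions can also be folded into a single pass. This is only a sketch of a variation, not a different technique: the names `JS_TO_PY` and `js_object_to_python` are made up for illustration, and the pattern keeps the same shape as above (a quote, a colon, a JavaScript literal, then a comma or a closing brace).

```python
import ast
import re

# JavaScript literal -> Python literal (NaN becomes None, as in the snippet above).
JS_TO_PY = {
    "undefined": "None",
    "null": "None",
    "NaN": "None",
    "true": "True",
    "false": "False",
}

# A quote and colon, one of the JavaScript literals, then a comma or closing brace.
_PATTERN = re.compile(
    r"(['\"]\s*:\s*)(" + "|".join(JS_TO_PY) + r")(\s*[,}])"
)


def js_object_to_python(js_obj):
    """Hypothetical helper: convert a simple JS object literal to a Python object."""
    converted = _PATTERN.sub(
        lambda m: m.group(1) + JS_TO_PY[m.group(2)] + m.group(3), js_obj
    )
    return ast.literal_eval(converted)


example = """{'foo': 1, "bar": undefined, 'ok': true, 'off': false, 'n': null}"""
result = js_object_to_python(example)
print(result)
```

Like the original substitutions, this only rewrites values that come directly after a quoted key. Literals inside arrays are left alone, and unquoted keys such as `{foo: 1}` will still make ast.literal_eval() fail.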