Validate and format JSON files

情歌与酒 2021-02-01 16:04

I have around 2000 JSON files which I'm trying to run through a Python program. A problem occurs when a JSON file is not in the correct format. (Error: ValueError: No JSO

3 Answers
  •  你的背包
    2021-02-01 16:39

    The built-in JSON module can be used as a validator:

    import json
    
    def parse(text):
        try:
            return json.loads(text)
        except ValueError as e:
            print('invalid json: %s' % e)
            return None # or: raise
    

    You can make it work with files by using:

    with open(filename) as f:
        return json.load(f)
    

    instead of json.loads, and you can also include the filename in the error message.
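A minimal sketch of the file-based variant with the filename in the error message, assuming Python 3 and UTF-8 encoded files. The name parse_file is hypothetical, chosen for illustration:

```python
import json


def parse_file(filename):
    """Load JSON from filename; report the problem and return None if invalid."""
    try:
        # Assumption: the input files are UTF-8 encoded.
        with open(filename, encoding="utf-8") as f:
            return json.load(f)
    except ValueError as e:
        # Catching ValueError works on old and new Pythons alike:
        # json.JSONDecodeError (3.5+) subclasses it, and so does
        # UnicodeDecodeError, so badly encoded files are reported too.
        print('invalid json in %s: %s' % (filename, e))
        return None  # or: raise
```

Catching ValueError rather than json.JSONDecodeError keeps the function usable on Python versions older than 3.5, where json raises a plain ValueError.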

    On Python 3.3.5, for {test: "foo"}, I get:

    invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
    

    and on Python 2.7.6:

    invalid json: Expecting property name: line 1 column 2 (char 1)
    

    This is because JSON requires property names to be double-quoted strings; the valid form is {"test": "foo"}.
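On Python 3.5 and later, the exception is json.JSONDecodeError, a ValueError subclass that exposes the position as attributes (msg, lineno, colno, pos), so a report can be built without parsing the message text. A short sketch; describe_error is a hypothetical helper:

```python
import json


def describe_error(text):
    """Return (line, column, message) for invalid JSON, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as e:  # requires Python 3.5+
        return e.lineno, e.colno, e.msg
    return None
```

For the input above, describe_error('{test: "foo"}') reports line 1, column 2, the same position as in the printed message.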

    When handling the invalid files, it is best not to process them any further. You can build a skipped.txt file listing the files that failed to parse, so they can be checked and fixed by hand.
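One way to sketch this for a batch of files, assuming they all sit in a single directory with a .json extension and are UTF-8 encoded. The function name validate_directory and the tab-separated "path, error" line format of skipped.txt are illustrative choices, not a fixed convention:

```python
import json
import os


def validate_directory(directory, skipped_path="skipped.txt"):
    """Parse every *.json file in directory and list the invalid ones.

    Returns (valid, skipped): a dict mapping path -> parsed data, and a
    list of (path, error message) pairs for files that failed to parse.
    """
    valid = {}
    skipped = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(directory, name)
        try:
            with open(path, encoding="utf-8") as f:
                valid[path] = json.load(f)
        except ValueError as e:
            # Record the failure and do not process this file any further.
            skipped.append((path, str(e)))

    # One line per skipped file, so they can be checked and fixed by hand.
    with open(skipped_path, "w", encoding="utf-8") as out:
        for path, message in skipped:
            out.write("%s\t%s\n" % (path, message))
    return valid, skipped
```

Only the files in valid are passed on to the rest of the program; skipped.txt doubles as a to-do list for manual fixes.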

    If possible, you should check the site or program that generated the invalid JSON files, fix the problem there, and then regenerate the JSON files. Otherwise, you are going to keep getting new files that are invalid JSON.

    Failing that, you will need to write a custom JSON parser that fixes common errors. If you do, keep the original files under source control (or archived) so you can review the differences the automated tool introduces as a sanity check. Ambiguous cases should be fixed by hand.
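A minimal sketch of that workflow, assuming the only error to repair is unquoted property names (the {test: "foo"} case above). The regular expression is a deliberately naive, hypothetical fix rather than a real parser: it does not understand JSON strings, so it can misfire on string values that happen to contain something like ", key:". That is why the fix is accepted only if the result parses, the original is kept as a hypothetical .orig copy, and a diff is returned for review. The names repair_unquoted_keys and fix_file are illustrative:

```python
import difflib
import json
import re
import shutil

# Naive pattern: a bare identifier used as a key right after "{" or ",".
# It does not track string boundaries, so it can misfire inside values.
BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def repair_unquoted_keys(text):
    """Wrap bare property names in double quotes."""
    return BARE_KEY.sub(r'\1"\2"\3', text)


def fix_file(path):
    """Archive path as path + '.orig', write a repaired copy, return a diff.

    Returns None and leaves the file untouched if the repaired text still
    is not valid JSON; such cases are left to be fixed by hand.
    """
    with open(path, encoding="utf-8") as f:
        original = f.read()
    fixed = repair_unquoted_keys(original)
    try:
        json.loads(fixed)  # accept the fix only if the result is valid JSON
    except ValueError:
        return None
    shutil.copyfile(path, path + ".orig")  # keep the original for comparison
    with open(path, "w", encoding="utf-8") as f:
        f.write(fixed)
    # Unified diff of what the automated fix changed, for a sanity check.
    return "\n".join(difflib.unified_diff(
        original.splitlines(), fixed.splitlines(),
        fromfile=path + ".orig", tofile=path, lineterm=""))
```

Each further "common error" would get its own narrowly scoped repair function, checked the same way.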
