How to read line-delimited JSON from a large file (line by line)

Just read each line and construct a JSON object as you go:

import json

with open(file_path) as f:
    for line in f:
        # Each line holds one complete JSON document
        j_content = json.loads(line)

This way, you load a proper, complete JSON object from each line (provided there is no \n inside a JSON value or in the middle of an object), and you avoid memory issues because each object is created only when it is needed.

There is also this answer:

https://stackoverflow.com/a/7795029/671543

import json
with open(file_path, "r") as f:
    contents = f.read()
data = [json.loads(item) for item in contents.strip().split('\n')]

This will work for the specific file format that you gave, where each JSON object is pretty-printed over several lines; if your format changes, you'll need to change the way the lines are parsed:

{
    "key11": 11,
    "key12": 12
}
{
    "key21": 21,
    "key22": 22
}

Just read line-by-line, and build the JSON blocks as you go:

import json

# Note: args.infile is assumed to come from an argparse setup (a file path)
with open(args.infile, 'r') as infile:

    # Variable for building our JSON block
    json_block = []

    for line in infile:

        # Add the line to our JSON block
        json_block.append(line)

        # Check whether we closed our JSON block
        if line.startswith('}'):

            # Do something with the JSON dictionary
            json_dict = json.loads(''.join(json_block))
            print(json_dict)

            # Start a new block
            json_block = []

If you are interested in parsing one very large JSON file without saving everything to memory, you should look at the object_hook or object_pairs_hook callback parameters of the json.load API.
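
As a rough sketch of that callback mechanism (the filename and handler name here are illustrative, not from the answer): json.load still reads the raw text, but object_hook is invoked once for each decoded JSON object, so you can process each one as it is built and discard it instead of keeping the whole decoded structure in memory.

import json

def handle_obj(obj):
    # Called once for every JSON object as the parser decodes it
    print(obj)        # process the object here
    return None       # return a placeholder so the full tree isn't retained

with open("big.json") as f:   # hypothetical filename
    json.load(f, object_hook=handle_obj)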

Just read it line by line and parse it as a stream. The hacking trick (adding commas between each JSON string, plus an opening and closing square bracket to turn the whole thing into a proper list) isn't memory-friendly if the file is more than about 1 GB, as the whole content will land in RAM.
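
A minimal sketch of that stream-style reading (the generator name and file path are illustrative, not taken from the answers): each line is parsed and yielded on its own, so only the current object ever sits in memory.

import json

def iter_json_lines(path):
    # Lazily yield one parsed object per line of a line-delimited JSON file
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:                    # skip blank lines
                yield json.loads(line)

for obj in iter_json_lines("big_file.jsonl"):   # hypothetical path
    print(obj)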
