What are the efficient ways to parse / process huge JSON files in Python? [closed]

Asked by 戏子无情 on 2021-02-19 07:35:02

Question


For my project I have to parse two big JSON files, one 19.7 GB and the other 66.3 GB. The structure of the JSON data is quite complex: the first level is a dictionary, and at the second level there may be lists or further dictionaries. These are all network log files; I have to parse them and do analysis. Is converting such big JSON files to CSV advisable?

When I tried to convert the smaller 19.7 GB JSON file to CSV, the result had around 2000 columns and 0.5 million rows. I am using pandas to parse the data. I have not touched the bigger 66.3 GB file yet. Am I going in the right direction? I have no idea how many columns and rows will come out when I convert the bigger file.
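Here is roughly what my conversion looks like (a minimal sketch; it assumes the logs are newline-delimited JSON, and the file names and chunk size are placeholders):

```python
import pandas as pd

# Read the huge file in chunks instead of loading it all at once.
# chunksize requires lines=True (one JSON record per line).
reader = pd.read_json("logs.json", lines=True, chunksize=100_000)

for i, chunk in enumerate(reader):
    # Write the header only for the first chunk, then append.
    chunk.to_csv("logs.csv", mode="a", header=(i == 0), index=False)
```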

Kindly suggest other good options if any exist. Or is it advisable to read directly from the JSON file and apply OOP concepts over it?

I have already read these articles: article 1 from Stack Overflow and article 2 from Quora.


Answer 1:


You might want to use Dask. It has a syntax similar to pandas, only it is parallel (essentially it coordinates many pandas DataFrames in parallel) and lazy (which helps avoid RAM limitations).

You could use the read_json method and then do your calculations on the resulting dataframe.
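A minimal sketch of that approach, assuming the logs are newline-delimited JSON; the file path, block size, and column name below are placeholders:

```python
import dask.dataframe as dd

# Lazily partition the file into ~256 MB blocks; blocksize requires
# newline-delimited JSON (one record per line). "logs.json" is a
# placeholder path.
df = dd.read_json("logs.json", lines=True, blocksize=2**28)

# Nothing is loaded into memory until a result is requested.
print(df.head())  # computes only the first partition

# Aggregations run in parallel across partitions; "status" is a
# hypothetical column in the log records.
counts = df["status"].value_counts().compute()
print(counts)
```

Note that if the second-level values are themselves nested lists or dictionaries, you may still need to flatten each record (for example with pandas.json_normalize) before doing tabular analysis.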



Source: https://stackoverflow.com/questions/51278619/what-are-the-efficient-ways-to-parse-process-huge-json-files-in-python
