Loading Large Twitter JSON Data (7GB+) into Python

Backend · Unresolved · 2 replies · 1192 views
长发绾君心 2021-01-14 22:24

I've set up a public stream via AWS to collect tweets and now want to do some preliminary analysis. All my data is stored in an S3 bucket (in 5 MB files).

I downlo

2 Answers
  •  醉话见心
    2021-01-14 22:37

    Instead of storing the entire file as a single JSON array, put one JSON object per line for large datasets! That way you can parse one record at a time instead of loading all 7 GB into memory at once.

    To fix the formatting, you should:

    1. Remove the [ at the start of the file
    2. Remove the ] at the end of the file
    3. Remove the comma at the end of each line
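    The three steps above can be done line by line, so the large file never has to fit in memory. Here is a minimal sketch; the input file name `tweets_array.json` and the tiny two-tweet sample are made up for illustration:

    ```python
    import json

    # Tiny stand-in for the original file (hypothetical name), written here
    # only so the example is self-contained: a JSON array with one object
    # per line, as described in the question.
    with open('tweets_array.json', 'w') as f:
        f.write('[\n{"id": 1, "text": "hello"},\n{"id": 2, "text": "world"}\n]\n')

    # Apply the three steps, streaming one line at a time.
    with open('tweets_array.json') as src, open('one_json_per_line.txt', 'w') as dst:
        for line in src:
            line = line.strip()
            if line in ('[', ']'):      # steps 1 and 2: drop the brackets
                continue
            line = line.rstrip(',')     # step 3: drop the trailing comma
            if line:
                dst.write(line + '\n')

    # Each output line is now a valid standalone JSON document.
    with open('one_json_per_line.txt') as check:
        rows = [json.loads(line) for line in check]
    ```

    This assumes the original export really does place one object per line inside the array; if objects span multiple lines, you would need an incremental JSON parser instead.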

    Then you can read the file as so:

    import json

    with open('one_json_per_line.txt', 'r') as infile:
        for line in infile:
            data_row = json.loads(line)  # parse one tweet at a time

    I would suggest using a different storage format if possible. SQLite comes to mind: it lets you query the data with SQL instead of rescanning the raw text on every pass.
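    As a sketch of that suggestion, the converted one-object-per-line data can be loaded into SQLite with the standard library. The table layout and the `"id"`/`"text"` field names are assumptions for illustration, not anything from the original question:

    ```python
    import json
    import sqlite3

    # In-memory database for the example; use a file path such as
    # 'tweets.db' for real data.
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE tweets (id INTEGER PRIMARY KEY, raw TEXT)')

    # Stand-in for lines read from one_json_per_line.txt.
    sample_lines = ['{"id": 1, "text": "hello"}', '{"id": 2, "text": "world"}']

    for line in sample_lines:
        tweet = json.loads(line)
        # Store the raw JSON alongside the id; INSERT OR IGNORE skips
        # duplicate tweets on re-runs.
        conn.execute('INSERT OR IGNORE INTO tweets VALUES (?, ?)',
                     (tweet['id'], line))
    conn.commit()

    count = conn.execute('SELECT COUNT(*) FROM tweets').fetchone()[0]
    ```

    Once loaded, preliminary analysis becomes a matter of SQL queries rather than repeated full scans of a 7 GB text file.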
