I've set up a public stream via AWS to collect tweets and now want to do some preliminary analysis. All my data was stored in an S3 bucket (in 5 MB files).
I downlo
Instead of having the entire file as a JSON object, put one JSON object per line for large datasets!
To fix the formatting, you should remove the [ at the start of the file and the ] at the end of the file, so that each line holds a single JSON object. Then you can read the file like so:
    import json

    with open('one_json_per_line.txt', 'r') as infile:
        for line in infile:
            # each line is parsed on its own, so the whole file never has to be in memory
            data_row = json.loads(line)
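If editing the files by hand is impractical, the conversion can also be scripted. The following is only a minimal sketch, assuming each downloaded file is a single valid JSON array that fits in memory (plausible for 5 MB files); the function name array_to_json_lines and the file paths are made up for illustration.

```python
import json

def array_to_json_lines(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON object per line."""
    with open(src_path, 'r', encoding='utf-8') as infile:
        # Assumption: the whole array fits in memory.
        records = json.load(infile)
    with open(dst_path, 'w', encoding='utf-8') as outfile:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record
            # occupies exactly one line of the output file.
            outfile.write(json.dumps(record) + '\n')
    return len(records)
```

Calling array_to_json_lines('tweets.json', 'one_json_per_line.txt') (hypothetical file names) would then produce a file that the line-by-line loop above can read.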
I would suggest using a different storage option if possible. SQLite comes to mind.
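As a rough illustration of the SQLite idea, here is a sketch using Python's built-in sqlite3 module. The table name tweets, its columns, and the helper names store_tweets and load_tweets are assumptions for the example, not something from the original setup; it simply keeps each tweet's raw JSON text in one row.

```python
import json
import sqlite3

def store_tweets(conn, json_lines):
    """Insert each non-empty JSON line as one row; returns the number of rows stored."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS tweets ('
        'id INTEGER PRIMARY KEY, raw TEXT NOT NULL)'
    )
    rows = [(line,) for line in json_lines if line.strip()]
    with conn:  # commits on success, rolls back on error
        conn.executemany('INSERT INTO tweets (raw) VALUES (?)', rows)
    return len(rows)

def load_tweets(conn):
    """Yield every stored tweet as a parsed Python object, in insertion order."""
    for (raw,) in conn.execute('SELECT raw FROM tweets ORDER BY id'):
        yield json.loads(raw)
```

With a connection from sqlite3.connect('tweets.db') (hypothetical file name), an open one-JSON-per-line file can be passed directly as json_lines, since iterating a file yields its lines.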