I am trying to read Twitter data from a JSON file using Python 2.7.12.
The code I used is:
import json
import sys
reload(sys)
sys.setd
The error occurs when you try to read a tweet containing text like
"@Mike http:\www.google.com \A8&^)((&() how are&^%()( you ". Text like this cannot be read as an ordinary string, because the backslashes are treated as escape sequences; you would have to read it as a raw string. Converting it to a raw string still gives an error, though, so I suggest you
read the JSON file like this:
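import codecs
import json

keys = []      # tweet ids
fulldata = []  # tweet texts

# Each line of the file is expected to hold one JSON object.
with codecs.open('tweetfile', 'r', 'utf-8') as f:
    for line in f:
        data = json.loads(line)
        print(data["tweet"])
        keys.append(data["id"])
        fulldata.append(data["tweet"])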
This will load the data from the JSON file.
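If some lines still fail to parse (for example because of a stray backslash like the one in the tweet above), one option is to skip them instead of stopping at the first error. The following is only a rough sketch: the function names parse_tweet_lines and load_tweets are made up for illustration, and it assumes each line holds one JSON object with "id" and "tweet" fields, as in the code above.

```python
import io
import json


def parse_tweet_lines(lines):
    """Parse an iterable of JSON lines; return (ids, texts, skipped_count)."""
    ids, texts, skipped = [], [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        try:
            record = json.loads(line)
            tweet_id, text = record["id"], record["tweet"]
        except (ValueError, KeyError, TypeError):
            # ValueError: the line is not valid JSON (e.g. a stray backslash).
            # KeyError/TypeError: valid JSON, but not an object with both fields.
            skipped += 1
            continue
        ids.append(tweet_id)
        texts.append(text)
    return ids, texts, skipped


def load_tweets(path):
    """Open `path` as UTF-8 text and parse it line by line."""
    with io.open(path, "r", encoding="utf-8") as f:
        return parse_tweet_lines(f)
```

Calling load_tweets('tweetfile') would then give you the ids, the tweet texts, and a count of the lines that were skipped, so you can check how much data was dropped.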
You can also write it to a CSV file using pandas:
import pandas as pd
# quoting=1 is csv.QUOTE_ALL, so every field is wrapped in quotes.
output = pd.DataFrame(data={"tweet": fulldata, "id": keys})
output.to_csv("tweets.csv", index=False, quoting=1, encoding="utf-8")
Then read from the CSV file to avoid the encoding and decoding problem.
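Reading the CSV back could look something like the sketch below. The helper names write_tweets_csv and read_tweets_csv are hypothetical, and keeping the ids as strings is my own assumption (so long tweet ids are not altered); adjust it to your data.

```python
import pandas as pd


def write_tweets_csv(path, ids, texts):
    """Write ids and tweet texts to a fully quoted UTF-8 CSV file."""
    frame = pd.DataFrame({"id": ids, "tweet": texts})
    # quoting=1 is csv.QUOTE_ALL
    frame.to_csv(path, index=False, quoting=1, encoding="utf-8")


def read_tweets_csv(path):
    """Read the CSV back, keeping the id column as strings."""
    return pd.read_csv(path, encoding="utf-8", dtype={"id": str})
```

After that, read_tweets_csv("tweets.csv")["tweet"] gives you the tweet texts without parsing the JSON again.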
Hope this helps you solve your problem.
Midhun