UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte

轮回少年 2020-12-11 01:19

I am trying to read Twitter data from a JSON file using Python 2.7.12.

The code I used is:

    import json
    import sys
    reload(sys)
    sys.setd         


        
3 Answers
  •  Happy的楠姐
    2020-12-11 02:08

    The error occurs when you try to read a tweet containing text such as

    "@Mike http:\www.google.com \A8&^)((&() how are&^%()( you ".

    Text like this cannot be read as an ordinary string; you are supposed to read it as a raw string. Converting to a raw string still gives the error, though, so I suggest reading the JSON file like this:

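    import codecs
    import json

    keys = []      # tweet ids
    fulldata = []  # tweet texts

    # Decode the file as UTF-8 while reading; one JSON object per line.
    with codecs.open('tweetfile', 'r', 'utf-8') as f:
        for line in f:
            data = json.loads(line)
            print(data["tweet"])
            keys.append(data["id"])
            fulldata.append(data["tweet"])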
    

    This loads the data from the JSON file.
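
Note that a strict UTF-8 read like the loop above still raises the same UnicodeDecodeError if the file contains bytes that are not valid UTF-8 (0x80 can never start a UTF-8 sequence). A minimal sketch of a more tolerant reader follows. The function name `load_tweets` and the one-JSON-object-per-line layout are assumptions for illustration, not something the question confirms. If the file was actually saved in another encoding (for example cp1252), passing that encoding is likely the better fix than replacing bytes.

```python
import io
import json


def load_tweets(path, encoding="utf-8"):
    """Read one JSON object per line, tolerating undecodable bytes.

    Hypothetical helper: assumes the file holds one JSON document per line.
    """
    tweets = []
    # errors="replace" turns any invalid byte (such as 0x80) into U+FFFD
    # instead of raising UnicodeDecodeError.
    with io.open(path, "r", encoding=encoding, errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweets.append(json.loads(line))
    return tweets
```

`io.open` behaves the same on Python 2.7 and Python 3, so the sketch does not depend on `reload(sys)` or on changing the default encoding.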

    You can also write it to a CSV file using pandas.

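    import pandas as pd

    # quoting=1 is csv.QUOTE_ALL; the encoding is set explicitly because
    # to_csv on Python 2 otherwise defaults to ASCII.
    output = pd.DataFrame(data={"tweet": fulldata, "id": keys})
    output.to_csv("tweets.csv", index=False, quoting=1, encoding="utf-8")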
    

    Then read from the CSV file to avoid the encoding and decoding problem.
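
If pandas is not available, the same write-then-read round trip can be sketched with the standard `csv` module instead. This swaps in a different technique from the pandas call in the answer, and it assumes Python 3, where `csv` works on text rather than bytes. The function names `write_tweets_csv` and `read_tweets_csv` are hypothetical.

```python
import csv
import io


def write_tweets_csv(path, keys, fulldata):
    """Write tweets and ids to a UTF-8 CSV file with every field quoted."""
    # newline="" lets the csv module control line endings itself.
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(["tweet", "id"])
        writer.writerows(zip(fulldata, keys))


def read_tweets_csv(path):
    """Read the CSV back as a list of dicts keyed by the header row."""
    with io.open(path, "r", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

Every value comes back as a string, so ids need converting with `int(...)` if numbers are required.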

    Hope this helps you solve your problem.

    Midhun
