How to read UTF-8 files with Pandas?

傲寒 2020-12-14 08:32

I have a UTF-8 file with Twitter data and I am trying to read it into a pandas DataFrame, but the columns come back with dtype 'object' instead of unicode strings.

3 Answers
  • 2020-12-14 08:58

    Pandas stores strings in columns of dtype object. In Python 3, all strings are unicode by default, so if you use Python 3 your data is already unicode (don't be misled by the object dtype).

    If you are on Python 2, use df = pd.read_csv('your_file', encoding='utf8'). Then try, for example, pd.lib.infer_dtype(df.iloc[0,0]) (I am guessing the first column consists of strings). In current pandas versions this function lives at pd.api.types.infer_dtype.
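
To make the point about the object dtype concrete, here is a minimal sketch for Python 3. The sample data and column names are made up for illustration and stand in for the Twitter file. It reads UTF-8 bytes and then checks the type of a single value rather than the column dtype. Depending on the pandas version, the column dtype may print as object or as a dedicated string dtype, so the sketch does not rely on it.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: UTF-8 encoded bytes.
raw = "user,text\nalice,café ☕\nbob,こんにちは\n".encode("utf-8")

# Decode the bytes as UTF-8 while parsing.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8")

# The column dtype is a pandas-level label, not the type of each value.
print(df.dtypes)

# Each individual value is an ordinary Python 3 str, i.e. unicode.
first_value = df["text"].iloc[0]
print(type(first_value))
```

If the printed type is str, the text was decoded correctly and no further conversion should be needed.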

  • 2020-12-14 09:00

    Use the encoding keyword with the appropriate parameter:

    df = pd.read_csv('1459966468_324.csv', encoding='utf8')
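
As a rough illustration of why the encoding argument matters, the sketch below writes a small UTF-8 file to a temporary directory and reads it twice: once with the matching encoding and once with a deliberately wrong one. The file name and contents are hypothetical.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tweets.csv")  # hypothetical file name

    # Write a tiny CSV encoded as UTF-8.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("user,text\nalice,naïve café\n")

    # Matching encoding: the accented characters survive.
    good = pd.read_csv(path, encoding="utf-8")

    # Wrong encoding: each multi-byte character is split into two
    # Latin-1 characters, producing garbled text instead of an error.
    bad = pd.read_csv(path, encoding="latin-1")

good_text = good.loc[0, "text"]
bad_text = bad.loc[0, "text"]
print(good_text)
print(bad_text)
```

Note that the wrong encoding does not raise here, so garbled accents in the output are usually the first sign of a mismatch.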
    
  • 2020-12-14 09:16

    As the other poster mentioned, you might try:

    df = pd.read_csv('1459966468_324.csv', encoding='utf8')
    

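    However, this could still leave you looking at 'object' when you print the dtypes. To confirm the columns hold unicode strings, try this line after reading the CSV (in current pandas versions, use pd.api.types.infer_dtype in place of pd.lib.infer_dtype):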

    df.apply(lambda x: pd.lib.infer_dtype(x.values))
    

    Example output:

    args            unicode
    date         datetime64
    host            unicode
    kwargs          unicode
    operation       unicode
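
A self-contained version of the same check for Python 3 might look like the sketch below. The sample columns are invented to mirror the output above, and the function is imported from pandas.api.types. On Python 3 the inferred label for text columns is typically string rather than unicode.

```python
import io

import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical sample data with two text columns and one date column.
csv_text = (
    "host,operation,date\n"
    "example.org,read,2016-04-06\n"
    "example.net,write,2016-04-07\n"
)

# parse_dates asks pandas to convert the date column while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Infer a descriptive type label for each column from its values.
inferred = df.apply(lambda col: infer_dtype(col.values))
print(inferred)
```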
    