How can I read a tar.gz file using pandas read_csv with the gzip compression option?

Anonymous (unverified), submitted 2019-12-03 02:33:02

Question:

I have a very simple CSV file with the following data, compressed inside a tar.gz file. I need to read it into a DataFrame using pandas.read_csv.

       A  B
    0  1  4
    1  2  5
    2  3  6

    import pandas as pd
    pd.read_csv("sample.tar.gz", compression='gzip')

However, I get this error:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2 

These are the read_csv commands I tried and the errors each one produces:

    pd.read_csv("sample.tar.gz", compression='gzip', engine='python')
    Error: line contains NULL byte

    pd.read_csv("sample.tar.gz", compression='gzip', header=0)
    CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2

    pd.read_csv("sample.tar.gz", compression='gzip', header=0, sep=" ")
    CParserError: Error tokenizing data. C error: Expected 2 fields in line 94, saw 14

    pd.read_csv("sample.tar.gz", compression='gzip', header=0, sep=" ", engine='python')
    Error: line contains NULL byte

What's going wrong here? How can I fix this?

Answer 1:

df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False) 

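Note: error_bad_lines=False skips the offending rows instead of raising an error. It does not unpack the tar archive, so tar header and padding bytes may still end up in the result. In pandas 1.3 and later this option is deprecated in favor of on_bad_lines='skip'.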
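
The errors in the question are consistent with how a .tar.gz file is built: it is a tar archive that has then been gzip-compressed, so compression='gzip' removes only the outer layer and hands read_csv the raw tar stream, including its header block and NUL padding. One way around this is to let the standard-library tarfile module unpack the archive first. The following is a minimal sketch, not the original answerer's method; the helper name read_csv_from_tar_gz and its member_name parameter are hypothetical, and it assumes the archive contains the CSV as a regular file.

```python
import io
import tarfile

import pandas as pd


def read_csv_from_tar_gz(path, member_name=None, **read_csv_kwargs):
    """Read one CSV member of a .tar.gz archive into a DataFrame.

    Hypothetical helper for illustration; extra keyword arguments are
    passed straight through to pandas.read_csv.
    """
    # Mode "r:gz" makes tarfile handle both the gzip and the tar layer.
    with tarfile.open(path, mode="r:gz") as archive:
        if member_name is None:
            # Assumption: the first regular file in the archive is the CSV.
            member = next(m for m in archive.getmembers() if m.isfile())
        else:
            member = archive.getmember(member_name)
        # Copy the member's bytes into memory so the archive can be closed.
        data = archive.extractfile(member).read()
    return pd.read_csv(io.BytesIO(data), **read_csv_kwargs)
```

A call such as read_csv_from_tar_gz("sample.tar.gz") would then parse only the CSV contents. If the file is not comma-separated, pass the delimiter through, for example sep=" ". Reading the member into memory keeps the sketch short; for very large files it may be better to pass the file object returned by extractfile directly to read_csv while the archive is still open.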


