How can I read tar.gz file using pandas read_csv with gzip compression option?

孤街浪徒 2020-12-10 00:54

I have a very simple CSV file, containing the following data, compressed inside a tar.gz archive. I need to read it into a DataFrame using pandas.read_csv.

   A  B
0  1          


        
2 Answers
  • 2020-12-10 01:31

    You can use the tarfile module to read a particular file from the tar.gz archive (as discussed in this resolved issue). If there is only one file in the archive, then you can do this:

    import tarfile
    import pandas as pd
    with tarfile.open("sample.tar.gz", "r:*") as tar:
        csv_path = tar.getnames()[0]
        df = pd.read_csv(tar.extractfile(csv_path), header=0, sep=" ")
    

    The read mode r:* lets tarfile detect the compression (gzip here, but also bz2 or xz) automatically. If the archive contains multiple files, you can select one by name instead, for example csv_path = list(n for n in tar.getnames() if n.endswith('.csv'))[-1] to get the last CSV file in the archive.
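    As a rough sketch of the multi-file case, the selection step can be wrapped in a small helper. The names read_csv_from_tar and make_sample_archive, the member names, and the sample data are all hypothetical and only serve to make the example self-contained.

```python
import io
import tarfile

import pandas as pd


def make_sample_archive(path):
    """Write a small tar.gz with one text file and one CSV (hypothetical sample data)."""
    files = {
        "notes.txt": b"not a csv\n",
        "data/sample.csv": b"A B\n1 2\n",
    }
    with tarfile.open(path, "w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # size must be set before addfile
            tar.addfile(info, io.BytesIO(payload))


def read_csv_from_tar(path, suffix=".csv", **read_csv_kwargs):
    """Read the last archive member whose name ends with `suffix` into a DataFrame."""
    with tarfile.open(path, "r:*") as tar:  # r:* detects the compression
        names = [n for n in tar.getnames() if n.endswith(suffix)]
        if not names:
            raise FileNotFoundError(f"no member ending in {suffix!r} in {path}")
        member = tar.extractfile(names[-1])
        if member is None:  # extractfile returns None for non-file members
            raise ValueError(f"{names[-1]} is not a regular file")
        # Parse while the archive is still open
        return pd.read_csv(member, **read_csv_kwargs)
```

    Usage would look like df = read_csv_from_tar("sample.tar.gz", header=0, sep=" "), with any extra keyword arguments passed straight through to pd.read_csv.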

  • 2020-12-10 01:43
    df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)
    

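    Note: error_bad_lines=False skips malformed rows instead of raising an error. The argument was deprecated in pandas 1.3 in favor of on_bad_lines='skip' and removed in pandas 2.0. Also, compression='gzip' only strips the gzip layer, so pandas parses the raw tar stream and the tar header bytes may end up in the first column name.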
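    To illustrate what skipping bad rows does, here is a minimal sketch that assumes pandas 1.3 or newer, where the option is spelled on_bad_lines="skip". The inline data is made up for the example and is not the asker's file.

```python
import io

import pandas as pd

# Hypothetical space-separated data; the row "3 4 5" has one field too many
raw = io.StringIO("A B\n1 2\n3 4 5\n6 7\n")

# on_bad_lines="skip" drops rows with too many fields instead of raising
df = pd.read_csv(raw, header=0, sep=" ", on_bad_lines="skip")
print(df)
```

    Only the rows with the expected two fields are kept, so a silently shorter DataFrame is the price of not getting an error.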
