Conditional row read of CSV in pandas

无人及你 2020-12-06 12:43

I have large CSVs where I'm only interested in a subset of the rows. In particular, I'd like to read in all the rows that occur before a particular condition is met.

4 Answers
  •  鱼传尺愫
    2020-12-06 12:54

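    You can use the built-in csv module to find the index of the first row that meets the condition, then pass the resulting row count to pd.read_csv via the nrows argument. Note that nrows=counter+1 also reads the matching row itself; use nrows=counter to stop just before it: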

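    from io import StringIO
    import copy
    import csv
    
    import pandas as pd
    
    mycsv = StringIO(""" A      B     C
    34   3.20   'b'
    24   9.21   'b'
    34   3.32   'c'
    24   24.3   'c'
    35   1.12   'a'""")
    
    # Copy the StringIO object so it can be read twice [for demonstration purposes];
    # with a real file, pass the same path to both open() and pd.read_csv().
    mycsv2 = copy.copy(mycsv)
    
    # First pass: find the 0-based index of the first data row where column B > 10.
    # next() raises StopIteration if no row satisfies the condition.
    with mycsv as fin:
        reader = csv.reader(fin, delimiter=' ', skipinitialspace=True)
        header = next(reader)  # skip the header row
        counter = next(idx for idx, row in enumerate(reader) if float(row[1]) > 10)
    
    # Second pass: read only the first counter+1 rows (up to and including the match).
    # sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2.
    df = pd.read_csv(mycsv2, sep=r'\s+', nrows=counter+1)
    
    # Prints the rows up to and including the first row where B > 10:
    print(df)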
    
        A      B    C
    0  34   3.20  'b'
    1  24   9.21  'b'
    2  34   3.32  'c'
    3  24  24.30  'c'
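
If scanning the file twice is undesirable, one possible single-pass variant is to read in chunks and stop as soon as the condition shows up. This is only a sketch under assumptions: `read_until` and `stop_mask` are hypothetical names made up for illustration (not pandas API), the sample data and the chunk size are arbitrary, and it relies on `pd.read_csv(..., chunksize=...)` returning an iterator usable as a context manager (pandas 1.2 or later). Unlike the nrows=counter+1 approach, it drops the matching row and keeps only the rows before it.

```python
from io import StringIO

import pandas as pd


def read_until(source, stop_mask, chunksize=2, **read_csv_kwargs):
    """Read rows from `source` until `stop_mask` first flags a row.

    `stop_mask` (hypothetical helper argument) takes a DataFrame chunk and
    returns a boolean Series. Only rows before the first flagged row are kept.
    """
    pieces = []
    with pd.read_csv(source, chunksize=chunksize, **read_csv_kwargs) as reader:
        for chunk in reader:
            mask = stop_mask(chunk).to_numpy()
            if mask.any():
                first_hit = mask.argmax()  # position of the first True in this chunk
                pieces.append(chunk.iloc[:first_hit])
                break  # stop parsing; later chunks are never read
            pieces.append(chunk)
    if not pieces:
        # Assumption: a source with no data rows is treated as an error here.
        raise ValueError("source contains no data rows")
    return pd.concat(pieces)


sample = StringIO("""A B C
34 3.20 'b'
24 9.21 'b'
34 3.32 'c'
24 24.3 'c'
35 1.12 'a'""")

# Keep every row that occurs before the first row with B > 10.
df_before = read_until(sample, lambda chunk: chunk["B"] > 10, sep=r"\s+")
print(df_before)
```

For real files a much larger chunksize (tens of thousands of rows, say) is probably more sensible; the value 2 here just makes the early stop visible on five rows of data.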
    
