Read CSV into a DataFrame with varying row lengths using Pandas

孤城傲影 2020-12-03 22:32

So I have a CSV that looks a bit like this:

1 | 01-01-2019 | 724
2 | 01-01-2019 | 233 | 436
3 | 01-01-2019 | 345
4 | 01-01-2019 | 803 | 933 | 943 | 923 | 954
5 | 01-01-2019 | 454


        
6 answers
  •  暗喜 (OP)
    2020-12-03 23:28

    If you know that no row has more than N fields, you can tell Pandas in advance how many columns to expect via the names parameter (here N = 7). Rows with fewer fields are padded with NaN:

    import pandas as pd

    # Supplying 7 column names makes pandas pad shorter rows with NaN.
    df = pd.read_csv('data', delimiter='|', names=list(range(7)))
    print(df)
    

    yields

       0             1    2      3      4      5      6
    0  1   01-01-2019   724    NaN    NaN    NaN    NaN
    1  2   01-01-2019   233  436.0    NaN    NaN    NaN
    2  3   01-01-2019   345    NaN    NaN    NaN    NaN
    3  4   01-01-2019   803  933.0  943.0  923.0  954.0
    4  5   01-01-2019   454    NaN    NaN    NaN    NaN
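
    If N is not known ahead of time, one option is a quick first pass over the file to count the fields on the widest line. The following is a minimal sketch, not part of the original answer. The helper name max_field_count is hypothetical, io.StringIO stands in for a real file, and the sketch assumes no field contains a quoted '|'.

```python
import io
import pandas as pd

# Sample data mirroring the question; io.StringIO stands in for a real file.
raw = """1 | 01-01-2019 | 724
2 | 01-01-2019 | 233 | 436
3 | 01-01-2019 | 345
4 | 01-01-2019 | 803 | 933 | 943 | 923 | 954
5 | 01-01-2019 | 454
"""

def max_field_count(text, delimiter="|"):
    """Hypothetical helper: largest number of fields on any non-empty line.

    Assumes the delimiter never appears inside a quoted field.
    """
    return max(
        line.count(delimiter) + 1
        for line in text.splitlines()
        if line.strip()
    )

# First pass: find the widest row. Second pass: let pandas pad the rest.
n_cols = max_field_count(raw)
df = pd.read_csv(io.StringIO(raw), delimiter="|", names=list(range(n_cols)))
print(n_cols)
print(df.shape)
```

    For a file on disk the same idea applies, at the cost of reading the file twice.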
    

    If you only know an upper limit, N, on the number of columns, you can have Pandas read N columns and then use dropna to drop the columns that are completely empty:

    import pandas as pd

    # Read up to 20 columns, then drop the columns that are entirely NaN.
    df = pd.read_csv('data', delimiter='|', names=list(range(20)))
    df = df.dropna(axis='columns', how='all')
    print(df)
    

    yields

       0             1    2      3      4      5      6
    0  1   01-01-2019   724    NaN    NaN    NaN    NaN
    1  2   01-01-2019   233  436.0    NaN    NaN    NaN
    2  3   01-01-2019   345    NaN    NaN    NaN    NaN
    3  4   01-01-2019   803  933.0  943.0  923.0  954.0
    4  5   01-01-2019   454    NaN    NaN    NaN    NaN
    

    Note that this could drop columns from the middle of the data set (not just columns from the right-hand side) if they are completely empty.
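
    If empty columns in the middle should be kept, one possible workaround is to trim only the all-NaN columns at the right-hand edge. The sketch below is an illustration rather than part of the original answer. The helper drop_trailing_empty_columns and the small example frame are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_trailing_empty_columns(df):
    """Hypothetical helper: drop all-NaN columns only from the right edge."""
    non_empty = df.notna().any(axis=0).to_numpy()
    if not non_empty.any():
        # Every column is empty, so nothing is worth keeping.
        return df.iloc[:, :0]
    last = np.flatnonzero(non_empty)[-1]  # position of the last non-empty column
    return df.iloc[:, : last + 1]

# Made-up frame: column 1 is empty in the middle of the data,
# columns 3 and 4 are empty padding on the right.
df = pd.DataFrame({
    0: [1, 2],
    1: [np.nan, np.nan],
    2: [724, 233],
    3: [np.nan, np.nan],
    4: [np.nan, np.nan],
})

trimmed = drop_trailing_empty_columns(df)
dropped_all = df.dropna(axis="columns", how="all")
print(list(trimmed.columns))      # [0, 1, 2]
print(list(dropped_all.columns))  # [0, 2]
```

    The helper keeps the empty middle column 1, whereas dropna(how='all') removes it along with the padding.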
