Select specific CSV columns (Filtering) - Python/pandas

梦如初夏 2020-12-31 07:02

I have a very large CSV file with 100 columns. To illustrate my problem, I will use a very basic example.

Let's suppose that we have a CSV file.

3 Answers
  • 2020-12-31 07:14

    As Wai Yip Tung said, you can select the columns you need by name. Note that this snippet reads the whole file first and then keeps only the named columns:

    import pandas as pd
    data = pd.read_csv("ThisFile.csv")[['value','d']]
    

    This solved my problem.
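To filter while reading rather than after, read_csv accepts column names in usecols. The sketch below compares both approaches; the CSV text is a made-up stand-in for ThisFile.csv (its real contents are not shown in the thread), and only the column names value, d and f come from the answers here.

```python
import io
import pandas as pd

# Hypothetical stand-in for ThisFile.csv; the real file is not shown in the thread.
csv_text = """a,value,d,f
x,975,10,5
y,976,20,4
z,977,30,1
"""

# Read every column, then keep two of them (what the snippet above does).
after_read = pd.read_csv(io.StringIO(csv_text))[["value", "d"]]

# Ask the parser to load only those two columns in the first place.
while_reading = pd.read_csv(io.StringIO(csv_text), usecols=["value", "d"])

print(while_reading.columns.tolist())
print(after_read.equals(while_reading))
```

Both calls should give the same DataFrame here. With a very wide file, the usecols form avoids building the columns you are going to discard anyway.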

  • 2020-12-31 07:20

    This selects the second and fourth columns (since Python uses 0-based indexing):

    In [272]: df.iloc[:, [1, 3]]
    Out[272]: 
       value  f
    0    975  5
    1    976  4
    2    977  1
    3    978  0
    4    979  0
    
    [5 rows x 2 columns]
    

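    df.ix could select by location or label, but it was deprecated in pandas 0.20 and removed in pandas 1.0, so it is not available on current versions. df.iloc always selects by location, and df.loc always selects by label. When indexing by location, use df.iloc to signal your intention explicitly. Compared with df.ix, it was also a bit faster, since pandas did not have to check whether your index uses labels.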
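As a minimal sketch of selecting by location versus by label, the example below picks the same two columns with df.iloc and with df.loc. The DataFrame is a small made-up one; only the column names value and f are taken from the output above.

```python
import pandas as pd

# Hypothetical frame; the values only loosely mirror the output shown above.
df = pd.DataFrame(
    {"a": [1, 2], "value": [975, 976], "d": [10, 20], "f": [5, 4]}
)

# By location: second and fourth columns (0-based positions 1 and 3).
by_position = df.iloc[:, [1, 3]]

# By label: the same columns, named explicitly.
by_label = df.loc[:, ["value", "f"]]

print(by_position.equals(by_label))
```

Selecting by label is usually more robust if the column order of the file might change.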


    Another possibility is to use the usecols parameter:

    import pandas as pd
    data = pd.read_csv("ThisFile.csv", usecols=[1,3])
    

    This will load only the second and fourth columns into the data DataFrame.
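Two details of usecols are easy to miss, so here is a short sketch of them: the order of the positions you pass does not change the column order of the result, and usecols also accepts a callable that is evaluated against each column name. The CSV text and the name wanted are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for ThisFile.csv.
csv_text = "a,value,d,f\nx,975,10,5\ny,976,20,4\n"

# Positions given in reverse order still come back in file order.
swapped = pd.read_csv(io.StringIO(csv_text), usecols=[3, 1])

# A callable keeps every column whose name it returns True for.
wanted = {"value", "f"}
picked = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name in wanted)

# To get a specific order, reorder after reading.
reordered = swapped[["f", "value"]]

print(swapped.columns.tolist(), reordered.columns.tolist())
```

The callable form can be convenient with 100 columns, since the rule for keeping a column can be written once instead of listing every name.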

  • 2020-12-31 07:22

    If you would rather select columns by name, you can use

    data[['value','f']]
    
       value  f
    0    975  5
    1    976  4
    2    977  1
    3    978  0
    4    979  0
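One caveat with selecting by a list of names is that it raises a KeyError if any name is missing. As a hedged alternative, DataFrame.filter keeps only the names that actually exist and can also match by substring. The frame below and the name not_a_column are made up for illustration.

```python
import pandas as pd

# Hypothetical frame standing in for the loaded CSV.
data = pd.DataFrame({"a": [1, 2], "value": [975, 976], "f": [5, 4]})

# Strict: every listed name must exist, otherwise pandas raises KeyError.
strict = data[["value", "f"]]

# Lenient: names that do not exist are silently skipped.
lenient = data.filter(items=["value", "f", "not_a_column"])

# Substring match on column names.
val_cols = data.filter(like="val")

print(lenient.columns.tolist(), val_cols.columns.tolist())
```

Which behavior is preferable depends on whether a missing column should be treated as an error in your pipeline.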
    