I have a very large CSV file with 100 columns. To illustrate my problem I will use a very basic example.
Let's suppose that we have a CSV file.
As Wai Yip Tung said, you can select just the columns you need right after reading, by indexing the DataFrame with a list of column names, for example:
import pandas as pd
data = pd.read_csv("ThisFile.csv")[['value','d']]
This solved my problem.
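Note that this approach parses the entire file and only then discards the unwanted columns, which matters for a very wide file. A self-contained sketch of the same pattern, using an in-memory CSV whose contents are an assumption for illustration:

```python
import pandas as pd
from io import StringIO

# In-memory stand-in for "ThisFile.csv" (column names and values are assumptions)
csv_text = "value,d,f\n975,2,5\n976,3,4\n"

# read_csv parses every column first; the [['value', 'd']] selection then drops the rest
data = pd.read_csv(StringIO(csv_text))[["value", "d"]]
print(data.columns.tolist())  # ['value', 'd']
```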
This selects the second and fourth columns (since Python uses 0-based indexing):

In [272]: df.iloc[:, [1, 3]]
Out[272]:
   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0

[5 rows x 2 columns]
df.ix can select by location or label, while df.iloc always selects by location. When indexing by location, use df.iloc to signal your intention more explicitly; it is also a bit faster, since pandas does not have to check whether your index is using labels. (Note that df.ix has since been deprecated and removed in recent pandas versions; use df.loc for label-based selection instead.)
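To make the distinction concrete, here is a minimal sketch contrasting positional and label-based column selection; the DataFrame contents are assumptions chosen to mirror the example output above:

```python
import pandas as pd

# Small illustrative DataFrame (column names/values are assumptions for this sketch)
df = pd.DataFrame(
    {"a": [0, 1, 2], "value": [975, 976, 977], "b": [7, 8, 9], "f": [5, 4, 1]}
)

by_position = df.iloc[:, [1, 3]]       # select by integer location
by_label = df.loc[:, ["value", "f"]]   # select by column label

# Both routes pick out the same two columns here
assert by_position.equals(by_label)
```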
Another possibility is to use the usecols parameter:

import pandas
data = pandas.read_csv("ThisFile.csv", usecols=[1, 3])

This loads only the second and fourth columns into the data DataFrame.
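usecols also accepts column names, which avoids counting positions in a 100-column file. A minimal sketch using an in-memory CSV; the file contents are an assumption standing in for "ThisFile.csv":

```python
import pandas as pd
from io import StringIO

# In-memory stand-in for "ThisFile.csv" (contents are an assumption for this sketch)
csv_text = "a,value,b,f\n0,975,7,5\n1,976,8,4\n2,977,9,1\n"

# Select columns by name instead of by position
data = pd.read_csv(StringIO(csv_text), usecols=["value", "f"])
print(list(data.columns))  # ['value', 'f']
```

One detail worth knowing: usecols keeps the columns in the order they appear in the file, regardless of the order in the list you pass.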
If you'd rather select columns by name, you can use
data[['value','f']]
   value  f
0    975  5
1    976  4
2    977  1
3    978  0
4    979  0