Applying Conditions on Pandas DataFrame Columns before reading csv or tsv files

Question

Is it possible to set conditions (filters) on DataFrame columns before reading a csv or tsv file, if I already know the column names and types? If yes, how?

For example: consider two numerical columns (col1 and col2) in a very big file. I do not want to load the whole file into memory and then select only the rows where col1 is greater than col2. Instead, I want to set the condition up front, so that only the rows where col1 is greater than col2 are read from the csv file. I hope my explanation makes sense.

Thanks

Answer 1:

You can use blaze for this; it is a handy tool to have alongside pandas.

Let's assume an input file of:

a,b
1,2
3,4
5,3
3,6
6,1

We then open the file and query the data - note that the query isn't executed until you attempt to materialise/access it:

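import blaze
import pandas as pd

# Point blaze at the file on disk
csv_data = blaze.Data('input.csv')
# Describe the filter; the query is not executed at this point
query = csv_data[csv_data['a'] > csv_data['b']]
# Materialise the query result as a pandas DataFrame
df = pd.DataFrame.from_records(query, columns=query.fields)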

That then gives df as:

   a  b
0  5  3
1  6  1
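
If you would rather not add a dependency, a similar effect can be approximated with pandas alone by reading the file in chunks and filtering each chunk before keeping it. This is not part of the blaze approach above, just a minimal sketch under some assumptions: the helper name read_filtered and the chunk size are made up for illustration, and an in-memory io.StringIO stands in for the real file so the example is self-contained. Every row is still parsed, but only one chunk plus the matching rows is held in memory at a time.

```python
import io

import pandas as pd


def read_filtered(source, chunksize=2):
    """Read a CSV in chunks, keeping only rows where column a > column b.

    `read_filtered` and the default `chunksize` are illustrative choices,
    not part of pandas. For a real file, use a much larger chunk size.
    """
    kept = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Filter each chunk right away so non-matching rows are discarded
        kept.append(chunk[chunk["a"] > chunk["b"]])
    # Combine the surviving rows and renumber the index from 0
    return pd.concat(kept, ignore_index=True)


# Same sample data as above; pass a path such as 'input.csv' for a real file
sample = io.StringIO("a,b\n1,2\n3,4\n5,3\n3,6\n6,1\n")
df = read_filtered(sample)
print(df)
```

For a tsv file, the same sketch should work if sep='\t' is passed to pd.read_csv.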

Source: https://stackoverflow.com/questions/39351036/applying-conditions-on-pandas-dataframe-columns-before-reading-csv-or-tsv-files

Tags

python

pandas

dataframe

conditional-statements
