问题
Is it possible to set conditions (filters) for the DataFrame columns before reading a csv or tsv files, If I am already aware of the column names and types? If yes, how?
For Example: Consider there are two numerical columns (col1 and col2) in a very big file. I do not want to load whole file in the memory and select only those rows where col1 greater than col2. Therefore, first, I want to set the condition on the dataframe that it should read only those rows from the csv file where col1 is greater than col2. I hope my explanation make sense.
Thanks
回答1:
You can use blaze for this which is a handy tool to have alongside pandas
.
Let's assume an input file of:
a,b
1,2
3,4
5,3
3,6
6,1
We then open the file and query the data - note that the query isn't executed until you attempt to materialise/access it:
import blaze
import pandas as pd
csv_data = blaze.Data('input.csv')
query = csv_data[csv_data['a'] > csv_data['b']]
df = pd.DataFrame.from_records(query, columns=query.fields)
That then gives df
as:
a b
0 5 3
1 6 1
来源:https://stackoverflow.com/questions/39351036/applying-conditions-on-pandas-dataframe-columns-before-reading-csv-or-tsv-files