I am reading multiple sheets of an Excel file using pandas in Python. I have three cases
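I would propose the following algorithm:
1. Read the whole sheet without treating any row as a header.
2. Take the first row that contains no NaN values as the header row.
3. Keep only the rows below the header row.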
This code works okay for me:
    import pandas as pd

    for sheet in range(3):
        # header=None keeps every row as data, so the header can be located manually
        raw_data = pd.read_excel('blank_rows.xlsx', sheet_name=sheet, header=None)
        print(raw_data)
        # looking for the header row: the first row with no missing values
        for i, row in raw_data.iterrows():
            if row.notnull().all():
                data = raw_data.iloc[(i+1):].reset_index(drop=True)
                data.columns = list(raw_data.iloc[i])
                break
        # transforming columns to numeric where possible
        for c in data.columns:
            try:
                data[c] = pd.to_numeric(data[c])
            except (ValueError, TypeError):
                pass  # leave non-numeric columns unchanged
        print(data)
It uses this toy data sample, based on your examples. From the raw dataframes
             0        1        2
    0  Country  Company  Product
    1       US      ABC      XYZ
    2       US      ABD      XYY

             0        1        2
    0      NaN      NaN      NaN
    1      NaN      NaN      NaN
    2      NaN      NaN      NaN
    3  Country  Company  Product
    4       US      ABC      XYZ
    5       US      ABD      XYY

                                           0        1        2
    0  Product summary table for East region      NaN      NaN
    1                    Date: 1st Sep, 2016      NaN      NaN
    2                                    NaN      NaN      NaN
    3                                Country  Company  Product
    4                                     US      ABC      XYZ
    5                                     US      ABD      XYY
the script produces the same table
      Country Company Product
    0      US     ABC     XYZ
    1      US     ABD     XYY
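
If you want to reuse the header search, it could be wrapped in a function along these lines. This is only a sketch: the name `promote_header_row` is my own invention, it assumes the header is the first row with no missing values (same as the loop above), and it raises an error when no such row exists instead of silently reusing `data` from the previous sheet. The example builds the third toy sheet in memory so it runs without the Excel file.

```python
import numpy as np
import pandas as pd


def promote_header_row(raw):
    """Use the first fully populated row as the header and return the rows below it.

    Hypothetical helper, not part of pandas.
    """
    full_rows = raw.notnull().all(axis=1)
    if not full_rows.any():
        raise ValueError("no fully populated row found to use as a header")
    # argmax on a boolean array gives the position of the first True
    header_pos = int(np.argmax(full_rows.to_numpy()))
    data = raw.iloc[header_pos + 1:].reset_index(drop=True)
    data.columns = list(raw.iloc[header_pos])
    return data


# Third toy sheet, built in memory instead of being read from Excel
raw = pd.DataFrame([
    ["Product summary table for East region", np.nan, np.nan],
    ["Date: 1st Sep, 2016", np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    ["Country", "Company", "Product"],
    ["US", "ABC", "XYZ"],
    ["US", "ABD", "XYY"],
])
table = promote_header_row(raw)
print(table)
```

In the loop over sheets you would then call `promote_header_row(raw_data)` in place of the inner `for i, row in ...` block. Note that a title row which happens to fill every column would be mistaken for the header, so this only works when the text above the table leaves at least one cell empty.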