Finding the row number for the header row in a CSV file / Pandas Dataframe

南楼画角 提交于 2019-12-24 01:16:01

问题


I am trying to get an index or row number for the row that holds the headers in my CSV file. The issue is, the header row can move up and down depending on the output of the report from our system (I have no control to change this)

code:

ht = pd.read_csv(file.csv)
test = ht.get_loc('Code') #Code being header im using to locate the header row
csv1 = read_csv(file.csv, header=test)
df1 = df1.append(csv1) #Appending as have many files

If I was to print test, I would expect a number around 4 or 5, and that's what I am feeding into the second read "read_csv"

The error I'm getting is that it's expecting 1 header column, but I have 26 columns. I am just trying to use the first header string to get the row number

Thanks :-)

Edit:

CSV format

This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18

as you will see the "The deadlines" rows are the same, this can be 3 or 5 based on the code ids, thus the header row can change up or down.

I also did not write out all 26 column headers, not sure that matters.

Wanted DF format

index |    code         |   type   | arrived_date | est_del_date
1     | a/wrwgwr12/001  |   kids   |   12-dec-18  | 17-dec-18
2     | aa/gjghgj35/030 |  Pet     |  15-dec-18   | 18-dec-18

Hope this makes sense..

Thanks,


回答1:


You can use the csv module to find the first row which contains a delimiter, then feed the index of this row as the skiprows parameter to pd.read_csv:

from io import StringIO
import csv
import pandas as pd

x = """This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    reader = csv.reader(fin)
    idx = next(idx for idx, row in enumerate(reader) if len(row) > 1)  # 4

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), skiprows=idx)

print(df)

              code  type arrived_date est_del_date
0   a/wrwgwr12/001  kids    12-dec-18    17-dec-18
1  aa/gjghgj35/030   pet    15-dec-18    18-dec-18


来源:https://stackoverflow.com/questions/53835607/finding-the-row-number-for-the-header-row-in-a-csv-file-pandas-dataframe

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!