Read CSV with multiple Headers

Submitted by 帅比萌擦擦* on 2021-01-27 17:18:44

Question


I have a simple CSV file that I can't figure out how to pull into a dataframe.

test.csv

h1 h2 h3
11 12 13

h4 h5 h6
14 15 16

As you can see, if the CSV above were split into two separate files, reading them into a DataFrame would be easy. A blank line separates each set of data, and the sets are always the same length.

The DataFrame I want to create:

h1 h2 h3 h4 h5 h6  
11 12 13 14 15 16  
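Because the sections are always the same length, one possible approach is to drop the blank lines and cut the remaining lines into fixed-size chunks, parsing each chunk as its own table. This is a sketch under stated assumptions: each section is exactly one header line plus a known number of data rows, fields are separated by single spaces, and `read_fixed_sections` and `TEXT` (an in-memory copy of test.csv) are hypothetical names used only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of test.csv from the question
TEXT = """h1 h2 h3
11 12 13

h4 h5 h6
14 15 16
"""


def read_fixed_sections(text, rows_per_section):
    """Hypothetical helper: assumes every section is 1 header line + rows_per_section data lines."""
    # Drop blank lines; what remains is header, rows, header, rows, ...
    lines = [ln for ln in text.splitlines() if ln.strip()]
    block = rows_per_section + 1  # one header line plus the data rows
    frames = []
    for start in range(0, len(lines), block):
        chunk = "\n".join(lines[start:start + block])
        # Each chunk is a small self-contained table with its own header
        frames.append(pd.read_csv(io.StringIO(chunk), sep=" "))
    # Join the sections side by side on their shared 0..n-1 row index
    return pd.concat(frames, axis=1)


df = read_fixed_sections(TEXT, rows_per_section=1)
print(df)
```

To read the real file instead, pass the result of `open('test.csv').read()` as `text`.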

Answer 1:


Less efficient and less clever than CT Zhu's solution, but maybe a little simpler:

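import pandas as pd
from io import StringIO  # Python 3 (Python 2 used: from StringIO import StringIO)

# Split the file into sections wherever a blank line appears
with open('foo.csv', 'r') as myfile:
    data = myfile.read().split('\n\n')

# Parse each non-empty section as its own space-separated table, then join them side by side
pieces = [pd.read_csv(StringIO(x), sep=' ') for x in data if x.strip()]
print(pd.concat(pieces, axis=1))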

   h1  h2  h3  h4  h5  h6
0  11  12  13  14  15  16
1  10  10  10  10  10  10
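Splitting on the exact string '\n\n' breaks if a "blank" line contains stray spaces or if sections are separated by more than one blank line. A slightly more tolerant variant of the same idea splits on a regular expression and lets `read_csv` treat any run of whitespace as the separator. This is a sketch, not the answerer's code; `read_blank_line_sections` and `TEXT` are hypothetical names, and the sample data deliberately includes a whitespace-only line and a trailing blank line.

```python
import io
import re

import pandas as pd

# Hypothetical sample: a whitespace-only line plus an extra blank line between sections
TEXT = "h1 h2 h3\n11 12 13\n10 10 10\n  \n\nh4 h5 h6\n14 15 16\n10 10 10\n\n"


def read_blank_line_sections(text):
    """Hypothetical helper: split on one or more (possibly whitespace-only) blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Parse each block as its own whitespace-separated table
    pieces = [pd.read_csv(io.StringIO(b), sep=r"\s+") for b in blocks if b.strip()]
    # Join the sections side by side on their shared row index
    return pd.concat(pieces, axis=1)


df = read_blank_line_sections(TEXT)
print(df)
```

With a real file, pass `open('foo.csv').read()` as `text`.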



Answer 2:


That data is certainly not in a friendly shape. The following solution should work even if each section has more than one row of data:

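temp.csv

h1 h2 h3
11 12 13
10 10 10

h4 h5 h6
14 15 16
10 10 10

import numpy as np
import pandas as pd

# Read every line as a data row; there is no single header row
df = pd.read_csv('temp.csv', sep=' ', header=None)
# Drop all-NaN rows left by the blank separator line (newer pandas already skips blank lines)
df = df.dropna()
# A non-numeric first cell marks a header row, so the running total numbers the sections 1, 2, ...
df.index = df[0].map(lambda x: not x.isdigit()).cumsum()
gp = df.groupby(df.index)
# Place the sections side by side; row 0 of the result is the combined header
df2 = np.hstack([gp.get_group(i) for i in gp.groups])

# The remaining rows hold the values, still stored as strings, so convert them to integers
print(pd.DataFrame(df2[1:].astype(int), columns=df2[0]))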
   h1  h2  h3  h4  h5  h6
0  11  12  13  14  15  16
1  10  10  10  10  10  10

[2 rows x 6 columns]
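The key step in this answer is the line that assigns `df.index`: marking header rows as True and taking a cumulative sum gives every row the number of the section it belongs to, which is what `groupby` then splits on. A minimal illustration on just the first column of temp.csv (the hard-coded list below is an assumption standing in for `df[0]`):

```python
import pandas as pd

# First column of temp.csv as read with header=None (values arrive as strings)
first_col = pd.Series(["h1", "11", "10", "h4", "14", "10"])

# True on header rows (non-numeric), False on data rows
is_header = first_col.map(lambda x: not x.isdigit())

# Each header row bumps the running total, so every row gets its section number
section = is_header.cumsum()
print(section.tolist())  # [1, 1, 1, 2, 2, 2]
```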

Does anyone have better ideas, especially a solution with a smaller memory footprint? Here I construct a new NumPy array, df2, which certainly means more RAM usage.
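One way to avoid the intermediate all-string DataFrame and the object array built by `np.hstack` is to read the file line by line, keep one list per column, and build the DataFrame only once at the end. This is a hedged sketch, not a measured improvement: Python lists of ints are not compact either, so whether it actually saves memory for a given file should be checked rather than assumed. It assumes every value is an integer, column names are unique across sections, and all sections have the same number of rows (otherwise `pd.DataFrame` raises a `ValueError`); `read_sections_streaming` is a hypothetical name.

```python
import io

import pandas as pd


def read_sections_streaming(lines):
    """Hypothetical helper: build one list per column while reading line by line."""
    columns = {}   # column name -> list of values, in the order columns appear
    header = None  # header of the section currently being read
    for line in lines:
        fields = line.split()
        if not fields:          # a blank line ends the current section
            header = None
            continue
        if header is None:      # the first line of a section is its header
            header = fields
            for name in header:
                columns[name] = []
            continue
        for name, value in zip(header, fields):
            columns[name].append(int(value))  # assumes integer data
    return pd.DataFrame(columns)


# In-memory stand-in for temp.csv; a real file object works the same way
TEXT = "h1 h2 h3\n11 12 13\n10 10 10\n\nh4 h5 h6\n14 15 16\n10 10 10\n"
df = read_sections_streaming(io.StringIO(TEXT))
print(df)
```

With a file on disk, `with open('temp.csv') as f: df = read_sections_streaming(f)` iterates over it without reading it into memory all at once.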



Source: https://stackoverflow.com/questions/23165147/read-csv-with-multiple-headers
