Read CSV with multiple Headers

Question

I have a simple CSV file that I can't figure out how to pull into a dataframe.

test.csv

As you can see, if the CSV above were split into two separate files, reading them into a dataframe would be easy. There is a blank line between each set of data, and the sets are always the same length.

Dataframe I want to create:

h1 h2 h3 h4 h5 h6  
11 12 13 14 15 16

Answer 1:

Less efficient and clever than CT Zhu's solution but maybe a little simpler:

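import pandas as pd
from io import StringIO  # Python 3; Python 2 used `from StringIO import StringIO`

# Split the file on blank lines so each section becomes its own CSV text.
with open('foo.csv', 'r') as myfile:
    data = myfile.read().split('\n\n')

# Parse each section separately, then join the pieces side by side.
pieces = [pd.read_csv(StringIO(x), sep=' ') for x in data]
print(pd.concat(pieces, axis=1))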

   h1  h2  h3  h4  h5  h6
0  11  12  13  14  15  16
1  10  10  10  10  10  10
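The split on '\n\n' above assumes Unix line endings and exactly one blank line between sections, with no blank lines at the end of the file. As a rough sketch only, the variant below splits on any run of blank lines instead. The `text` string is made-up sample data standing in for the file contents, and the regular expression is my own assumption, not part of the original answer.

```python
import re
from io import StringIO

import pandas as pd

# Hypothetical file contents. The "\r\n" endings and the extra blank lines
# are deliberate: a plain split("\n\n") would not separate these sections.
text = "h1 h2 h3\r\n11 12 13\r\n\r\n\r\nh4 h5 h6\r\n14 15 16\r\n\r\n"

# Split on any run of two or more line breaks, then drop empty leftovers.
chunks = [c for c in re.split(r"(?:\r?\n){2,}", text.strip()) if c.strip()]

# Parse each section on its own and join the pieces side by side.
pieces = [pd.read_csv(StringIO(c), sep=" ") for c in chunks]
combined = pd.concat(pieces, axis=1)
print(combined)
```

For a real file, `text` would come from `myfile.read()` as in the answer above.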

Answer 2:

That data is certainly not in a friendly shape. The following solution should work even if you have more than one row of data in each section:

In [67]:

%%file temp.csv
h1 h2 h3
11 12 13
10 10 10

h4 h5 h6
14 15 16
10 10 10
Overwriting temp.csv
In [68]:

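import numpy as np
import pandas as pd

df = pd.read_csv('temp.csv', sep=' ', header=None)
df = df.dropna()
# A non-numeric first field marks a header row; cumsum labels each section.
df.index = df[0].map(lambda x: not x.isdigit()).cumsum()
gp = df.groupby(df.index)
# Stack the sections side by side as one array, header row first.
df2 = np.hstack([gp.get_group(i) for i in gp.groups])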
In [69]:

print(pd.DataFrame(df2[1:].astype(float), columns=df2[0]))
   h1  h2  h3  h4  h5  h6
0  11  12  13  14  15  16
1  10  10  10  10  10  10

[2 rows x 6 columns]

Does anyone have a better idea, especially a solution with a smaller memory footprint? Here I constructed a new numpy array, df2, which certainly means more RAM usage.
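On the memory question, one possible direction is to stay inside pandas and join the sections with pd.concat rather than building an intermediate numpy array. The sketch below is untested for memory use, so whether it actually needs less RAM is an open assumption. `SAMPLE` and `read_stacked_sections` are hypothetical names introduced for illustration, and the final `astype(int)` assumes every data value is an integer.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data mirroring the temp.csv layout above.
SAMPLE = """h1 h2 h3
11 12 13
10 10 10

h4 h5 h6
14 15 16
10 10 10
"""


def read_stacked_sections(source):
    """Read blank-line-separated sections and join them side by side."""
    # read_csv skips blank lines by default, so header rows and data rows
    # all land in one frame of strings.
    raw = pd.read_csv(source, sep=" ", header=None, dtype=str)
    # A row whose first field is not numeric starts a new section.
    section_id = (~raw[0].str.isdigit()).cumsum()
    pieces = []
    for _, block in raw.groupby(section_id):
        header = block.iloc[0].tolist()
        body = block.iloc[1:].reset_index(drop=True)
        body.columns = header
        pieces.append(body)
    # Assumption: all data values are integers.
    return pd.concat(pieces, axis=1).astype(int)


result = read_stacked_sections(StringIO(SAMPLE))
print(result)
```

Passing a file path such as 'temp.csv' instead of the StringIO object should work the same way, since read_csv accepts both.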

Source: https://stackoverflow.com/questions/23165147/read-csv-with-multiple-headers

Tags

csv

pandas