Question
I have a simple CSV file that I can't figure out how to pull into a dataframe.
test.csv
h1 h2 h3
11 12 13
h4 h5 h6
14 15 16
As you can see, if the CSV above were split into two separate files, reading them into a dataframe would be easy. There is a blank line between each set of data, and the sets are always the same length.
Dataframe I want to create:
h1 h2 h3 h4 h5 h6
11 12 13 14 15 16
Answer 1:
Less efficient and clever than CT Zhu's solution but maybe a little simpler:
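import pandas as pd
from io import StringIO  # Python 3 location; Python 2 used "from StringIO import StringIO"

# Split the file on blank lines so that each chunk is one header-plus-rows section.
with open('foo.csv', 'r') as myfile:
    data = myfile.read().split('\n\n')

# Parse each section on its own, then join the pieces side by side.
pieces = [pd.read_csv(StringIO(x), sep=' ') for x in data]
print(pd.concat(pieces, axis=1))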
   h1  h2  h3  h4  h5  h6
0  11  12  13  14  15  16
1  10  10  10  10  10  10
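The split on '\n\n' above assumes exactly one blank line between sections and no blank lines at the end of the file; an empty trailing chunk would make read_csv fail. A minimal sketch that tolerates both cases follows. The helper name read_blank_separated and the sample text are made up for illustration, and the sketch still assumes single-space separators and unique header names.

```python
import re
from io import StringIO

import pandas as pd


def read_blank_separated(text):
    """Parse blank-line-separated sections and join them side by side."""
    # Strip leading/trailing whitespace, then split on one or more blank lines.
    chunks = [c for c in re.split(r'\n\s*\n', text.strip()) if c]
    pieces = [pd.read_csv(StringIO(c), sep=' ') for c in chunks]
    return pd.concat(pieces, axis=1)


# Sample input with extra blank lines between sections and at the end.
text = "h1 h2 h3\n11 12 13\n\n\nh4 h5 h6\n14 15 16\n\n\n"
result = read_blank_separated(text)
print(result)
```

For a file on disk, the same helper could be called as read_blank_separated(open('foo.csv').read()).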
Answer 2:
That data is certainly not in a friendly shape. The following solution should work even if you have more than one row of data in each section:
In [67]:
%%file temp.csv
h1 h2 h3
11 12 13
10 10 10
h4 h5 h6
14 15 16
10 10 10
Overwriting temp.csv
In [68]:
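import numpy as np
import pandas as pd
# Read everything as plain rows; header rows and data rows arrive mixed together.
df = pd.read_csv('temp.csv', sep=' ', header=None)
df = df.dropna()  # drop any all-NaN rows left by blank separator lines
# A non-numeric first cell marks a header row; the running count labels each section.
df.index = df[0].map(lambda x: not x.isdigit()).cumsum()
gp = df.groupby(df.index)
# Stack the sections side by side; the first row of the array holds the headers.
df2 = np.hstack([gp.get_group(i) for i in gp.groups])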
In [69]:
print(pd.DataFrame(df2[1:].astype(float), columns=df2[0]))
   h1  h2  h3  h4  h5  h6
0  11  12  13  14  15  16
1  10  10  10  10  10  10

[2 rows x 6 columns]
Does anyone have a better idea, especially a solution with a smaller memory footprint? Here I construct a new NumPy array, df2, which certainly means more RAM usage.
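One possible direction, offered only as a sketch: read the file line by line and collect values per column, so that the raw frame and the stacked array are never held in memory at the same time. The function name read_stacked_sections is hypothetical, and the sketch reuses the same "first field is not all digits" test for header rows. It assumes the file starts with a header row, the values are non-negative integers, the header names are unique, and every section has the same number of rows. Whether it actually uses less RAM on a large file would need to be measured.

```python
import io

import pandas as pd


def read_stacked_sections(lines):
    """Merge vertically stacked header/data sections into one wide DataFrame."""
    columns = {}    # column name -> list of values, in first-seen order
    current = None  # header names of the section currently being read
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank separator line
        if not fields[0].isdigit():
            # Header row: start a new group of columns.
            current = fields
            for name in current:
                columns[name] = []
        else:
            # Data row: append each value to its column in the current section.
            for name, value in zip(current, fields):
                columns[name].append(int(value))
    return pd.DataFrame(columns)


# In-memory stand-in for temp.csv; any iterable of lines works.
sample = io.StringIO(
    "h1 h2 h3\n11 12 13\n10 10 10\n\nh4 h5 h6\n14 15 16\n10 10 10\n"
)
wide = read_stacked_sections(sample)
print(wide)
```

With a real file, the open file object can be passed directly, for example: with open('temp.csv') as f: wide = read_stacked_sections(f).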
Source: https://stackoverflow.com/questions/23165147/read-csv-with-multiple-headers