import text to pandas with multiple delimiters

大城市里の小女人 提交于 2019-12-17 07:28:28

问题


I have some data that looks like this:

c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3

I want to import it into a 3 column data frame, with columns e.g.

a , b, c
1,  1, 0.5
etc

I have been trying to read in the data as 2 columns split on ':', and then to split the first column on ' '. However I'm finding it irksome. Is there a better way to sort it out on import directly?

currently:

data1 = pd.read_csv(file_loc, skiprows = 3, delimiter = ':', names = ['AB', 'C'])
data2 = pd.DataFrame(data1.AB.str.split(' ',1).tolist(), names = ['A','B'])

However this is further complicated by the fact my data has a leading space...

I feel like this should be a simple task, but currently I'm thinking of reading it line by line and using some find replace to sanitise the data before importing.


回答1:


One way might be to use the regex separators permitted by the python engine. For example:

>>> !cat castle.dat
c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'], 
                     sep=' |:', engine='python')
>>> df
   a  b    c
0  1  1  0.5
1  1  2  6.5
2  1  3  5.3


来源:https://stackoverflow.com/questions/26551662/import-text-to-pandas-with-multiple-delimiters

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!