Python pandas splitting text and numbers in dataframe

Question

I have a dataframe df1 whose first column is named Acc Number, and the data looks like this:

Acc Number
ASC100.1
MJT122
ASC120.4
XTY111

I need to make a new dataframe df2 with two columns, the first holding the text part and the second holding the number, so the desired output is:

Text    Number
ASC     100.1
MJT     122
ASC     120.4
XTY     111

How would I go about doing this?

Thanks!

Answer 1:

You could do something like this:

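import pandas as pd

data = ['ASC100.1',
        'MJT122',
        'ASC120.4',
        'XTY111']

# One-column frame standing in for df1 (the column is called 'col' here)
df = pd.DataFrame(data=data, columns=['col'])

# Group 1: a run of letters; group 2: the run of non-letters that follows it
result = df['col'].str.extract(r'([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
result.columns = ['Text', 'Number']
print(result)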

Output

  Text Number
0  ASC  100.1
1  MJT    122
2  ASC  120.4
3  XTY    111
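The answer works on its own column named col. A minimal sketch applying the same idea to the question's df1, assuming its column is literally named "Acc Number", can use named groups, which str.extract turns into column names, and pd.to_numeric to convert the number part, since extract returns it as text. The stricter number pattern (\d+(?:\.\d+)?) is an assumption of this sketch, not part of the answer.

```python
import pandas as pd

# df1 as described in the question: the first column is named "Acc Number"
df1 = pd.DataFrame({"Acc Number": ["ASC100.1", "MJT122", "ASC120.4", "XTY111"]})

# Named groups (?P<name>...) become the column names of the extracted frame,
# so no separate renaming step is needed.
# Assumption: the number is digits with an optional single fractional part.
pattern = r"(?P<Text>[A-Za-z]+)(?P<Number>\d+(?:\.\d+)?)"
df2 = df1["Acc Number"].str.extract(pattern)

# extract() returns the captured text as strings; convert the numeric part.
df2["Number"] = pd.to_numeric(df2["Number"])

print(df2)
print(df2.dtypes)
```

Because the Number column becomes float64, 122 and 111 are shown as 122.0 and 111.0. If the exact original text matters (for example, codes with leading zeros), skip the to_numeric step and keep the strings.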

The pattern ([a-zA-Z]+)([^a-zA-Z]+) matches a group of letters, ([a-zA-Z]+), followed by a group of non-letters, ([^a-zA-Z]+). A safer alternative, assuming the number contains at most one decimal point, is ([a-zA-Z]+)(\d+\.?\d+). Note that \d+\.?\d+ needs at least two digits, so a value such as MJT7 would not match; ([a-zA-Z]+)(\d+(?:\.\d+)?) avoids that.
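The difference between these patterns only shows on inputs outside the question's four rows. The sketch below uses hypothetical values (MJT7 and XTY1-2 are made up for illustration) and compares the Number column produced by the letters/non-letters pattern, the (\d+\.?\d+) pattern, and an anchored ^...$ variant using (\d+(?:\.\d+)?). The anchored variant is an assumption of this sketch rather than part of the answer.

```python
import pandas as pd

# Hypothetical inputs (not from the question): a normal value, a single-digit
# number, and a malformed value containing a hyphen.
s = pd.Series(["ASC100.1", "MJT7", "XTY1-2"])

patterns = {
    # letters followed by any run of non-letters
    "loose": r"([a-zA-Z]+)([^a-zA-Z]+)",
    # needs at least two digits, with an optional point between them
    "two_digit": r"([a-zA-Z]+)(\d+\.?\d+)",
    # assumption: digits with an optional fractional part, anchored to the whole string
    "strict": r"^([a-zA-Z]+)(\d+(?:\.\d+)?)$",
}

# Column 1 of each extract() result holds the numeric part; non-matches are NaN.
numbers = pd.DataFrame({name: s.str.extract(p)[1] for name, p in patterns.items()})
print(numbers)
```

In this comparison the loose pattern accepts 1-2 as a number, the two-digit pattern rejects MJT7, and the anchored pattern keeps MJT7 while rejecting XTY1-2. Rows that do not match come back as NaN rather than raising an error, so it is worth checking for NaN after the split.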

Further reading

The Python documentation on regular expressions (the re module).
The pandas documentation on Series.str.extract.

Source: https://stackoverflow.com/questions/53290902/python-pandas-splitting-text-and-numbers-in-dataframe

Tags

python

pandas

dataframe
