Python pandas splitting text and numbers in dataframe

Submitted by 夙愿已清 on 2021-02-05 07:42:24

Question


I have a dataframe df1 whose first column is named Acc Number. The data looks like:

Acc Number
ASC100.1
MJT122
ASC120.4
XTY111

I need to make a new dataframe, df2, with two columns: the first holding the text part and the second holding the number. The desired output is:

Text    Number 
ASC     100.1
MJT     122
ASC     120.4
XTY     111

How would I go about doing this?

Thanks!


Answer 1:


You could do something like this:

import pandas as pd

data = ['ASC100.1',
'MJT122',
'ASC120.4',
'XTY111']

df = pd.DataFrame(data=data, columns=['col'])

# Capture a leading run of letters, then the run of non-letters that follows.
result = df.col.str.extract(r'([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
result.columns = ['Text', 'Number']
print(result)

Output

  Text Number
0  ASC  100.1
1  MJT    122
2  ASC  120.4
3  XTY    111

The pattern ([a-zA-Z]+)([^a-zA-Z]+) matches a group of one or more letters, ([a-zA-Z]+), followed by a group of one or more non-letters, ([^a-zA-Z]+). Because the second group accepts any non-letter character, including spaces and punctuation, a safer alternative is ([a-zA-Z]+)(\d+(?:\.\d+)?), assuming the numbers contain at most one decimal point.
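As a rough sketch of the stricter approach, the example below applies a digits-only pattern to the question's data. The names df1, df2, and Acc Number come from the question. The named groups, the ^ and $ anchors, and the pd.to_numeric conversion are additions made here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the question's 'Acc Number' column.
df1 = pd.DataFrame({'Acc Number': ['ASC100.1', 'MJT122', 'ASC120.4', 'XTY111']})

# Named groups become the column names of the extracted frame.
# The Number group accepts digits with an optional fractional part.
# The anchors (an assumption here) require the whole value to match.
pattern = r'^(?P<Text>[a-zA-Z]+)(?P<Number>\d+(?:\.\d+)?)$'

df2 = df1['Acc Number'].str.extract(pattern, expand=True)

# str.extract returns strings, so convert the numeric part explicitly.
df2['Number'] = pd.to_numeric(df2['Number'])

print(df2)
```

Rows that do not match the pattern come back as NaN in both columns, which makes malformed account numbers easy to spot with df2.isna().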

Further reading

  1. The documentation on regex in Python.
  2. The documentation on extract.


Source: https://stackoverflow.com/questions/53290902/python-pandas-splitting-text-and-numbers-in-dataframe
