Question
I have a dataframe df1 whose first column is named Acc Number. The data looks like:
Acc Number
ASC100.1
MJT122
ASC120.4
XTY111
I need to create a new dataframe, df2, with two columns: the first holding the text part and the second holding the number. The desired output is:
Text Number
ASC 100.1
MJT 122
ASC 120.4
XTY 111
How would I go about doing this?
Thanks!
Answer 1:
You could do something like this:
import pandas as pd

# Sample data from the question
data = ['ASC100.1',
        'MJT122',
        'ASC120.4',
        'XTY111']
df = pd.DataFrame(data=data, columns=['col'])

# Capture the leading letters and the trailing non-letters as two columns
result = df.col.str.extract(r'([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
result.columns = ['Text', 'Number']
print(result)
Output:
  Text Number
0  ASC  100.1
1  MJT    122
2  ASC  120.4
3  XTY    111
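Note that str.extract returns both columns as strings. If the second column should be numeric, one option is pd.to_numeric. The following is a minimal sketch rather than part of the original answer; the names df1 and df2 follow the question, and the conversion step is an assumption about what the asker wants.

```python
import pandas as pd

# Data as described in the question
df1 = pd.DataFrame({'Acc Number': ['ASC100.1', 'MJT122', 'ASC120.4', 'XTY111']})

# Split each value into a letters part and a non-letters part
df2 = df1['Acc Number'].str.extract(r'([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
df2.columns = ['Text', 'Number']

# extract() yields strings; convert the second column to a numeric dtype
# (assumed to be desired, the question does not say so explicitly)
df2['Number'] = pd.to_numeric(df2['Number'])

print(df2.dtypes)
```

If some rows might not parse as numbers, passing errors='coerce' to pd.to_numeric turns them into NaN instead of raising.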
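The pattern ([a-zA-Z]+)([^a-zA-Z]+) means: match a group of letters, ([a-zA-Z]+), followed by a group of non-letters, ([^a-zA-Z]+). A stricter alternative is the regex ([a-zA-Z]+)(\d+\.?\d+), which assumes the number contains at most one decimal point. Note that \d+\.?\d+ requires at least two digits, so a value such as AB7 would not match; ([a-zA-Z]+)(\d+(?:\.\d+)?) also accepts a single digit.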
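To see how the patterns differ, here is a small sketch using the standard re module. The helper split_value, the third pattern (robust), and the extra sample strings are illustrative assumptions, not part of the original answer. re.search is used because str.extract takes the first match found anywhere in the string.

```python
import re

loose = re.compile(r'([a-zA-Z]+)([^a-zA-Z]+)')      # letters, then any non-letters
strict = re.compile(r'([a-zA-Z]+)(\d+\.?\d+)')      # pattern suggested in the answer
robust = re.compile(r'([a-zA-Z]+)(\d+(?:\.\d+)?)')  # assumed variant: allows one digit

def split_value(pattern, value):
    """Return the two captured groups of the first match, or None."""
    match = pattern.search(value)
    return match.groups() if match else None

# The first two samples come from the question; the last two are made up
for sample in ['ASC100.1', 'MJT122', 'AB7', 'XY-12_3']:
    print(sample,
          split_value(loose, sample),
          split_value(strict, sample),
          split_value(robust, sample))
```

On the question's data all three patterns give the same split. They differ on the made-up inputs: the loose pattern accepts 'XY-12_3' and returns '-12_3' as the "number", while the strict pattern rejects 'AB7' because it needs at least two digits.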
Further reading
- The documentation on regex in Python.
- The documentation on extract.
Source: https://stackoverflow.com/questions/53290902/python-pandas-splitting-text-and-numbers-in-dataframe