Question
Say I start with a Series of unformatted phone numbers (as strings), and I would like to format them as (XXX) YYY-ZZZZ.
I can get the sub-components of my input using regular expressions with str.match or str.extract, and I can then perform the formatting on the result of either:
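import pandas as pd

ser = pd.Series(data=['1234567890', '2345678901', '3456789012'])
# Note: this relies on older pandas, where str.match returned a tuple of the
# captured groups for each element; newer versions return booleans instead.
matched = ser.str.match(r'(\d{3})(\d{3})(\d{4})')
# str.extract returns a DataFrame with one column per named group
extracted = ser.astype(str).str.extract(r'(?P<first>\d{3})(?P<second>\d{3})(?P<third>\d{4})')
formatmatched = matched.apply(lambda x: '({0}) {1}-{2}'.format(*x))
print('formatmatched')
print(formatmatched)
formatextracted = extracted.apply(lambda x: '({first}) {second}-{third}'.format(**x.to_dict()), axis=1)
print('formatextracted')
print(formatextracted)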
Results:
formatmatched
0 (123) 456-7890
1 (234) 567-8901
2 (345) 678-9012
dtype: object
formatextracted
0 (123) 456-7890
1 (234) 567-8901
2 (345) 678-9012
dtype: object
Is there a vectorized way to apply that formatting command in either context?
Answer 1:
You can do this directly with Series.str.replace():
In [47]: s = pandas.Series(["1234567890", "5552348866", "13434"])
In [49]: s
Out[49]:
0 1234567890
1 5552348866
2 13434
dtype: object
In [50]: s.str.replace(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", regex=True)
Out[50]:
0 (123) 456-7890
1 (555) 234-8866
2 13434
dtype: object
Note that values that do not match the pattern, such as 13434 above, are left unchanged. You could also apply another transformation first to strip any non-digit characters.
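As a rough sketch of that two-step idea (the sample values with mixed punctuation are made up for illustration, and the `^...$` anchors are my own addition so that only exact ten-digit values get reformatted), something like this should work:

```python
import pandas as pd

# Hypothetical sample data: the mixed punctuation is an assumption for illustration.
raw = pd.Series(["123-456-7890", "(555) 234 8866", "13434"])

# Step 1: drop every non-digit character.
digits = raw.str.replace(r"\D", "", regex=True)

# Step 2: reformat values that consist of exactly ten digits;
# anything else passes through unchanged.
formatted = digits.str.replace(
    r"^(\d{3})(\d{3})(\d{4})$", r"(\1) \2-\3", regex=True
)

print(formatted)
```

Passing `regex=True` explicitly avoids depending on the default, which has changed between pandas versions.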
Answer 2:
Why don't you try this:
import pandas as pd
ser = pd.Series(data=['1234567890', '2345678901', '3456789012'])
def f(val):
    # Slice the 10-digit string into area code, prefix, and line number
    return '({0}) {1}-{2}'.format(val[:3], val[3:6], val[6:])

print(ser.apply(f))
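Since apply still calls the Python function once per element, the same slicing can probably be written with the .str accessor and plain string concatenation instead. This is only a sketch, and it assumes every value is already a clean ten-digit string:

```python
import pandas as pd

ser = pd.Series(["1234567890", "2345678901", "3456789012"])

# Slice each string through the .str accessor and join the pieces
# with element-wise string concatenation.
formatted = "(" + ser.str[:3] + ") " + ser.str[3:6] + "-" + ser.str[6:]

print(formatted)
```

Unlike the regex approach, this does not check the input, so shorter or malformed values would still be wrapped in the punctuation.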
Source: https://stackoverflow.com/questions/22077328/vectorized-format-function-for-pandas-series