Question
I've got a series of addresses and would like a series with just the street name. The only catch is some of the addresses don't have a house number, and some do.
So if I have a series that looks like:
Idx
0 11000 SOUTH PARK
1 20314 BRAKER LANE
2 203 3RD ST
3 BIRMINGHAM PARK
4 E 12TH
What function would I write to get
Idx
0 SOUTH PARK
1 BRAKER LANE
2 3RD ST
3 BIRMINGHAM PARK
4 E 12TH
where any 'words' made entirely of numeric characters at the beginning of the string have been removed? As you can see above, I would like to retain the 3 that '3RD ST' starts with. I suspect this calls for a regular expression, but that is beyond me. Thanks!
Answer 1:
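You can use str.replace with the regex ^\d+\s+ to remove leading digits. On recent pandas versions, pass regex=True, because the pattern is otherwise treated as a literal string:
s.str.replace(r'^\d+\s+', '', regex=True)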
Out[491]:
0 SOUTH PARK
1 BRAKER LANE
2 3RD ST
3 BIRMINGHAM PARK
4 E 12TH
Name: Idx, dtype: object
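For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The series is rebuilt from the question's sample data, and the name `street` is only an illustrative choice.

```python
import pandas as pd

# Sample data from the question; the variable name `s` follows the answer above.
s = pd.Series(
    ["11000 SOUTH PARK", "20314 BRAKER LANE", "203 3RD ST", "BIRMINGHAM PARK", "E 12TH"],
    name="Idx",
)

# ^    anchors the match at the start of each string
# \d+  matches one or more digits
# \s+  requires whitespace right after them, so "3RD" is left alone
# regex=True is passed explicitly because the default differs between pandas versions.
street = s.str.replace(r"^\d+\s+", "", regex=True)

print(street.tolist())
# ['SOUTH PARK', 'BRAKER LANE', '3RD ST', 'BIRMINGHAM PARK', 'E 12TH']
```

The required whitespace after the digits is what keeps '3RD ST' intact: '3RD' starts with a digit but is not followed by a space, so the pattern does not match it.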
Answer 2:
str.replace(r'^\d+\s', '', regex=True)
is what I came up with (the ^ restricts the match to the start of the string):
import pandas as pd
df = pd.DataFrame({'IDx': ['11000 SOUTH PARK',
                           '20314 BRAKER LANE',
                           '203 3RD ST',
                           'BIRMINGHAM PARK',
                           'E 12TH']})
df
Out[126]:
IDx
0 11000 SOUTH PARK
1 20314 BRAKER LANE
2 203 3RD ST
3 BIRMINGHAM PARK
4 E 12TH
df.IDx = df.IDx.str.replace(r'^\d+\s', '', regex=True)
df
Out[128]:
IDx
0 SOUTH PARK
1 BRAKER LANE
2 3RD ST
3 BIRMINGHAM PARK
4 E 12TH
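A quick way to see why anchoring with ^ matters: without it, \d+\s also matches runs of digits later in the string. The sketch below compares the two patterns on hypothetical addresses that are not part of the question's data. The names `extra`, `unanchored` and `anchored` are illustrative only.

```python
import pandas as pd

# Hypothetical addresses (not from the question) with a number in the middle.
extra = pd.Series(["100 HWY 290 EAST", "FM 1325 NORTH"])

# Without ^, every "digits + whitespace" run is removed, wherever it occurs.
unanchored = extra.str.replace(r"\d+\s", "", regex=True)

# With ^, only a leading house number is removed.
anchored = extra.str.replace(r"^\d+\s", "", regex=True)

print(unanchored.tolist())  # ['HWY EAST', 'FM NORTH']
print(anchored.tolist())    # ['HWY 290 EAST', 'FM 1325 NORTH']
```

On the question's five sample addresses both patterns give the same result, so the difference only shows up once the data contains numbers after the first word.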
Source: https://stackoverflow.com/questions/45600662/pandas-remove-numbers-from-start-of-string-in-series