Pandas - remove numbers from start of string in series

為{幸葍}努か submitted on 2019-12-02 12:09:55

Question


I've got a series of addresses and would like a series with just the street name. The only catch is some of the addresses don't have a house number, and some do.

So if I have a series that looks like:

Idx
 0      11000 SOUTH PARK
 1      20314 BRAKER LANE
 2      203 3RD ST
 3      BIRMINGHAM PARK
 4      E 12TH

What function would I write to get

Idx
 0      SOUTH PARK
 1      BRAKER LANE
 2      3RD ST
 3      BIRMINGHAM PARK
 4      E 12TH

where any 'words' made entirely of numeric characters at the beginning of the string have been removed? As you can see above, I would like to retain the 3 that '3RD ST' starts with. I suspect a regular expression is the way to go, but writing one is beyond me. Thanks!


Answer 1:


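You can use str.replace with the regex ^\d+\s+ to remove a leading run of digits and the whitespace that follows it. Pass regex=True explicitly, because from pandas 2.0 onward str.replace treats the pattern as a literal string by default, and write the pattern as a raw string so the backslashes reach the regex engine unchanged:

s.str.replace(r'^\d+\s+', '', regex=True)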

Out[491]:
0         SOUTH PARK
1        BRAKER LANE
2             3RD ST
3    BIRMINGHAM PARK
4             E 12TH
Name: Idx, dtype: object
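
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the addresses are held in a Series named s, as in the snippet above; the variable name street is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the Series name "Idx" mirrors the output above.
s = pd.Series(
    ["11000 SOUTH PARK", "20314 BRAKER LANE", "203 3RD ST", "BIRMINGHAM PARK", "E 12TH"],
    name="Idx",
)

# ^    anchors the match at the start of the string
# \d+  matches one or more digits (the house number)
# \s+  matches the whitespace that separates it from the street name
street = s.str.replace(r"^\d+\s+", "", regex=True)

print(street.tolist())
# ['SOUTH PARK', 'BRAKER LANE', '3RD ST', 'BIRMINGHAM PARK', 'E 12TH']
```

'3RD ST' keeps its 3 because the digit is followed by a letter rather than whitespace, so \s+ cannot match there.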



Answer 2:


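str.replace(r'^\d+\s+', '', regex=True) is what I came up with. The ^ anchor matters: without it, the pattern would also strip digit runs followed by whitespace later in the string. regex=True is required from pandas 2.0 onward, where str.replace matches literally by default:

import pandas as pd

df = pd.DataFrame({'IDx': ['11000 SOUTH PARK',
                           '20314 BRAKER LANE',
                           '203 3RD ST',
                           'BIRMINGHAM PARK',
                           'E 12TH']})

df
Out[126]: 
                 IDx
0   11000 SOUTH PARK
1  20314 BRAKER LANE
2         203 3RD ST
3    BIRMINGHAM PARK
4             E 12TH

# Strip only a leading house number; assign back with bracket notation
df['IDx'] = df['IDx'].str.replace(r'^\d+\s+', '', regex=True)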

df
Out[128]: 
               IDx
0       SOUTH PARK
1      BRAKER LANE
2           3RD ST
3  BIRMINGHAM PARK
4           E 12TH
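
To see why anchoring the pattern at the start of the string matters, here is a small sketch comparing an unanchored pattern with an anchored one. The addresses are hypothetical and not taken from the question; they were chosen so that a number appears after the first word.

```python
import pandas as pd

# Hypothetical addresses (not from the question) with a number in the middle.
s = pd.Series(["100 HWY 290 EAST", "HWY 290 EAST", "203 3RD ST"])

# Unanchored: every run of digits followed by whitespace is removed.
unanchored = s.str.replace(r"\d+\s", "", regex=True)

# Anchored: only a run of digits at the very start of the string is removed.
anchored = s.str.replace(r"^\d+\s+", "", regex=True)

print(unanchored.tolist())  # ['HWY EAST', 'HWY EAST', '3RD ST']
print(anchored.tolist())    # ['HWY 290 EAST', 'HWY 290 EAST', '3RD ST']
```

On the five addresses in the question both patterns give the same result, so the difference only shows up on data like this.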


Source: https://stackoverflow.com/questions/45600662/pandas-remove-numbers-from-start-of-string-in-series
