Removing a substring from a list of strings

长发绾君心 2021-01-29 07:02

Several countries in my list contain numbers and/or parentheses. How do I remove these?

e.g.

'Bolivia (Plurinational State of)' should be 'Bolivia'

4 Answers
  •  半阙折子戏
    2021-01-29 07:48

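    Just run:

    # Assign the result back; inplace=True on a selected column may not update df in recent pandas versions.
    df['Country'] = df['Country'].replace(r'\d+|\s*\([^)]*\)', '', regex=True)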
    

    Assuming that the initial content of your DataFrame is:

                                Country
    0  Bolivia (Plurinational State of)
    1                     Switzerland17
    2                    United Kingdom
    

    After the above replace, you will have:

              Country
    0         Bolivia
    1     Switzerland
    2  United Kingdom
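
If the names live in a plain Python list rather than a DataFrame column, the same pattern can be applied with `re.sub`. This is a minimal sketch, assuming a list called `countries` (a hypothetical name, filled here with the sample values shown above):

```python
import re

# Same pattern as in the replace call: a run of digits, or a
# parenthesised group together with any whitespace directly before it.
PATTERN = re.compile(r'\d+|\s*\([^)]*\)')

# Hypothetical input list mirroring the sample DataFrame.
countries = ['Bolivia (Plurinational State of)', 'Switzerland17', 'United Kingdom']

# Remove every match from each name; names without a match are unchanged.
cleaned = [PATTERN.sub('', name) for name in countries]
print(cleaned)  # ['Bolivia', 'Switzerland', 'United Kingdom']
```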
    

    The above pattern contains:

    • the first alternative: a non-empty sequence of digits,
    • the second alternative:
      • an optional sequence of whitespace characters,
      • an opening parenthesis (escaped with a backslash),
      • a sequence of characters other than ) (inside square brackets no escaping is needed),
      • a closing parenthesis (also escaped).
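
The breakdown above can also be written into the pattern itself using `re.VERBOSE`, which lets each part carry a comment. This is only a sketch with the standard `re` module; the helper name `strip_noise` is made up for illustration.

```python
import re

# The same pattern, spread over several lines with one comment per part.
VERBOSE_PATTERN = re.compile(r"""
    \d+      # first alternative: one or more digits
  |          # or
    \s*      # optional whitespace before the parenthesis
    \(       # a literal opening parenthesis (escaped)
    [^)]*    # any characters other than a closing parenthesis
    \)       # a literal closing parenthesis (escaped)
""", re.VERBOSE)


def strip_noise(name):
    """Remove digits and parenthesised parts from a single string."""
    return VERBOSE_PATTERN.sub('', name)


print(strip_noise('Switzerland17'))                     # digits alternative applies
print(strip_noise('Bolivia (Plurinational State of)'))  # parenthesis alternative applies
print(strip_noise('Bolivia (Plurinational State of'))   # unchanged: no closing parenthesis
```

Note that the pattern stops at the first closing parenthesis, so nested parentheses are not removed cleanly.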
