How to extract specific content in a pandas dataframe with a regex?

Happy的楠姐 2020-12-07 23:35

Consider the following pandas dataframe:

In [114]:

df['movie_title'].head()

Out[114]:

0     Toy Story (1995)
1     GoldenEye (1995)
2    Four Rooms (1         


        
3 Answers
  •  死守一世寂寞
    2020-12-08 00:15

    Using a regular expression to find the year stored between parentheses. We include the parentheses in the pattern so that it does not match years that appear in the movie titles themselves:

    movies_df['year'] = movies_df.title.str.extract(r'(\(\d\d\d\d\))', expand=False)
    

    Removing the parentheses:

    movies_df['year'] = movies_df.year.str.extract(r'(\d\d\d\d)', expand=False)
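
The two extraction steps above can probably be collapsed into one by escaping the parentheses and putting the capture group inside them. A minimal sketch, assuming a small made-up `titles` Series (the sample titles are illustrative and not taken from the original data):

```python
import pandas as pd

# Hypothetical sample data; the second title itself starts with a year-like number.
titles = pd.Series([
    "Toy Story (1995)",
    "2001: A Space Odyssey (1968)",
    "No Year Here",
])

# \( and \) match literal parentheses; only the four digits between them are captured.
years = titles.str.extract(r"\((\d{4})\)", expand=False)

print(years.tolist())
```

Because the pattern requires the surrounding parentheses, the leading "2001" in the second title is skipped, and a title without a parenthesised year yields a missing value instead of raising an error.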
    

    Removing the years from the 'title' column:

    movies_df['title'] = movies_df.title.str.replace(r'(\(\d\d\d\d\))', '', regex=True)
    

    Stripping any leading or trailing whitespace left behind once the year is removed:

    movies_df['title'] = movies_df['title'].str.strip()
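
Put together, the steps in this answer can be sketched as one helper applied to the `movie_title` column from the question. The function name `split_title_and_year` and the two-row sample frame are assumptions made for illustration; this is one possible arrangement rather than the only way to do it:

```python
import pandas as pd


def split_title_and_year(df: pd.DataFrame, column: str = "movie_title") -> pd.DataFrame:
    """Return a copy with a 'year' column and the '(YYYY)' part removed from `column`."""
    out = df.copy()
    # Capture the four digits that sit inside literal parentheses.
    out["year"] = out[column].str.extract(r"\((\d{4})\)", expand=False)
    # Remove the parenthesised year, then trim the whitespace it leaves behind.
    out[column] = out[column].str.replace(r"\(\d{4}\)", "", regex=True).str.strip()
    return out


# Hypothetical sample mirroring the first rows shown in the question.
df = pd.DataFrame({"movie_title": ["Toy Story (1995)", "GoldenEye (1995)"]})
result = split_title_and_year(df)
print(result)
```

Working on a copy leaves the original frame untouched, which makes it easier to compare the before and after while experimenting with the pattern.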
    
