Consider the following pandas dataframe:
In [114]:
df['movie_title'].head()
Out[114]:
0 Toy Story (1995)
1 GoldenEye (1995)
2 Four Rooms (1
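Using a regular expression to find the year stored between parentheses. The parentheses are part of the pattern so that a number in the title itself, such as the 2001 in "2001: A Space Odyssey", is not mistaken for the release year. (The code below names the frame movies_df and the column title; with the frame above, use df and movie_title instead.)
movies_df['year'] = movies_df.title.str.extract(r'(\(\d\d\d\d\))', expand=False)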
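A minimal sketch of why the parentheses matter, assuming a hypothetical two-row `movies_df` (the titles are illustrative): a bare four-digit pattern returns the first four digits it finds, while the parenthesized pattern only matches the year wrapped in parentheses.

```python
import pandas as pd

# Hypothetical sample data; the second title contains a year-like number
movies_df = pd.DataFrame({"title": ["Toy Story (1995)", "2001: A Space Odyssey (1968)"]})

# Bare pattern: matches the first run of four digits anywhere in the title
bare = movies_df.title.str.extract(r"(\d\d\d\d)", expand=False)

# Parenthesized pattern: only matches four digits enclosed in parentheses
paren = movies_df.title.str.extract(r"(\(\d\d\d\d\))", expand=False)

print(bare.tolist())   # ['1995', '2001']
print(paren.tolist())  # ['(1995)', '(1968)']
```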
Removing the parentheses:
movies_df['year'] = movies_df.year.str.extract(r'(\d\d\d\d)', expand=False)
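As an alternative (a different pattern from the two-step approach above, shown as a sketch on hypothetical titles), the two extract calls can be folded into one by placing the escaped parentheses outside the capture group: they still have to be present for a match, but only the digits are returned. Titles without a parenthesized year come back as NaN.

```python
import pandas as pd

# Hypothetical sample data; the last title has no year at all
movies_df = pd.DataFrame({"title": ["Toy Story (1995)", "2001: A Space Odyssey (1968)", "Untitled"]})

# \( and \) must match literally, but only the group (\d{4}) is returned
movies_df["year"] = movies_df.title.str.extract(r"\((\d{4})\)", expand=False)

# '1995' and '1968' for the first two titles, NaN for the one without a year
print(movies_df["year"].tolist())
```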
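Removing the years from the 'title' column (regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):
movies_df['title'] = movies_df.title.str.replace(r'(\(\d\d\d\d\))', '', regex=True)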
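A short sketch on hypothetical titles showing what the replacement leaves behind: the year is gone, but the space that separated it from the title remains, which is why a strip step follows.

```python
import pandas as pd

# Hypothetical sample data in the "Title (YYYY)" format
movies_df = pd.DataFrame({"title": ["Toy Story (1995)", "GoldenEye (1995)"]})

# regex=True makes pandas interpret the pattern as a regular expression
movies_df["title"] = movies_df.title.str.replace(r"(\(\d\d\d\d\))", "", regex=True)

# The space before the removed year is still there
print(movies_df["title"].tolist())  # ['Toy Story ', 'GoldenEye ']
```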
Applying strip to get rid of any trailing whitespace left behind (such as the space that preceded the year):
movies_df['title'] = movies_df['title'].str.strip()
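Putting the four steps together, a minimal end-to-end sketch on a hypothetical `movies_df` (the sample titles are assumptions, not taken from the frame above), which can be run as-is to check the result:

```python
import pandas as pd

# Hypothetical sample titles in the "Title (YYYY)" format
movies_df = pd.DataFrame(
    {"title": ["Toy Story (1995)", "GoldenEye (1995)", "2001: A Space Odyssey (1968)"]}
)

# 1. Extract the parenthesized year, e.g. "(1995)"
movies_df["year"] = movies_df.title.str.extract(r"(\(\d\d\d\d\))", expand=False)
# 2. Keep only the four digits
movies_df["year"] = movies_df.year.str.extract(r"(\d\d\d\d)", expand=False)
# 3. Remove the parenthesized year from the title
movies_df["title"] = movies_df.title.str.replace(r"(\(\d\d\d\d\))", "", regex=True)
# 4. Trim the leftover trailing space
movies_df["title"] = movies_df["title"].str.strip()

print(movies_df)
```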