I'm trying to get a grasp on regular expressions and I came across the one used with the str.extract method:
movies['year']=movies
Try using this:
movies['year'] = movies['title'].str.extract(r'.*\((\d{4})\).*', expand=False)
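To see what that line does, here is a minimal sketch. The sample titles are made up (the real `movies` frame is not shown in the question), and the column names are taken from the snippet above. Since `.str.extract()` takes the first match found anywhere in each string, the surrounding `.*` should not be needed for titles with a single parenthesised year, so the sketch uses the shorter pattern.

```python
import pandas as pd

# Hypothetical sample data; the actual `movies` DataFrame is not shown in the thread.
movies = pd.DataFrame(
    {"title": ["Toy Story (1995)", "Heat (1995)", "No Year Here"]}
)

# Raw string so the backslashes reach the regex engine unchanged.
# The single capturing group (\d{4}) is what ends up in the new column.
movies["year"] = movies["title"].str.extract(r"\((\d{4})\)", expand=False)

years = movies["year"].tolist()
print(movies)
```

Rows whose title has no four-digit year in parentheses get a missing value instead of raising an error.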
First of all, the behavior of Pandas .str.extract() is as expected: it returns only the contents of the capturing group. The pattern passed to extract requires at least one capturing group:
pat : string
Regular expression pattern with capturing groups
If you use a named capturing group, the new column will be named after the named group.
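As a rough illustration of the named-group behavior, assuming a small made-up Series of titles (the group name `year` is just an example):

```python
import pandas as pd

# Hypothetical titles, used only for illustration.
titles = pd.Series(["Toy Story (1995)", "Jumanji (1995)", "Untitled"])

# With expand=True the result is a DataFrame with one column per group;
# the named group (?P<year>...) supplies the column name.
named = titles.str.extract(r"\((?P<year>\d{4})\)", expand=True)

# With expand=False and a single group, the result is a Series instead.
as_series = titles.str.extract(r"\((\d{4})\)", expand=False)

print(named.columns.tolist())
print(type(as_series).__name__)
```

The DataFrame form is convenient when the pattern has several groups; the Series form is the one you can assign directly to a single new column.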
The grep command you provided can be reduced to
grep '\((.*)\)'
as grep can match part of a line (it does not require a full-line match) and works line by line: once a match is found, the whole line is returned. To print only the matched part instead, you may use the -o switch.
With grep, you cannot return the capturing group contents directly. This can be worked around with a PCRE pattern enabled by the -P option, but that option is not available on Mac, for example. sed or awk may help in those situations, too.
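For comparison, the same distinction between the whole match and the capturing group can be sketched with Python's standard `re` module. This is only an analogy: the sample line is made up, and `group(0)` is roughly what `grep -o` would print, while `group(1)` is what `.str.extract()` keeps.

```python
import re

# Hypothetical input line, standing in for one line of grep input.
line = "Toy Story (1995)"

# re.search, like grep, finds a match anywhere in the line.
match = re.search(r"\((\d{4})\)", line)

whole_match = match.group(0)  # the entire matched text, parentheses included
group_only = match.group(1)   # only the contents of the capturing group

print(whole_match)
print(group_only)
```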