Expected behavior with regular expressions with capturing-groups in pandas' `str.extract()`

离开以前 2020-12-07 04:14

I'm trying to get a grasp on regular expressions, and I came across the one used in the `str.extract` method:

movies['year']=movies


        
2 answers
  •  被撕碎了的回忆
    2020-12-07 05:11

    First of all, the behavior of pandas `.str.extract()` is as expected: it returns only the contents of the capturing groups, not the whole match. The pattern passed to `extract` must contain at least one capturing group:

    pat : string
    Regular expression pattern with capturing groups

    If you use a named capturing group, the new column will be named after the named group.
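A minimal sketch of these points, assuming a small made-up `movies` frame. The titles and the `\d{4}` year pattern are illustrative assumptions, not the asker's data or pattern:

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
movies = pd.DataFrame({"title": ["Toy Story (1995)", "Heat (1995)", "No Year Here"]})

# Unnamed group: only the group contents are returned, in a column labelled 0.
unnamed = movies["title"].str.extract(r"\((\d{4})\)", expand=True)

# Named group: the resulting column is named after the group ("year").
named = movies["title"].str.extract(r"\((?P<year>\d{4})\)", expand=True)

# A pattern without any capturing group is rejected with a ValueError.
try:
    movies["title"].str.extract(r"\(\d{4}\)")
    raised = False
except ValueError:
    raised = True

print(unnamed)
print(named)
print("no-group pattern rejected:", raised)
```

Rows where the pattern does not match come back as NaN, and the parentheses around the year are not part of the result because they sit outside the capturing group.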

    The grep command you provided can be reduced to

    grep '\((.*)\)'
    

    since grep matches anywhere in a line (it does not require a full-line match) and works line by line: once a match is found, the whole line is printed. To print only the matched part instead, use the -o option.

    With grep, you cannot return only the capturing group contents. This can be worked around with a PCRE pattern via the -P option, but that option is not available on Mac, for example. sed or awk may help in those situations, too.
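The difference between the whole match (what `grep -o` prints) and the group contents (what `str.extract` keeps) can be sketched with Python's standard `re` module. The input line here is a made-up example:

```python
import re

line = "Toy Story (1995)"  # hypothetical input line

# Same idea as the pattern above: literal parentheses around one capturing group.
m = re.search(r"\((.*)\)", line)

whole_match = m.group(0)  # the full matched text, parentheses included
group_only = m.group(1)   # only what the capturing group matched

print(whole_match)
print(group_only)
```

Here `group(0)` corresponds to the matched part of the line, while `group(1)` is the narrower piece that `str.extract` would place in the new column.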
