Here is a regex - attempted by egrep and then by Python 2.7:
$ echo '/some/path/to/file/abcde.csv' | egrep '*([a-zA-Z]+).csv'
/some/pa
This is not a bug. Python's regex engine uses a traditional NFA to match patterns, and the * character only works when it is preceded by a token. As the re documentation puts it:
'*'
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s.
So instead you can use .* which repeats any character (.) zero or more times:
r'.*([a-zA-Z]+)\.csv'
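The quantifier behaviour quoted above can be checked with a minimal sketch like the following. The variable names are mine and are only there for illustration:

```python
import re

# `*` repeats the token in front of it: here, the `b` in `ab*`.
quantified = re.compile(r'ab*')
repeats = [quantified.match(s).group(0) for s in ('a', 'ab', 'abbb')]
print(repeats)

# With `.` in front, `*` has something to repeat, so the pattern
# compiles and can match the whole path from the question.
path = '/some/path/to/file/abcde.csv'
fixed = re.compile(r'.*([a-zA-Z]+)\.csv')
whole_match = fixed.match(path)
print(whole_match.group(0) == path)
```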
Python also provides the fnmatch module, which supports Unix shell-style wildcards:
>>> import fnmatch
>>> s="/some/path/to/file/abcde.csv"
>>> fnmatch.fnmatch(s, '*.csv')
True
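To make the difference between a shell-style * and a regex * concrete, here is a small sketch. It assumes a made-up list of paths (only the first one comes from the question) and uses fnmatch.translate to show that the wildcard is converted into a regular expression behind the scenes:

```python
import fnmatch
import re

# Hypothetical file list; only the first entry appears in the question.
paths = ['/some/path/to/file/abcde.csv', '/some/path/to/file/notes.txt']

# In a shell-style pattern, `*` stands on its own and needs no preceding token.
csv_paths = fnmatch.filter(paths, '*.csv')
print(csv_paths)

# fnmatch.translate returns the equivalent regular expression as a string.
# Its exact text differs between Python versions, so it is only used here.
translated = fnmatch.translate('*.csv')
matches_as_regex = re.match(translated, paths[0]) is not None
print(matches_as_regex)
```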
You do not need the leading * in the pattern; it is what causes the issue.
Use
([a-zA-Z]+)\.csv
Or to match the whole string:
.*([a-zA-Z]+)\.csv
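The two patterns differ in what the capture group ends up holding, which a short sketch can show. The lazy variant at the end is my own addition and not part of the answer; treat it as one possible adjustment:

```python
import re

path = '/some/path/to/file/abcde.csv'

# Without a leading `.*`, re.search finds the first place the pattern fits.
short = re.search(r'([a-zA-Z]+)\.csv', path)
print(short.group(0))  # abcde.csv
print(short.group(1))  # abcde

# With a greedy `.*`, the whole string matches, but `.*` consumes as much
# as it can and leaves only the last letter for the group.
whole = re.match(r'.*([a-zA-Z]+)\.csv', path)
print(whole.group(0) == path)
print(whole.group(1))  # e

# Assumption, not from the answer: a lazy `.*?` keeps the full stem in the group.
lazy = re.match(r'.*?([a-zA-Z]+)\.csv$', path)
print(lazy.group(1))  # abcde
```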
The reason is that * is unescaped and is thus treated as a quantifier. A quantifier applies to the preceding subpattern in the regex. Here it appears at the very beginning of the pattern, so there is nothing for it to quantify, and the "nothing to repeat" error is raised.
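The error is easy to reproduce. The sketch below compiles the pattern from the question and then, for contrast, an escaped version run against a made-up input string that starts with a literal asterisk:

```python
import re

# The pattern from the question: `*` comes first, with nothing before it.
try:
    re.compile('*([a-zA-Z]+).csv')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Escaped, `*` is an ordinary character and matches a literal asterisk.
# The input string here is hypothetical.
literal = re.search(r'\*([a-zA-Z]+)\.csv', '*abcde.csv')
print(literal.group(1))  # abcde
```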
If it "works" in Vim, that is only because the Vim regex engine does not treat a leading * as a quantifier (similar to how Java tolerates an unescaped [ and ] inside a character class like [([)]]).