Key error when using regex quantifier python

心已入冬 submitted on 2021-01-29 04:41:38

Question


I am trying to capture the words that follow specified stock symbols in a pandas df. I have several symbols in the format $IBM and am building a Python regex pattern to search each tweet for the 3-5 words following the symbol, if it is found.

My df, called stock_news, looks like this:

   Word    Count
0  $IBM       10
1  $GOOGL      8
etc.

pattern = ''
for word in stock_news.Word:
    pattern += '{} (\w+\s*\S*){3,5}|'.format(re.escape(word))

My understanding is that {3,5} should be a quantifier, in my case matching between 3 and 5 times. However, I receive the following KeyError:

KeyError: '3,5'

I have also tried using a raw string, r'{} (\w+\s*\S*){3,5}|', but to no avail. I also tried this pattern on regex101, and it seems to work there but not in my PyCharm IDE. Any help would be appreciated.

Code for finding:

pat = re.compile(pattern, re.I)

for i in tweet_df.Tweets:
    for x in pat.findall(i):
        print(x)

Answer 1:


When you build your pattern this way, an empty alternative is left at the end (after the trailing |), so the pattern can match the empty string at every position where none of the other alternatives match.
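
To see the effect of the dangling |, here is a minimal sketch. The two-symbol pattern and the sample text are made up for illustration and are not the asker's data:

```python
import re

# Hypothetical pattern assembled like the loop in the question,
# so it ends with a trailing "|" (an empty alternative).
pattern = r'\$IBM \w+|\$GOOGL \w+|'
pat = re.compile(pattern, re.I)

text = 'no tickers here'
matches = pat.findall(text)

# Every match is an empty string: one per position, plus the end of the text.
print(len(matches), set(matches))
```

Joining the alternatives with "|".join(...) instead of appending "|" after each one avoids the empty alternative.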

You need to build a pattern like this:

(?:\$IBM|\$GOOGL)\s+(\w+(?:\s+\S+){3,5})

You may use

pattern = r'(?:{})\s+(\w+(?:\s+\S+){{3,5}})'.format(
              "|".join(map(re.escape, stock_news['Word'])))

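Mind that literal curly braces inside an f-string or a str.format string must be doubled. Otherwise {3,5} is parsed as a replacement field named 3,5, which is what raises the KeyError. A raw string does not help, because the r prefix only affects backslash escapes, not format fields.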
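
As a small sketch of the brace-doubling rule, the snippet below reproduces the error from the question with a single symbol and then formats the corrected pattern. The variable names (symbol, error, fixed) are illustrative only:

```python
import re

symbol = re.escape('$IBM')

# Single braces: str.format reads {3,5} as a replacement field named "3,5".
try:
    r'{} (\w+\s*\S*){3,5}|'.format(symbol)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))

# Doubled braces come out of format() as literal { and }.
fixed = r'(?:{})\s+(\w+(?:\s+\S+){{3,5}})'.format(symbol)
print(fixed)
```

The second print shows the quantifier {3,5} intact in the final regex, which is what the re module needs to see.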

Regex details

  • (?:\$IBM|\$GOOGL) - a non-capturing group matching either $IBM or $GOOGL
  • \s+ - 1+ whitespace characters
  • (\w+(?:\s+\S+){3,5}) - Capturing group 1 (when using findall, only this part will be returned):
    • \w+ - 1+ word chars
    • (?:\s+\S+){3,5} - a non-capturing group matching three, four or five occurrences of 1+ whitespace characters followed by 1+ non-whitespace characters

Note that non-capturing groups are meant to group patterns, or quantify them, without storing the text they match, so that you capture only what you need to return or keep.
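
Putting the pieces together, the following is a rough end-to-end sketch. Plain lists stand in for stock_news['Word'] and tweet_df.Tweets, and the sample tweets are invented for illustration:

```python
import re

# Stand-ins for stock_news['Word'] and tweet_df.Tweets (assumed sample data).
words = ['$IBM', '$GOOGL']
tweets = [
    'Analysts say $IBM is expanding its cloud business quickly this year',
    'nothing relevant here',
    'Bought $googl before earnings call today',
]

# One alternation of escaped symbols, no trailing "|", doubled braces.
pattern = r'(?:{})\s+(\w+(?:\s+\S+){{3,5}})'.format(
    '|'.join(map(re.escape, words)))
pat = re.compile(pattern, re.I)

# findall returns only capturing group 1 for each match.
results = [pat.findall(tweet) for tweet in tweets]
for tweet, found in zip(tweets, results):
    print(found)
```

Note that group 1 holds one word (\w+) plus three to five further tokens, so each result contains four to six words. If exactly 3-5 words are wanted, the quantifier could be changed to {{2,4}} in the format string.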



Source: https://stackoverflow.com/questions/62133480/key-error-when-using-regex-quantifier-python
