Extract salaries from a list of strings

春和景丽
春和景丽 2020-12-18 00:49

I'm trying to extract salaries from a list of strings. I'm using the regex findall() function, but it's returning many empty strings as well as the salaries and this is c

1 Answer
  • 2020-12-18 01:16

    When the pattern contains capturing groups, re.findall returns the contents of those groups rather than the whole match. Your pattern uses a group in which almost everything is optional, so the group can match nothing at all, and that is where the empty strings in the result come from.

    In your pattern you use [0-9]*, which matches a digit zero or more times. If there is no limit on the number of leading digits, you might use [0-9]+ instead so that at least one digit is required.
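
The difference between the two quantifiers can be seen in a small sketch. The pattern from the question is not shown here, so `optional_pattern` below is only a hypothetical stand-in in which every part of the group is optional; `required_pattern` is the same pattern with [0-9]+ in place of [0-9]*.

```python
import re

text = "41 000€ à 63 000€ / an"

# Hypothetical stand-in for the original pattern: every part of the
# group is optional, so the group can match the empty string.
optional_pattern = r"([0-9]*(?: [0-9]{1,3})?)"
optional_result = re.findall(optional_pattern, text)

# Requiring at least one digit means the group can no longer be empty.
required_pattern = r"([0-9]+(?: [0-9]{1,3})?)"
required_result = re.findall(required_pattern, text)

print(optional_result)  # the salaries mixed with many '' entries
print(required_result)  # ['41 000', '63 000']
```

With the optional version, findall reports an empty match at nearly every position where no digit starts, which is consistent with the behaviour described in the question.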

    You might use this pattern with a capturing group:

    (?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)
    


    Explanation

    • (?<!\S) Negative lookbehind: assert that what is on the left is not a non-whitespace character (so either whitespace or the start of the string)
    • ( Capture group
      • [0-9]+(?: [0-9]{1,3})? Match 1+ digits followed by an optional part that matches a space and 1-3 digits
    • ) Close capture group
    • € Match literally
    • (?!\S) Negative lookahead: assert that what is on the right is not a non-whitespace character (so either whitespace or the end of the string)
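
To see what the two whitespace assertions do in practice, here is a minimal sketch that runs the pattern against a few made-up strings. The sample strings are assumptions chosen for illustration and do not come from the question.

```python
import re

pattern = r"(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)"

# Illustrative inputs (not taken from the question).
samples = [
    "41 000€",            # amount spans the whole string
    "salaire: 500€ net",  # amount surrounded by spaces
    "x500€",              # non-whitespace directly before the digits
    "500€/an",            # non-whitespace directly after the € sign
]

results = {s: re.findall(pattern, s) for s in samples}

for s, found in results.items():
    print(repr(s), "->", found)
```

The first two strings yield one amount each, while the last two yield an empty list because one of the lookarounds fails.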

    Your code might look like:

    import re

    sal = '41 000€ à 63 000€ / an'  # sample string for which I get errors
    # Raw string so that \S reaches the regex engine unchanged
    regex = r'(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)'
    print(re.findall(regex, sal))  # ['41 000', '63 000']
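
Since the question is about a list of strings, the same pattern could be applied to each element in turn. The following is only a sketch: the helper name `extract_salaries` and the `postings` list are hypothetical, and converting the matches to integers by removing the space is an assumption about what you want to do with them.

```python
import re

# Same pattern as above, compiled once for reuse.
SALARY_RE = re.compile(r"(?<!\S)([0-9]+(?: [0-9]{1,3})?)€(?!\S)")

def extract_salaries(strings):
    """Return, for each input string, the salaries found as integers."""
    return [
        [int(match.replace(" ", "")) for match in SALARY_RE.findall(s)]
        for s in strings
    ]

# Hypothetical input list for illustration.
postings = [
    "41 000€ à 63 000€ / an",
    "Salaire non communiqué",
    "2 500€ par mois",
]

print(extract_salaries(postings))  # [[41000, 63000], [], [2500]]
```

Strings without a salary produce an empty list rather than empty strings, so they are easy to filter out afterwards.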
    