Extracting a person's age from unstructured text in Python

栀梦 2021-01-18 23:08

I have a dataset of administrative filings that include short biographies. I am trying to extract people's ages using Python and some pattern matching. Some example of s

5 answers
  •  耶瑟儿~
    2021-01-18 23:46

    A simple way to find a person's age in your sentences is to extract a standalone two-digit number:

    import re
    
    sentence = 'Steve Jones, Age: 32,'
    # \b\d{2}\b: exactly two digits with a word boundary on each side
    print(re.findall(r"\b\d{2}\b", sentence)[0])
    
    # output: 32
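    A minimal sketch of the same pattern wrapped in a helper (the name first_two_digit_number is hypothetical, not from the answer). It uses re.search, which returns None when nothing matches instead of raising IndexError the way re.findall(...)[0] does, and it shows why the simple pattern needs refining: on the compensation sentence it picks up the 48 from "48%".

```python
import re

# The simple two-digit pattern from the answer: \b\d{2}\b
TWO_DIGITS = re.compile(r"\b\d{2}\b")


def first_two_digit_number(sentence):
    """Return the first standalone two-digit number as a string, or None.

    Hypothetical helper for illustration. re.search returns None on no match,
    so an empty result does not raise IndexError.
    """
    match = TWO_DIGITS.search(sentence)
    return match.group() if match else None


print(first_two_digit_number("Steve Jones, Age: 32,"))  # → 32
# "2010" is skipped (no word boundary after "20"), but "48%" matches,
# because there is a word boundary between "8" and "%".
print(first_two_digit_number(
    "Equity awards granted to Mr. Love in 2010 represented 48% of his total compensation"
))  # → 48
print(first_two_digit_number("No numbers here"))  # → None
```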
    

    If you also want to skip numbers followed by '%', and only match two digits that start at a word boundary and are followed by a non-digit character (so the "20" in a year such as 2010 is ignored), you could do:

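    import re
    
    sentence = 'Equity awards granted to Mr. Love in 2010 represented 48% of his total compensation'
    
    # \b\d{2}: two digits starting at a word boundary
    # (?!%):   not followed by '%'
    # [^\d]:   followed by a non-digit, so years like 2010 are skipped
    matches = re.findall(r"\b\d{2}(?!%)[^\d]", sentence)
    
    if matches:
        # each match includes the trailing non-digit character, so keep only the two digits
        print(matches[0][:2])
    else:
        print('no match')
    
    # output: no match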
    

    This also works for the previous sentence ('Steve Jones, Age: 32,'), where it prints 32.
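    A sketch of a variant for running this over many biographies, assuming you want an int or None per sentence (extract_age and the sentence "Jane Doe, 45" are hypothetical, made up for illustration). It swaps (?!%)[^\d] for the zero-width lookahead (?![\d%]): because the lookahead consumes no character, a number at the very end of a sentence still matches and no [:2] slice is needed.

```python
import re

# Two digits at a word boundary, not followed by another digit or '%'.
# The lookahead consumes nothing, so "Jane Doe, 45" (age at end of string) matches.
AGE_PATTERN = re.compile(r"\b\d{2}(?![\d%])")


def extract_age(sentence):
    """Hypothetical helper: return the first candidate age as an int, or None."""
    match = AGE_PATTERN.search(sentence)
    return int(match.group()) if match else None


sentences = [
    "Steve Jones, Age: 32,",
    "Equity awards granted to Mr. Love in 2010 represented 48% of his total compensation",
    "Jane Doe, 45",  # made-up example with the age at the end of the string
]

for s in sentences:
    print(extract_age(s))
# 32
# None
# 45
```

    Like the original pattern, this treats any standalone two-digit number as an age, so a phrase such as "served for 12 years" would also match.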
