Extracting a person's age from unstructured text in Python

栀梦 2021-01-18 23:08

I have a dataset of administrative filings that include short biographies. I am trying to extract people's ages using Python and some pattern matching. Some example of s

5 answers
  •  日久生厌
    2021-01-18 23:37

    This will work for all the cases you provided: https://repl.it/repls/NotableAncientBackground

    import re

    # Sample sentences from the question (named to avoid shadowing the built-in input()).
    sentences = [
        "Mr Bond, 67, is an engineer in the UK",
        "Amanda B. Bynes, 34, is an actress",
        "Peter Parker (45) will be our next administrator",
        "Mr. Dylan is 46 years old.",
        "Steve Jones, Age:32,",
        "Equity awards granted to Mr. Love in 2010 represented 48% of his total compensation",
        "George F. Rubin(14)(15) Age 68 Trustee since: 1997.",
        "INDRA K. NOOYI, 56, has been PepsiCos Chief Executive Officer (CEO) since 2006",
        "Mr. Lovallo, 47, was appointed Treasurer in 2011.",
        "Mr. Charles Baker, 79, is a business advisor to biotechnology companies.",
        "Mr. Botein, age 43, has been a member of our Board since our formation.",
    ]

    for sentence in sentences:
        # "Age:32" or "Age 68" (this pattern is case-sensitive).
        ages = re.findall(r'Age[:\s](\d{1,3})', sentence)
        # A 1-3 digit number preceded by a space and followed by a space, with an optional comma.
        ages.extend(re.findall(r' (\d{1,3}),? ', sentence))
        # Fall back to a number in parentheses only if nothing else matched.
        if not ages:
            ages = re.findall(r'\((\d{1,3})\)', sentence)
        print(sentence + " --- AGE: " + str(set(ages)))
    

    This prints:

    Mr Bond, 67, is an engineer in the UK --- AGE: {'67'}
    Amanda B. Bynes, 34, is an actress --- AGE: {'34'}
    Peter Parker (45) will be our next administrator --- AGE: {'45'}
    Mr. Dylan is 46 years old. --- AGE: {'46'}
    Steve Jones, Age:32, --- AGE: {'32'}
    Equity awards granted to Mr. Love in 2010 represented 48% of his total compensation --- AGE: set()
    George F. Rubin(14)(15) Age 68 Trustee since: 1997. --- AGE: {'68'}
    INDRA K. NOOYI, 56, has been PepsiCos Chief Executive Officer (CEO) since 2006 --- AGE: {'56'}
    Mr. Lovallo, 47, was appointed Treasurer in 2011. --- AGE: {'47'}
    Mr. Charles Baker, 79, is a business advisor to biotechnology companies. --- AGE: {'79'}
    Mr. Botein, age 43, has been a member of our Board since our formation. --- AGE: {'43'}
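
If you need the ages as numbers rather than a printed set of strings, the same three patterns can be wrapped in a function. The following is only a sketch: the name `extract_ages` and the pattern constants are hypothetical, and the keyword pattern is a variation on the one above (case-insensitive, with word boundaries), not something taken from the question.

```python
import re

# Hypothetical names; the keyword pattern is a case-insensitive variant
# of the 'Age[:\s]' pattern used in the answer.
AGE_KEYWORD = re.compile(r'\bage[:\s]\s*(\d{1,3})\b', re.IGNORECASE)
AGE_STANDALONE = re.compile(r' (\d{1,3}),? ')
AGE_PARENS = re.compile(r'\((\d{1,3})\)')


def extract_ages(text):
    """Return the candidate ages found in text as a sorted list of ints."""
    found = AGE_KEYWORD.findall(text)
    found.extend(AGE_STANDALONE.findall(text))
    # Numbers in parentheses are often footnote markers, so they are
    # used only when the other two patterns find nothing.
    if not found:
        found = AGE_PARENS.findall(text)
    return sorted({int(n) for n in found})


for line in [
    "Mr Bond, 67, is an engineer in the UK",
    "Peter Parker (45) will be our next administrator",
    "George F. Rubin(14)(15) Age 68 Trustee since: 1997.",
]:
    print(line, "->", extract_ages(line))
```

Note that the standalone pattern accepts any one- to three-digit number surrounded by spaces, so a sentence such as "has 12 employees" would also produce a candidate. Filtering by a plausible age range is one option if that turns out to matter for your filings.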
    
