Extracting phone numbers from free-form text in Python using regex

遇见更好的自我 2020-12-22 11:49

I have to extract phone numbers from free-form text.

How can I do this using regex in Python?

I have found one for extracting e-mail addre

4 answers
  • 2020-12-22 12:26

    You have to build a pattern before you can match phone numbers with a regex. The question is: what format are you looking for?

    To do this, you should research the use cases: how do the phone numbers show up in your text?

    So I'd expect you to first define what you mean by matching phone numbers.

    • Are you looking for a specific format that is consistent throughout the free text?
    • Or can you describe a phone number by a pattern, such as a country code (+xx) followed by a specific number of digits?

    I just mean that there is a huge difference between:
    • "I want to match phone numbers from a text that can be from any country, mobile or landline, in any format, with random spaces and ( ) characters in them", and
    • "I want to match phone numbers from Hungary, always in the same +xx(space)xxxxxxx(space) format."
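    As a minimal sketch of the second, consistent case, assume a hypothetical format of a "+", exactly two country-code digits, one space, and exactly seven digits. The names `STRICT` and `find_strict` are illustrative only. A strict pattern like this is short, but it silently skips any number written differently:

```python
import re

# Hypothetical strict pattern for the "consistent format" case: a "+",
# exactly two country-code digits, one space, then exactly seven digits.
# \b stands in for the trailing space, so a number followed by punctuation
# or the end of the text still matches, but an eighth digit does not.
STRICT = re.compile(r"\+\d{2} \d{7}\b")


def find_strict(text):
    """Return every substring of text written in the strict format."""
    return STRICT.findall(text)


sample = "Call +36 1234567 or +36-1-234-5678 today."
print(find_strict(sample))  # ['+36 1234567']: the dashed variant is missed
```

    This is exactly the risk described here: the tighter the pattern, the more real-world variants it drops.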

    Summary: to build a regex pattern that matches all the phone numbers in your text, you have to be aware of their different representations, that is, what you expect a phone number to look like. If your pattern is not correct, you might miss a lot of phone numbers.

    Hope this serves a good cause, V

  • 2020-12-22 12:30

    This regex matches typical North American phone numbers.

    It matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, such as 333 4445555, (333)4445555 or 333444-5555. It does not match the international notation +13334445555, but it does match the domestic part of +1 333 4445555.

    \(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b
    

    Source: RegexBuddy

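    The following Python code iterates over all matches:

    import re

    subject = "Call 333-444-5555 or (333) 444 5555."  # example input; use your own text
    for match in re.finditer(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b", subject):
        # match start: match.start()
        # match end (exclusive): match.end()
        # matched text: match.group()
        print(match.start(), match.end(), match.group())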
    
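    To check the claims above, here is a minimal sketch that compiles the same pattern (the name `NANP` and the list `samples` are illustrative) and runs it over the listed examples:

```python
import re

# The North American pattern from the answer, compiled once for reuse.
NANP = re.compile(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b")

# Formats the answer says should match in full.
samples = ["3334445555", "333.444.5555", "333-444-5555", "333 444 5555",
           "(333) 444 5555", "333 4445555", "(333)4445555", "333444-5555"]
for s in samples:
    print(s, bool(NANP.fullmatch(s)))  # each line ends with True

# International notation: no match, because "1" is not in [2-9] and there
# is no word boundary between "1" and the following "3".
print(NANP.findall("+13334445555"))    # []
# With a space after "+1", only the domestic part is matched.
print(NANP.findall("+1 333 4445555"))  # ['333 4445555']
```

    Because `\(?` and `\)?` are optional independently of each other, the pattern also accepts unbalanced forms such as "(333 444 5555"; that is part of "all combinations thereof".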

    What patterns are you expecting?

  • 2020-12-22 12:35

    So I think I've got the hang of your problem.

    This is what I would do, in order:

    • Learn what regex is; without the foundational knowledge, you are just wasting our time and your own.
    • Watch this: https://www.youtube.com/watch?v=ZdDOauFIDkw
    • Write down what you don't know.
    • Research it.
    • Write code and sample input for it. If it still doesn't work, copy it to http://pastebin.com and show it to us.
    • Repeat.
  • 2020-12-22 12:39

    This should find all the phone numbers in a given string, including international numbers. Taking the example by @buckley, let's use the string:

    >>> import re
    >>> text = """Matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, like 333 4445555, (333)4445555 or 333444-5555. Does not match international notation +13334445555, but matches domestic part in +1 333 4445555."""
    >>> re.findall(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]', text)
    ['3334445555', '333.444.5555', '333-444-5555', '333 444 5555', 
     '(333) 444 5555', '333 4445555', '(333)4445555', '333444-5555', 
     '+13334445555', '+1 333 4445555']
    

    Basically, the regex lays out these rules:

    1. The matched string may start with a + or ( symbol.
    2. This has to be followed by a digit from 1 to 9.
    3. It has to end with a digit from 0 to 9.
    4. In between, it must contain at least 8 characters, each of which is a digit 0-9, a space, or one of . - ( ).
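    The same regex can be written with re.VERBOSE so that each rule sits next to the piece that implements it. This is a sketch: the name `PHONE` and the sample strings are illustrative. Because the rules only constrain which characters appear and how many, they also accept strings that are not phone numbers, such as an ISO date:

```python
import re

# The answer's pattern rewritten with re.VERBOSE: whitespace and "#" comments
# are ignored outside character classes, but the space inside [...] is kept.
PHONE = re.compile(r"""
    [+(]?            # rule 1: may start with "+" or "("
    [1-9]            # rule 2: then a digit from 1 to 9
    [0-9 .()-]{8,}   # rule 4: at least 8 digits, spaces, or . ( ) -
    [0-9]            # rule 3: must end with a digit from 0 to 9
""", re.VERBOSE)

text = "Call (333) 444 5555 or +1 333 4445555."
print(PHONE.findall(text))  # ['(333) 444 5555', '+1 333 4445555']

# The rules only constrain characters and length, so dates slip through too.
print(PHONE.findall("Posted on 2020-12-22 at noon"))  # ['2020-12-22']
```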