Question
I need to extract phone numbers from free-form text.
How can I do this with a regex in Python?
I found an example that extracts e-mail addresses: https://gist.github.com/dideler/5219706
I used the same approach with a phone number regex in place of the e-mail address regex, but I get no output.
import re

def get_phoneNumber(text):
    phone_number = ""
    regex = re.compile(r"((\(\d{3,4}\)|\d{3,4}-)\d{4,9}(-\d{1,5}|\d{0}))|(\d{4,12})")
    for phoneNumber in get_phoneNumbers(text, regex):
        phone_number = phone_number + phoneNumber + "\n"
    return phone_number

def get_phoneNumbers(s, regex):
    # findall returns one tuple of capture groups per match; index 0 is the
    # first group, which is empty when only the (\d{4,12}) alternative matches
    return (phoneNumber[0] for phoneNumber in re.findall(regex, s))
How can I manage to do it?
Answer 1:
This regex matches typical phone numbers from North America.
It matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, like 333 4445555, (333)4445555 or 333444-5555. It does not match the international notation +13334445555, but it matches the domestic part in +1 333 4445555.
\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b
Source: RegexBuddy
The following Python code iterates over all matches:
import re

subject = "Call 333-444-5555 or (333) 444 5555."  # example input
for match in re.finditer(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b", subject):
    start = match.start()  # match start
    end = match.end()      # match end (exclusive)
    text = match.group()   # matched text
What patterns are you expecting?
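As a rough sketch of how the loop above could be packaged for reuse, the pattern can be compiled once and wrapped in a helper. The names NANP_RE and find_phone_numbers, and the sample sentence, are made up for illustration and are not part of RegexBuddy or the re module.

```python
import re

# Pattern quoted in this answer (typical North American numbers).
NANP_RE = re.compile(r"\(?\b[2-9][0-9]{2}\)?[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b")

def find_phone_numbers(text):
    """Return a (start, end, matched_text) tuple for every match in text."""
    return [(m.start(), m.end(), m.group()) for m in NANP_RE.finditer(text)]

# Hypothetical sample input.
sample = "Call 333-444-5555 or (333) 444 5555 today."
for start, end, number in find_phone_numbers(sample):
    print(start, end, number)
```

Because the area code and exchange must start with a digit from 2 to 9, a string such as 123-456-7890 is not reported.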
Answer 2:
You have to build a pattern before you can match anything with a regex, so the question is: what format are you looking for?
To answer that, you should research how phone numbers actually show up in your use cases.
So I'd expect you to define what you mean by matching phone numbers.
- Is it one specific format that is always consistent throughout the free text?
- Or can you describe a phone number with a pattern, such as a country code (+xx) followed by a specific number of digits?
There is a huge difference between these two goals:
- I want to match phone numbers in a text that can be from any country, mobile or landline, in any format, with random spaces and ( and ) characters in them.
- I want to match phone numbers from Hungary in a +xx(space)xxxxxxx(space) format that is always consistent.
Summary: To build a regex pattern that matches all the phone numbers in your text, you have to know the different representations, that is, what you expect a phone number to look like. If your pattern is not correct, you might miss a lot of phone numbers.
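To make the point concrete, here is a minimal sketch of a strict pattern for the consistent +xx xxxxxxx format mentioned above. The digit counts are read off the x placeholders and are an assumption, as are the name STRICT_RE and the sample text (the numbers in it are invented).

```python
import re

# Hypothetical strict pattern: a plus sign, two digits, one space,
# then exactly seven digits. The counts are assumed from "+xx xxxxxxx".
STRICT_RE = re.compile(r"(?<!\w)\+\d{2} \d{7}\b")

# Invented sample text with three differently formatted numbers.
text = "Office: +36 1234567, mobile: +36 (30) 123-4567, old: 06 30 1234567"

# Only the number written in the expected format is found.
print(STRICT_RE.findall(text))
```

The other two numbers are missed because they are written differently, which is exactly the risk of a pattern that does not cover every representation present in the text.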
Hope this code serves a good cause, V
Answer 3:
This should find all the phone numbers in a given string, including international numbers. Taking the example by @buckley, let's use the string:
text="""Matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555 and all combinations thereof, like 333 4445555, (333)4445555 or 333444-5555. Does not match international notation +13334445555, but matches domestic part in +1 333 4445555."""
>>> re.findall(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]', text)
['3334445555', '333.444.5555', '333-444-5555', '333 444 5555',
'(333) 444 5555', '333 4445555', '(333)4445555', '333444-5555',
'+13334445555', '+1 333 4445555']
Basically, the regex lays out these rules:
- The matched string may start with a + or ( symbol.
- That must be followed by a digit from 1 to 9.
- It must end with a digit from 0 to 9.
- In between, it must contain at least 8 characters drawn from 0-9, space, and .-().
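Since these rules are loose, other digit runs such as spaced-out dates can also match. One possible refinement, sketched below, is to count the digits in each candidate and keep only plausible lengths. The helper name extract_numbers and the lower bound of 10 digits are assumptions for illustration; the upper bound of 15 is the maximum length of an E.164 number.

```python
import re

# Pattern from this answer.
LOOSE_RE = re.compile(r"[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]")

def extract_numbers(text, min_digits=10, max_digits=15):
    """Return matches whose digit count lies in [min_digits, max_digits]."""
    results = []
    for candidate in LOOSE_RE.findall(text):
        digits = re.sub(r"\D", "", candidate)  # drop everything but digits
        if min_digits <= len(digits) <= max_digits:
            results.append(candidate)
    return results

# Hypothetical input: the spaced-out date matches the regex but has only 8 digits.
print(extract_numbers("Order 2023 - 01 - 15 shipped, call +1 333 4445555."))
```

The bounds are keyword arguments so they can be tightened or relaxed for the numbers you actually expect.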
Answer 4:
I think I've got the hang of your problem.
This is what I would do, in order:
- Learn what a regex is. Without the foundational knowledge you are just wasting our time and your own.
- Watch this: https://www.youtube.com/watch?v=ZdDOauFIDkw
- Write down what you don't know
- Research
- Write code, provide sample input for your code, copy it to http://pastebin.com, and show it to us, if it's still not working.
- Repeat.
Source: https://stackoverflow.com/questions/34527917/extracting-phone-numbers-from-a-free-form-text-in-python-by-using-regex