Searching for phone numbers in MySQL

断了今生、忘了曾经 submitted on 2019-11-30 07:13:35

This looks like a problem from the start. Any kind of searching you do will require a table scan and we all know that's bad.

How about adding a column with a hash of the current phone numbers after stripping out all formatting characters? Then you can at least index the hash values and avoid a full-blown table scan.
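A minimal sketch of the hash idea, written in Python for illustration. The helper names `normalize_phone` and `phone_hash` are hypothetical, and SHA-1 is just one possible choice of hash; the thread does not name one.

```python
import hashlib
import re


def normalize_phone(raw: str) -> str:
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", raw)


def phone_hash(raw: str) -> str:
    """Hash of the stripped digits: the value you would store and index.

    SHA-1 is an assumption made for this sketch, not something the
    thread prescribes.
    """
    return hashlib.sha1(normalize_phone(raw).encode("ascii")).hexdigest()


# Two differently formatted spellings of the same number hash identically.
stored = phone_hash("(027) 123 456")
searched = phone_hash("027-12-3456")
print(stored == searched)
```

Note that a hash only supports exact matches on the whole number. If you also need prefix or partial searches, indexing the stripped digits themselves is the more flexible variant.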

Or is the amount of data small and not expected to grow much? Then maybe just pull all the numbers into the client and run the search there.

Nihal

I know this is ancient history, but I found it while looking for a similar solution.

A simple REGEXP may work:

select * from phone_table where phone1 REGEXP "07[^0-9]*123[^0-9]*456"

This would match the phonenumber column with or without any separating characters.
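If the grouping of digits in the stored numbers is unknown, the same pattern can be generated with a separator allowed between every digit. The sketch below is in Python for illustration; `phone_regex` is a hypothetical helper, and Python's `re` stands in for MySQL's REGEXP here. The pattern uses only a bracket expression and `*`, which I would expect to behave the same way in both.

```python
import re


def phone_regex(search: str) -> str:
    """Build a pattern allowing any non-digits between consecutive digits."""
    digits = [c for c in search if c in "0123456789"]
    if not digits:
        # An empty pattern would match every row, so refuse it.
        raise ValueError("search term contains no digits")
    return "[^0-9]*".join(digits)


pattern = phone_regex("07 123 456")

# The pattern is unanchored, like the REGEXP example above, so it also
# matches when the digits appear inside a longer number.
for candidate in ["07-123-456", "(07) 12 34 56", "07 123 556"]:
    print(candidate, bool(re.search(pattern, candidate)))
```

The generated string would then be passed to the query as a bound parameter rather than concatenated into the SQL.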

An out-of-the-box idea, but could you use a "replace" function to strip out any instances of "(", "-" and " ", and then use an "isnumeric" function to test whether the resulting string is a number?

Then you could do the same to the phone number string you're searching for and compare them as integers.

Of course, this won't work for numbers like 1800-MATT-ROCKS. :)

My solution would be something along the lines of what John Dyer said. I'd add a second column (e.g. phoneStripped) that gets stripped on insert and update. Index this column and search on it (after stripping your search term, of course).

You could also add a trigger to automatically update the column, although I've not worked with triggers. But like you said, it's really difficult to write the MySQL code to strip the strings, so it's probably easier to just do it in your client code.
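Here is a rough sketch of the stripped-column approach with the stripping done in client code. It is in Python and uses an in-memory SQLite database purely as a stand-in for MySQL so that it runs on its own. The table layout, the index name and the helpers `strip_phone`, `insert_phone` and `find_phone` are all assumptions made for the example.

```python
import re
import sqlite3


def strip_phone(raw: str) -> str:
    """Remove everything except ASCII digits."""
    return re.sub(r"[^0-9]", "", raw)


conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.execute(
    "CREATE TABLE phone_table ("
    "id INTEGER PRIMARY KEY, phone1 TEXT, phoneStripped TEXT)"
)
conn.execute("CREATE INDEX idx_phone_stripped ON phone_table (phoneStripped)")


def insert_phone(raw: str) -> None:
    """Store the number as entered plus its stripped form."""
    conn.execute(
        "INSERT INTO phone_table (phone1, phoneStripped) VALUES (?, ?)",
        (raw, strip_phone(raw)),
    )


def find_phone(search: str) -> list:
    """Strip the search term the same way, then do an indexed equality lookup."""
    rows = conn.execute(
        "SELECT phone1 FROM phone_table WHERE phoneStripped = ? ORDER BY id",
        (strip_phone(search),),
    )
    return [row[0] for row in rows]


for raw in ["(027) 123 456", "027 12 3456", "07-999-888"]:
    insert_phone(raw)

print(find_phone("027-123-456"))  # ['(027) 123 456', '027 12 3456']
```

The original formatting stays in `phone1` for display, while every search goes through `phoneStripped`.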

(I know this is late, but I just started looking around here :)

Michael Bagryantcev

I suggest using PHP functions rather than MySQL patterns, so you will have some code like this:

// Build a LIKE pattern with a '%' wildcard before every digit and one at the end.
$tmp_phone = '';
for ($i = 0; $i < strlen($phone); $i++) {
    if (is_numeric($phone[$i])) {
        $tmp_phone .= '%' . $phone[$i];
    }
}
$tmp_phone .= '%';
$search_condition .= " and phone LIKE '" . $tmp_phone . "' ";
crono
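One thing to be aware of with the `%` pattern is that `%` matches any characters, digits included, so it can return numbers that merely contain the searched digits in order. The sketch below rebuilds the same pattern in Python and runs it through SQLite's LIKE as a stand-in for MySQL. The `customers` table, its rows and the `like_pattern` helper are made up for the demonstration.

```python
import sqlite3


def like_pattern(phone: str) -> str:
    """Same construction as the PHP loop: '%' before each digit, '%' at the end."""
    digits = [c for c in phone if c in "0123456789"]
    return "".join("%" + d for d in digits) + "%"


conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.execute("CREATE TABLE customers (phone TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("(027) 123 456",), ("027 12 3456",), ("0912 799 1234 567",)],
)

rows = conn.execute(
    "SELECT phone FROM customers WHERE phone LIKE ? ORDER BY rowid",
    (like_pattern("027123456"),),
).fetchall()
matches = [row[0] for row in rows]

# The third row is a different number, but its digits contain
# 0, 2, 7, 1, 2, 3, 4, 5, 6 in order, so LIKE accepts it as well.
print(matches)
```

If false positives like the last row matter, a regex that only allows non-digits between the digits is stricter.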

This is a problem with older versions of MySQL: the regex function can match, but it can't replace (a built-in REGEXP_REPLACE only arrived in MySQL 8.0). See this post for a possible solution.

Is it possible to run a query to reformat the data to match a desired format and then just run a simple query? That way, even if the initial reformatting is slow, it doesn't really matter.

See

http://www.mfs-erp.org/community/blog/find-phone-number-in-database-format-independent

It is not really an issue that the regular expression would become visually appalling, since only MySQL "sees" it. Note that instead of '+' (cf. the post with [\D] from the OP) you should use '*' in the regular expression.
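To illustrate why '*' is needed: with '+', at least one separator is required before every digit, so even adjacent digits such as "027" stop matching. The sketch below is Python, with `[^0-9]` used in place of `[\D]` and Python's `re` standing in for MySQL's regex engine; `separated_regex` is a hypothetical helper that follows the shape of the OP's anchored pattern.

```python
import re


def separated_regex(digits: str, quantifier: str) -> str:
    """Anchored pattern with '[^0-9]<quantifier>' in front of every digit."""
    sep = "[^0-9]" + quantifier
    return "^" + "".join(sep + d for d in digits) + "$"


plus_pattern = separated_regex("027123456", "+")
star_pattern = separated_regex("027123456", "*")

samples = ["(027) 123 456", "027 12 3456", "027123456", "-0-2-7-1-2-3-4-5-6"]
for sample in samples:
    print(
        sample,
        "| with +:", bool(re.search(plus_pattern, sample)),
        "| with *:", bool(re.search(star_pattern, sample)),
    )
```

With '+', only the last sample matches, because it is the only one with a separator before every single digit. With '*', all four match.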

Some users are concerned about performance (non-indexed search), but in a table with 100,000 customers this query, when issued from a user interface, returns immediately, without noticeable delay.

MySQL can search based on regular expressions.

Sure, but given the arbitrary formatting, if my haystack contained "(027) 123 456" (bear in mind the position of spaces can change; it could just as easily be 027 12 3456) and I wanted to match it with 027123456, would my regex therefore need to be this?

"^[\D]+0[\D]+2[\D]+7[\D]+1[\D]+2[\D]+3[\D]+4[\D]+5[\D]+6$"

(actually it'd be worse as the mysql manual doesn't seem to indicate it supports \D)

If that is the case, isn't it more or less the same as my %%%%% idea?

Just an idea, but couldn't you use Regex to quickly strip out the characters and then compare against that like @Matt Hamilton suggested?

Maybe even set up a view (not sure of mysql on views) that would hold all phone numbers stripped by regex to a plain phone number?

Woe is me. I ended up doing this:

mre = mobile_number && ('%' + mobile_number.gsub(/\D/, '').scan(/./m).join('%'))

find(:first, :conditions => ['trim(mobile_phone) like ?', mre])

If this is something that is going to happen on a regular basis, perhaps modifying the data to be all one format and then setting up the search form to strip out any non-alphanumeric characters (if you allow numbers like 310-BELL) would be a good idea. Having data in an easily searched format is half the battle.

A possible solution can be found at http://udf-regexp.php-baustelle.de/trac/

An additional package needs to be installed; then you can play with REGEXP_REPLACE.

Create a user-defined function that dynamically builds the regex.

DELIMITER //

CREATE FUNCTION udfn_GetPhoneRegex
(
    var_Input VARCHAR(25)
)
RETURNS VARCHAR(200)
DETERMINISTIC
BEGIN
    DECLARE iterator   INT          DEFAULT 1;
    DECLARE phoneregex VARCHAR(200) DEFAULT '';
    DECLARE output     VARCHAR(25)  DEFAULT '';

    -- Keep only the digits of the input.
    WHILE iterator < (LENGTH(var_Input) + 1) DO
        IF SUBSTRING(var_Input, iterator, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9') THEN
            SET output = CONCAT(output, SUBSTRING(var_Input, iterator, 1));
        END IF;
        SET iterator = iterator + 1;
    END WHILE;

    -- Keep only the last 10 digits.
    SET output = RIGHT(output, 10);

    -- Allow any run of non-digits before each digit, anchored at the end of the string.
    SET iterator = 1;
    WHILE iterator < (LENGTH(output) + 1) DO
        SET phoneregex = CONCAT(phoneregex, '[^0-9]*', SUBSTRING(output, iterator, 1));
        SET iterator = iterator + 1;
    END WHILE;
    SET phoneregex = CONCAT(phoneregex, '$');

    RETURN phoneregex;
END//
DELIMITER ;

Call that User Defined Function in your stored procedure.

DECLARE var_PhoneNumberRegex        VARCHAR(200);
SET var_PhoneNumberRegex = udfn_GetPhoneRegex('+ 123 555 7890');
SELECT * FROM Customer WHERE phonenumber REGEXP var_PhoneNumberRegex;

I would use Google's libphonenumber to convert each number to E.164 format. I would add a second column called "e164_number" to store the E.164-formatted number and add an index on it.
