php preg_grep and umlaut/accent

Question

I have an array of terms, some of which contain accented characters. I run preg_grep on it like this:

$data= array('Napoléon','Café');
$result = preg_grep('~' . $input . '~i', $data);

So if the user types 'le', I would also want 'Napoléon' to be matched, which does not work with the above command.

I did some searching and found that this function might be relevant

preg_match("/[\w\pL]/u",$var);

How can I combine these and make it work?

Answer 1:

This is not possible with a regular expression pattern alone, because you cannot tell the regex engine to treat "e" and its accented variants as equivalent. However, it is possible to first normalize the input data (both the array and the search input), then search the normalized data, and return the results from the non-normalized data.

In the following example I use transliteration to do this kind of normalization, which I guess is what you're looking for:

$data = ['Napoléon', 'Café'];

$result = array_translit_search('le', $data);
print_r($result);

$result = array_translit_search('leó', $data);
print_r($result);

The example output is:

Array
(
    [0] => Napoléon
)
Array
(
    [0] => Napoléon
)

The search function itself is rather straightforward, as described above: it transliterates the inputs, runs preg_grep, and then returns the matching original inputs:

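/**
 * Search $data for $search, ignoring accents via transliteration.
 *
 * @param string $search
 * @param array $data
 * @return array matching entries of $data, with their original keys
 */
function array_translit_search($search, array $data) {

    // 'ASCII-Latin' applied in reverse, i.e. Latin to ASCII
    $transliterator = Transliterator::create('ASCII-Latin', Transliterator::REVERSE);
    $normalize      = function ($string) use ($transliterator) {

        return $transliterator->transliterate($string);
    };

    $dataTrans   = array_map($normalize, $data);
    $searchTrans = $normalize($search);
    // pass the delimiter to preg_quote() so a "/" in the search term is escaped too
    $pattern     = sprintf('/%s/i', preg_quote($searchTrans, '/'));
    $result      = preg_grep($pattern, $dataTrans);
    // map the matches back to the original, non-normalized entries by key
    return array_intersect_key($data, $result);
}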

This code requires the Transliterator class from the Intl extension; you can replace it with any other similar transliteration or translation function.

By the way, I would not suggest using str_replace here. If you need to fall back to a translation table, use strtr instead; that is what you're looking for in that case. Still, I prefer a library that brings its own translation data, especially the Intl library; you normally can't beat it.
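As a minimal sketch of that strtr fallback, assuming the Intl extension is unavailable: the function name array_table_search and the contents of the translation table are hypothetical and not part of the original answer. The table is deliberately tiny and would need extending for real data.

```php
<?php
/**
 * Hypothetical fallback for when the Intl extension is not available.
 * The table below is illustrative only and far from complete.
 */
function array_table_search(string $search, array $data): array
{
    // Assumed mapping; extend it for the characters your data contains.
    $table = ['é' => 'e', 'è' => 'e', 'ó' => 'o', 'ö' => 'o', 'ü' => 'u', 'ä' => 'a'];

    // strtr() with an array never re-replaces text it has already substituted.
    $normalize = function (string $string) use ($table): string {
        return strtr($string, $table);
    };

    $dataTrans = array_map($normalize, $data);
    $pattern   = sprintf('/%s/i', preg_quote($normalize($search), '/'));

    // Match on the normalized copies, return the originals with the same keys.
    return array_intersect_key($data, preg_grep($pattern, $dataTrans));
}

print_r(array_table_search('le', ['Napoléon', 'Café']));
```

With the two sample terms, searching for 'le' returns only 'Napoléon', keyed by its original index 0.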

Answer 2:

You can write something like this:

$data = array('Napoléon','Café');
// process your input as needed; for testing purposes it is simply the value from your example
$input = 'le';

foreach ($data as $var) {
  if (preg_match("/".str_replace(array("é"....), array("e"....), $input)."/i", str_replace(array("é"....), array("e"....), $var))) {
    // do something, as there is a match
  }
}

Actually, you don't even need a regex in this case; a simple strpos (or stripos, to keep the match case-insensitive) will be enough.
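A minimal sketch of that regex-free variant follows. The helper name strip_accents and its small replacement lists are hypothetical; they are not the lists elided in the snippet above. It uses stripos so the comparison stays case-insensitive, like the /i flag.

```php
<?php
// Hypothetical helper: replaces a few accented letters with plain ones.
// The lists are illustrative only; they are not the ones elided in the answer.
function strip_accents(string $string): string
{
    return str_replace(['é', 'è', 'ó'], ['e', 'e', 'o'], $string);
}

$data  = ['Napoléon', 'Café'];
$input = 'le';

$matches = [];
foreach ($data as $var) {
    // stripos() returns an int offset or false, so compare strictly against false.
    if (stripos(strip_accents($var), strip_accents($input)) !== false) {
        $matches[] = $var; // keep the original, accented term
    }
}

print_r($matches);
```

Here $matches ends up containing only 'Napoléon'. Note that the matches are collected into a fresh list, so the original array keys are not preserved as they are with preg_grep.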

Source: https://stackoverflow.com/questions/14072333/php-preg-grep-and-umlaut-accent

Tags

php

regex

character-encoding

transliteration