How to handle diacritics (accents) when rewriting 'pretty URLs'

长情又很酷 2020-11-30 09:48

I rewrite URLs to include the title of user-generated travel blogs.

I do this both for URL readability and for SEO purposes.

 http://www.example.com/gall         


        
6 answers
  •  無奈伤痛
    2020-11-30 10:30

    Nice topic; I had the same problem a while ago.
    Here's how I fixed it:

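    function title2url($string = null) {
     // return if empty
     if (empty($string)) return false;
    
     // replace spaces by "-", then convert accented letters to HTML entities (é becomes &eacute;)
     // the input is passed as UTF-8 instead of through utf8_decode(): htmlentities() defaults to
     // UTF-8 since PHP 5.4 and can return an empty string when given Latin-1 bytes
     $string = htmlentities(str_replace(' ', '-', $string), ENT_QUOTES, 'UTF-8');
    
     // remove the accent from the letter, keeping the base letter(s): &eacute; -> e, &aelig; -> ae
     // and turn the euro sign (&euro;) into "E"
     $string = preg_replace(array('@&([a-zA-Z]{1,2})(acute|grave|circ|tilde|uml|ring|elig|zlig|slash|cedil|strok|lig);@', '@&euro;@'), array('${1}', 'E'), $string);
    
     // now everything but alphanumerics, "-" and "_" can be removed;
     // also collapse repeated dashes into one
     $string = preg_replace(array('@[^a-zA-Z0-9\-_]@', '@-{2,}@'), array('', '-'), html_entity_decode($string, ENT_QUOTES, 'UTF-8'));
    
     return $string;
    }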
    

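    Here's how my function works:

    1. Replace spaces with dashes and convert the string to HTML entities (é becomes &eacute;)
    2. Strip the accents by keeping only the base letter of each entity (&eacute; becomes e)
    3. Decode any remaining entities, remove every character that is not alphanumeric, "-" or "_", and collapse repeated dashes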
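    To make the three steps concrete, here is a small trace script that prints the string after each one. It is a sketch, not the answer's exact code: it assumes UTF-8 input and PHP 7.1 or later, passes 'UTF-8' to htmlentities() explicitly instead of calling utf8_decode(), and both the function name trace_slug_steps() and the sample title are hypothetical.

```php
<?php
// Hypothetical helper: returns the string after each of the three steps.
// Assumes the input is UTF-8 encoded.
function trace_slug_steps(string $title): array
{
    // Step 1: spaces become dashes, then characters with a named
    // HTML entity are converted, e.g. "é" -> "&eacute;", "&" -> "&amp;".
    $step1 = htmlentities(str_replace(' ', '-', $title), ENT_QUOTES, 'UTF-8');

    // Step 2: keep only the base letter(s) of each accent entity,
    // e.g. "&eacute;" -> "e", "&aelig;" -> "ae". Other entities are untouched.
    $step2 = preg_replace(
        '@&([a-zA-Z]{1,2})(acute|grave|circ|tilde|uml|ring|elig|zlig|slash|cedil|strok|lig);@',
        '$1',
        $step1
    );

    // Step 3: decode leftover entities, drop everything that is not
    // alphanumeric, "-" or "_", then collapse runs of dashes.
    $step3 = preg_replace(
        ['@[^a-zA-Z0-9\-_]@', '@-{2,}@'],
        ['', '-'],
        html_entity_decode($step2, ENT_QUOTES, 'UTF-8')
    );

    return [$step1, $step2, $step3];
}

// Hypothetical sample title.
foreach (trace_slug_steps('Café Crème & Zürich!') as $i => $step) {
    echo 'Step ', $i + 1, ': ', $step, PHP_EOL;
}
// Step 1: Caf&eacute;-Cr&egrave;me-&amp;-Z&uuml;rich!
// Step 2: Cafe-Creme-&amp;-Zurich!
// Step 3: Cafe-Creme-Zurich
```

    Step 2 already yields plain ASCII letters; step 3 is what removes the "&" (still encoded as &amp; after step 2) and the "!", and then merges the double dash left behind where the "&" was.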
