Regular expression - any text to URL friendly one

一个人的身影 2020-12-14 23:32

I need a PHP regular expression script that removes anything that is not an alphabetical letter or a digit 0 to 9, replaces spaces with hyphens, and changes the text to lowercase - make sure there is on

5 answers
  • 2020-12-14 23:43
    $str = preg_replace('/[^a-zA-Z0-9]/', '-', $str);
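
    A small check of what this pattern does on its own may help. The sample input below is an assumption, not from the question. Because the character class has no quantifier, every single non-alphanumeric character becomes its own hyphen, and nothing is lowercased or trimmed:

    ```php
    <?php
    // Sample input chosen for illustration only.
    $str = 'Hello, World!';

    // Each non-alphanumeric character is replaced one by one,
    // so ", " turns into two hyphens and "!" leaves a trailing one.
    $out = preg_replace('/[^a-zA-Z0-9]/', '-', $str);

    echo $out, "\n"; // Hello--World-
    ```

    Collapsing the runs, trimming and lowercasing are still needed afterwards.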
    
  • 2020-12-14 23:46

    In a function:

    function sanitize_text_for_urls ($str) 
    {
        return trim( strtolower( preg_replace(
            array('/[^a-z0-9-\s]/ui', '/\s/', '/-+/'),
            array('', '-', '-'),
            iconv('UTF-8', 'ASCII//TRANSLIT', $str) )), '-');
    }
    

    What it does:

    // Solve accents and diacritics
    $str = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    
    // Leave only alphanumeric (respect existing hyphens)
    $str = preg_replace('/[^a-z0-9-\s]/ui', '', $str);
    
    // Turn spaces to hyphens
    $str = preg_replace('/\s+/', '-', $str);
    
    // Remove duplicate hyphens
    $str = preg_replace('/-+/', '-', $str);
    
    // Remove leading and trailing hyphens
    $str = trim($str, '-');
    
    // Turn to lowercase
    $str = strtolower($str);
    

    Note:
    You can combine multiple preg_replace by passing an array. See the function at the top.

    For example:

    // Électricité, plâtrerie    -->  electricite-platrerie
    // St. Lücie-Pétêrès         -->  st-lucie-peteres
    // -Façade- & gros œuvre     -->  facade-gros-oeuvre
    
    // _-Thè quîck ~`!@#&$%^ &*()_+= ---{}|][ :"; <>?.,/ fóx - jümpëd_-
    // the-quick-fox-jumped
    

    EDIT: added "/u" at the end of the regex to use UTF8
    EDIT: accounted for duplicated and leading/trailing hyphens, thanks to @LuBre
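
    One caveat with the function above: iconv() returns false when it cannot convert the input, and the result of //TRANSLIT depends on the iconv implementation and locale. The following is only a sketch of a more defensive variant; the name slugify_safe and the fallback to the raw input are assumptions, not part of the original answer:

    ```php
    <?php
    // Hypothetical helper: same steps as above, with a guard around iconv().
    function slugify_safe(string $str): string
    {
        // Transliterate accents if iconv is available; iconv() returns false on failure.
        $ascii = function_exists('iconv')
            ? @iconv('UTF-8', 'ASCII//TRANSLIT', $str)
            : false;

        // Assumed fallback: keep the raw input, whose non-ASCII bytes are stripped below.
        if ($ascii === false) {
            $ascii = $str;
        }

        // Strip disallowed characters, turn whitespace into hyphens, collapse hyphen runs.
        $ascii = preg_replace(
            array('/[^a-z0-9\s-]/i', '/\s+/', '/-+/'),
            array('', '-', '-'),
            $ascii
        );

        // Remove leading/trailing hyphens and lowercase the result.
        return strtolower(trim($ascii, '-'));
    }

    echo slugify_safe('Hello  World -- Again!'), "\n"; // hello-world-again
    ```

    Accented input is left out of the example on purpose, since its transliteration can differ between systems.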

  • 2020-12-14 23:48

    Since you seem to want every sequence of non-alphanumeric characters to be replaced by a single hyphen, you can use this:

    $str = preg_replace('/[^a-zA-Z0-9]+/', '-', $str);
    

    But this can result in leading or trailing hyphens that can be removed with trim:

    $str = trim($str, '-');
    

    And to convert the result into lowercase, use strtolower:

    $str = strtolower($str);
    

    So all together:

    $str = strtolower($str);
    $str = preg_replace('/[^a-z0-9]+/', '-', $str);
    $str = trim($str, '-');
    

    Or in a compact one-liner:

    $str = strtolower(trim(preg_replace('/[^a-zA-Z0-9]+/', '-', $str), '-'));
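
    To see the one-liner at work, here it is wrapped in a function and run on a few inputs. The function name to_slug and the sample strings are assumptions made for this illustration:

    ```php
    <?php
    // Hypothetical wrapper around the one-liner above.
    function to_slug(string $str): string
    {
        // Replace runs of non-alphanumerics with one hyphen, trim hyphens, lowercase.
        return strtolower(trim(preg_replace('/[^a-zA-Z0-9]+/', '-', $str), '-'));
    }

    echo to_slug('  Hello, World! 2020  '), "\n"; // hello-world-2020
    echo to_slug('Already-clean'), "\n";          // already-clean
    echo to_slug('Café'), "\n";                   // caf
    ```

    Note the last case: accented letters are not in a-zA-Z0-9, so they are dropped rather than transliterated.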
    
  • 2020-12-14 23:53

    If you're using this for filenames in PHP, the answer by Gumbo would become:

    $str = preg_replace('/[^a-zA-Z0-9.]+/', '-', $str);
    $str = trim($str, '-');
    $str = strtolower($str);
    

    Added a period for file names and it's strtolower(), not strtolowercase().
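
    As a quick illustration on a made-up filename: the period survives, but a hyphen can end up right before the extension. The last step below is an optional cleanup that is my own assumption, not part of the answer:

    ```php
    <?php
    // Sample filename chosen for illustration only.
    $name = 'My Report (final).PDF';

    $name = preg_replace('/[^a-zA-Z0-9.]+/', '-', $name);
    $name = trim($name, '-');
    $name = strtolower($name);
    echo $name, "\n";  // my-report-final-.pdf

    // Assumed extra step: drop hyphens that sit directly next to a period.
    $clean = preg_replace('/-*\.-*/', '.', $name);
    echo $clean, "\n"; // my-report-final.pdf
    ```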

  • 2020-12-15 00:00

    I was just working on something similar and came up with this little piece of code. It also handles accented Latin characters.

    This is the sample string:

    $str = 'El veloz murciélago hindú comía fe<!>&@#$%&!"#%&?¡?*liz cardillo y kiwi. La cigüeña ¨^;.-|°¬tocaba el saxofón detrás del palenque de paja';

    First I convert the string to htmlentities just to make it easier to use later.

    $friendlyURL = htmlentities($str, ENT_COMPAT, "UTF-8", false);

    Then I replace accented Latin characters with their corresponding ASCII characters (á becomes a, Ü becomes U, and so on):

    $friendlyURL = preg_replace('/&([a-z]{1,2})(?:acute|circ|lig|grave|ring|tilde|uml|cedil|caron);/i','\1',$friendlyURL);

    Then I convert the string back from html entities to symbols, again for easier use later.

    $friendlyURL = html_entity_decode($friendlyURL,ENT_COMPAT, "UTF-8");

    Next I replace every run of non-alphanumeric characters with a hyphen.

    $friendlyURL = preg_replace('/[^a-z0-9-]+/i', '-', $friendlyURL);

    I remove extra hyphens inside the string:

    $friendlyURL = preg_replace('/-+/', '-', $friendlyURL);

    I remove leading and trailing hyphens:

    $friendlyURL = trim($friendlyURL, '-');

    And finally I convert everything to lowercase:

    $friendlyURL = strtolower($friendlyURL);

    All together:

    function friendlyUrl ($str = '') {
    
        $friendlyURL = htmlentities($str, ENT_COMPAT, "UTF-8", false); 
        $friendlyURL = preg_replace('/&([a-z]{1,2})(?:acute|circ|lig|grave|ring|tilde|uml|cedil|caron);/i','\1',$friendlyURL);
        $friendlyURL = html_entity_decode($friendlyURL,ENT_COMPAT, "UTF-8"); 
        $friendlyURL = preg_replace('/[^a-z0-9-]+/i', '-', $friendlyURL);
        $friendlyURL = preg_replace('/-+/', '-', $friendlyURL);
        $friendlyURL = trim($friendlyURL, '-');
        $friendlyURL = strtolower($friendlyURL);
        return $friendlyURL;
    
    }
    

    Test:

    $str = 'El veloz murciélago hindú comía fe<!>&@#$%&!"#%&-?¡?*-liz cardillo y kiwi. La cigüeña ¨^`;.-|°¬tocaba el saxofón detrás del palenque de paja';
    
    echo friendlyUrl($str);
    

    Outcome:

    el-veloz-murcielago-hindu-comia-fe-liz-cardillo-y-kiwi-la-ciguena-tocaba-el-saxofon-detras-del-palenque-de-paja
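
    To make the entity trick easier to follow, here is a trace of the first three steps on a single word. The variable names and the sample word are assumptions for this illustration, and the source file is assumed to be saved as UTF-8:

    ```php
    <?php
    $s = 'Cigüeña';

    // Step 1: accented letters become named entities such as &uuml; and &ntilde;.
    $step1 = htmlentities($s, ENT_COMPAT, 'UTF-8', false);

    // Step 2: keep only the base letter(s) captured from each accent entity.
    $step2 = preg_replace(
        '/&([a-z]{1,2})(?:acute|circ|lig|grave|ring|tilde|uml|cedil|caron);/i',
        '\1',
        $step1
    );

    // Step 3: decode any entities that were not accent entities.
    $step3 = html_entity_decode($step2, ENT_COMPAT, 'UTF-8');

    echo $step1, "\n"; // Cig&uuml;e&ntilde;a
    echo $step2, "\n"; // Ciguena
    echo $step3, "\n"; // Ciguena
    ```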
    

    I guess Gumbo's answer fits your problem better, and it is shorter, but I thought this would be useful for others.

    Cheers, Adriana
