Shortest possible encoded string with decode possibility (shorten url) using only PHP

甜味超标 2020-12-28 19:14

I'm looking for a method that encodes a string to the shortest possible length and lets it be decoded again (pure PHP, no SQL). I have working sc…

13 answers
  •  挽巷 (OP)
    2020-12-28 19:56

    Theory

    In theory, we need a small input character set and a large output character set. I will demonstrate this with an example. Take the integer 2468, written in base 10 with a 10-character set (0-9). We can convert it to the same number in base 2 (the binary system). Then we have a smaller character set (0 and 1), but the result is longer: 100110100100

    If we instead convert it to hexadecimal (base 16), with a 16-character set (0-9 and A-F), we get a shorter result: 9A4
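The example can be checked with PHP's built-in decbin() and dechex(), which handle integers of this size. dechex() returns lowercase digits, so strtoupper() is applied here to match the text:

```php
<?php
// The same value written with three different character sets.
$number = 2468;

$decimal = (string) $number;             // base 10, charset 0-9
$binary  = decbin($number);              // base 2,  charset 0-1
$hex     = strtoupper(dechex($number));  // base 16, charset 0-9A-F

foreach (['base 10' => $decimal, 'base 2' => $binary, 'base 16' => $hex] as $label => $digits) {
    printf("%-8s %-14s (%d characters)\n", $label, $digits, strlen($digits));
}
```

The larger the character set, the fewer characters are needed for the same value.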

    Practice

    So in your case we have the following character set for the input:

    $inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
    

    41 characters in total: digits, lowercase letters, and the special characters = / - . &
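Every input character acts as a "digit", so a character that is missing from the input set cannot be represented and the round trip would fail. A minimal sketch that checks the set size and validates an input with the built-in strspn(); the helper name usesOnlyCharset() is hypothetical:

```php
<?php
// Hypothetical helper: true if every character of $s occurs in $charset.
// strspn() returns the length of the leading run made only of $charset characters.
function usesOnlyCharset(string $s, string $charset): bool
{
    return strspn($s, $charset) === strlen($s);
}

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";

// The set should contain 41 distinct characters.
$size = strlen($inputCharacterSet);
$distinct = count(array_unique(str_split($inputCharacterSet)));
printf("%d characters, %d distinct\n", $size, $distinct);

var_dump(usesOnlyCharset('img=/dir/dir/hi-res-img.jpg&w=700&h=500', $inputCharacterSet)); // bool(true)
var_dump(usesOnlyCharset('img=/Dir/photo.JPG', $inputCharacterSet));                     // bool(false)
```

Uppercase letters are not in this input set, so such paths would have to be lowercased or the set extended.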

    The character set for the output is a bit trickier. We want to use URL-safe characters only. I took them from here: Characters allowed in GET parameter

    So our output character set is (73 characters):

    $outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
    

    Digits, lowercase AND uppercase letters, and some special characters.
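The count of 73 can be confirmed in PHP. The sketch below also lists which of these characters rawurlencode() (which follows RFC 3986) would percent-encode. The characters ! * ' ( ) , $ are RFC 3986 "sub-delims": they are allowed in a query string, but any tool that encodes them turns each one into three characters, which would undo part of the saving.

```php
<?php
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),\$";

printf("%d characters\n", strlen($outputCharacterSet));

// rawurlencode() leaves only A-Z, a-z, 0-9 and - _ . ~ untouched;
// every other character is percent-encoded.
$percentEncoded = array_filter(
    str_split($outputCharacterSet),
    fn(string $c): bool => rawurlencode($c) !== $c
);
echo 'Percent-encoded by rawurlencode(): ', implode('', $percentEncoded), "\n";
```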

    The output set has more characters than the input set, so according to the theory we can shorten the input string. Check!

    Coding

    Now we need a function that converts from base 41 to base 73. I don't know of a built-in PHP function for this (base_convert() only supports bases 2 to 36). Luckily, we can take the function 'convBase' from here: http://php.net/manual/de/function.base-convert.php#106546 (if someone knows a smarter function, let me know)
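If you would rather not depend on the php.net snippet, the conversion can also be written as repeated long division over an array of digits, which needs no extensions and no big-integer library. This is a minimal sketch, not the convBase code from php.net; the name convertBase() is hypothetical, but it takes its arguments in the same order (string, source set, target set). Like any numeric base conversion, it drops leading "zero" characters (the first character of the source set), so an input that starts with 0 would not survive the round trip; the inputs in this answer start with i and /, so they are unaffected.

```php
<?php
// Hypothetical helper: read $input as a number written with the digits of
// $fromSet and rewrite that number with the digits of $toSet.
// Pure PHP, works on arbitrarily long strings.
function convertBase(string $input, string $fromSet, string $toSet): string
{
    if ($input === '') {
        return '';
    }
    $fromBase = strlen($fromSet);
    $toBase = strlen($toSet);

    // Translate every character into its digit value in the source base.
    $digits = [];
    foreach (str_split($input) as $ch) {
        $value = strpos($fromSet, $ch);
        if ($value === false) {
            throw new InvalidArgumentException("Character '$ch' is not in the source set");
        }
        $digits[] = $value;
    }

    $output = '';
    // Divide the whole digit array by $toBase until nothing is left; each
    // remainder is the next output digit, least significant first.
    while ($digits !== []) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $accumulator = $remainder * $fromBase + $digit;
            $q = intdiv($accumulator, $toBase);
            $remainder = $accumulator % $toBase;
            if ($q > 0 || $quotient !== []) { // skip leading zeros of the quotient
                $quotient[] = $q;
            }
        }
        $output = $toSet[$remainder] . $output;
        $digits = $quotient;
    }
    return $output;
}

$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),\$";

$encoded = convertBase('img=/dir/dir/hi-res-img.jpg&w=700&h=500', $inputCharacterSet, $outputCharacterSet);
$decoded = convertBase($encoded, $outputCharacterSet, $inputCharacterSet);
var_dump(strlen($encoded), $decoded);
```

Long division keeps every intermediate value smaller than the product of the two base sizes, so plain PHP integers are enough even for long inputs.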

    Now we can shorten the URL. The final code is:

    // Assumes convBase() from the php.net comment linked above is already defined.
    $input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
    $inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
    $outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
    $encoded = convBase($input, $inputCharacterSet, $outputCharacterSet);
    var_dump($encoded); // string(34) "BhnuhSTc7LGZv.h((Y.tG_IXIh8AR.$!t*"
    $decoded = convBase($encoded, $outputCharacterSet, $inputCharacterSet);
    var_dump($decoded); // string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
    

    The encoded string has only 34 characters.
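Why 34? An input of n characters from a set of size b_in is a number below b_in^n, so in base b_out it needs at most ceil(n · log(b_in) / log(b_out)) characters. Here each input character costs about log(41)/log(73) ≈ 0.87 output characters. A small sketch of that estimate (the function name is hypothetical); it reproduces both lengths reported in this answer:

```php
<?php
// Hypothetical helper: upper bound on the encoded length when a string of
// $length characters from a $fromBase-character set is rewritten in $toBase.
function maxEncodedLength(int $length, int $fromBase, int $toBase): int
{
    return (int) ceil($length * log($fromBase) / log($toBase));
}

echo maxEncodedLength(39, 41, 73), "\n"; // 34 (original query string)
echo maxEncodedLength(31, 40, 73), "\n"; // 27 (compact "path,w,h" format)
```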

    Optimizations

    You can reduce the number of output characters by:

    • Reducing the length of the input string. Do you really need the overhead of URL parameter syntax? Maybe you can format your string as follows:

      $input = '/dir/dir/hi-res-img.jpg,700,500';

      This reduces the input itself AND the input character set. Your reduced input character set is then:

      $inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz/-.,";

      Final output (encoded, then decoded):

      string(27) "E$AO.Y_JVIWMQ9BB_Xb3!Th*-Ut"

      string(31) "/dir/dir/hi-res-img.jpg,700,500"

    • Reducing the input character set even further ;-). Maybe you can exclude more characters? For example, if you encode the numbers as letters first, the ten digits can be dropped from the input character set.

    • Increasing the output character set. I put the set above together after about two minutes of searching, so there may be more URL-safe characters you can add. If someone has a complete list, let me know.
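For the first suggestion, converting the query-string syntax into the compact path,width,height form can be done with the built-in parse_str() and explode(). A sketch; the helper names are hypothetical, the img/w/h keys come from the example input, and it assumes the path itself never contains a comma:

```php
<?php
// Hypothetical helper: 'img=...&w=...&h=...' -> 'path,width,height'
function packParams(string $query): string
{
    parse_str($query, $params);
    return implode(',', [$params['img'], $params['w'], $params['h']]);
}

// Hypothetical helper: 'path,width,height' -> ['img' => ..., 'w' => ..., 'h' => ...]
function unpackParams(string $packed): array
{
    $parts = explode(',', $packed);
    if (count($parts) !== 3) {
        throw new InvalidArgumentException('Expected exactly three comma-separated fields');
    }
    [$path, $width, $height] = $parts;
    return ['img' => $path, 'w' => (int) $width, 'h' => (int) $height];
}

$packed = packParams('img=/dir/dir/hi-res-img.jpg&w=700&h=500');
echo $packed, "\n"; // /dir/dir/hi-res-img.jpg,700,500
print_r(unpackParams($packed));
```

The packed string is what gets passed to the base conversion, using the reduced 40-character input set.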

    Security

    Heads up: there is no cryptographic logic in this code. If somebody guesses the character sets, they can decode the string easily. You can shuffle the character sets (once), which makes it a bit harder for an attacker, but it is still not really secure. Maybe it's enough for your use case anyway.
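Shuffling only has to happen once: generate a permutation of each set, then hard-code it in place of the original set on both the encoding and the decoding side. A minimal sketch using a Fisher-Yates shuffle driven by random_int(); the helper name shuffleCharset() is hypothetical. This remains obfuscation, not encryption.

```php
<?php
// Hypothetical helper: return a random permutation of $charset.
// Run it once and paste the printed result into your code; encoding and
// decoding must use the same shuffled set.
function shuffleCharset(string $charset): string
{
    $chars = str_split($charset);
    // Fisher-Yates shuffle; random_int() draws from a cryptographically secure source.
    for ($i = count($chars) - 1; $i > 0; $i--) {
        $j = random_int(0, $i);
        [$chars[$i], $chars[$j]] = [$chars[$j], $chars[$i]];
    }
    return implode('', $chars);
}

$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),\$";
$shuffled = shuffleCharset($outputCharacterSet);
var_export($shuffled); // prints a quoted, escaped literal ready to paste into PHP
echo "\n";
```

If you shuffle the input set as well, remember that its first character now plays the role of "zero": an input that starts with that character would lose it on decoding, so make sure it is a character that never starts an input.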
