I'm looking for a method that encodes a string to the shortest possible length and lets it be decodable (pure PHP, no SQL). I have working sc
In theory, we need a small input character set and a large output character set. I will demonstrate this with an example. Take the integer 2468, written with a character set of 10 characters (0-9). If we convert it to base 2 (the binary number system), the character set is smaller (0 and 1) and the result is longer: 100110100100
But if we convert it to hexadecimal (base 16), with a character set of 16 characters (0-9 and A-F), we get a shorter result: 9A4
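To check the two conversions above, here is a small sketch using PHP's built-in base_convert(). Note that base_convert() only supports bases 2 to 36 and returns lowercase letters, so it prints 9a4 rather than 9A4.

```php
<?php
// Same number, three character sets: a smaller set gives a longer string,
// a larger set gives a shorter one.
$number = '2468';

$binary = base_convert($number, 10, 2);   // character set: 0 and 1
$hex    = base_convert($number, 10, 16);  // character set: 0-9 and a-f

echo $binary, ' (', strlen($binary), " characters)\n"; // 100110100100 (12 characters)
echo $hex, ' (', strlen($hex), " characters)\n";       // 9a4 (3 characters)
```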
So in your case we have the following character set for the input:
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
That is 41 characters in total: digits, lowercase letters and the special characters = / - . &
The character set for the output is a bit trickier. We want to use URL-safe characters only. I took them from here: Characters allowed in GET parameter
So our output character set is (73 characters):
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
Digits, lowercase AND uppercase letters, and some special characters.
We have more characters in the output set than in the input set. Theory says we can shorten our input string. CHECK!
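How much shorter can be estimated up front. A string of n characters over an alphabet of size a needs at most ceil(n * log(a) / log(b)) characters over an alphabet of size b. The helper name estimateEncodedLength below is made up for this sketch and is not part of PHP.

```php
<?php
// Hypothetical helper: upper bound on the encoded length when a string of
// $inputLength characters over $inputBase symbols is rewritten using
// $outputBase symbols.
function estimateEncodedLength(int $inputLength, int $inputBase, int $outputBase): int
{
    return (int) ceil($inputLength * log($inputBase) / log($outputBase));
}

// 39 input characters, 41 input symbols, 73 output symbols.
echo estimateEncodedLength(39, 41, 73), "\n"; // 34
```

So with these two sets the encoded string is about 87% of the input length (log(41) / log(73) is roughly 0.87), no matter how clever the conversion function is.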
Now we need a function that converts from base 41 to base 73. I don't know of a built-in PHP function for that. Luckily we can grab the function 'convBase' from here: http://php.net/manual/de/function.base-convert.php#106546 (if someone knows a smarter function, let me know)
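If you would rather not copy the function from the manual comments, the same idea can be sketched with plain integer arithmetic. This is NOT the linked convBase, just a stand-in I wrote for illustration; the name convertAlphabet is made up. It treats the input as one big number and does repeated long division by the size of the output set.

```php
<?php
// Hypothetical stand-in for convBase(): rewrites $input, given as digits of
// $fromSet, using the digits of $toSet. Needs PHP 7+ (intdiv), no extensions.
function convertAlphabet(string $input, string $fromSet, string $toSet): string
{
    if ($input === '') {
        return '';
    }
    $fromBase = strlen($fromSet);
    $toBase = strlen($toSet);

    // Map every input character to its numeric digit value.
    $digits = [];
    foreach (str_split($input) as $char) {
        $value = strpos($fromSet, $char);
        if ($value === false) {
            throw new InvalidArgumentException("Character '$char' is not in the input set");
        }
        $digits[] = $value;
    }

    // Repeated long division: each pass divides the whole number by $toBase
    // and the remainder becomes the next output digit (right to left).
    $output = '';
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $accumulator = $remainder * $fromBase + $digit;
            $q = intdiv($accumulator, $toBase);
            $remainder = $accumulator % $toBase;
            if (count($quotient) > 0 || $q > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        $output = $toSet[$remainder] . $output;
        $digits = $quotient;
    }
    return $output;
}

$inputSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
$outputSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";

$encoded = convertAlphabet('img=/dir/dir/hi-res-img.jpg&w=700&h=500', $inputSet, $outputSet);
$decoded = convertAlphabet($encoded, $outputSet, $inputSet);
var_dump($decoded === 'img=/dir/dir/hi-res-img.jpg&w=700&h=500'); // bool(true)
```

One caveat of treating the string as a number: leading characters equal to the first character of the input set (here 0) behave like leading zeros and are lost on decoding. That is fine for a string starting with i or /, but keep it in mind.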
Now we can shorten the URL. The final code is:
$input = 'img=/dir/dir/hi-res-img.jpg&w=700&h=500';
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz=/-.&";
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";
$encoded = convBase($input, $inputCharacterSet, $outputCharacterSet);
var_dump($encoded); // string(34) "BhnuhSTc7LGZv.h((Y.tG_IXIh8AR.$!t*"
$decoded = convBase($encoded, $outputCharacterSet, $inputCharacterSet);
var_dump($decoded); // string(39) "img=/dir/dir/hi-res-img.jpg&w=700&h=500"
The encoded string has only 34 characters.
You can reduce the character count further.
One option is to shorten the input string itself. Do you really need the overhead of the URL parameter syntax? Maybe you can format your string as follows:
$input = '/dir/dir/hi-res-img.jpg,700,500';
This reduces the input itself AND the input character set. Your reduced input character set is then:
$inputCharacterSet = "0123456789abcdefghijklmnopqrstuvwxyz/-.,";
Final output:
string(27) "E$AO.Y_JVIWMQ9BB_Xb3!Th*-Ut"
string(31) "/dir/dir/hi-res-img.jpg,700,500"
Another option is to reduce the input character set ;-). Maybe you can exclude some more characters? You could map the digits to letters first, as long as the decoder can still tell which letters stand for digits. Then your input character set shrinks by 10 characters!
You can also enlarge the output character set. The set I gave above was googled within 2 minutes. Maybe you can use more URL-safe characters. No idea... Maybe someone has a list.
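The digit-to-letter idea mentioned above could look roughly like this. It is only a sketch under two assumptions of mine: the string has the form path,width,height, and the path itself contains no digits (otherwise the mapping is ambiguous). The function names are made up.

```php
<?php
// Assumption: format is "path,width,height" and the path has no digits.
function digitsToLetters(string $input): string
{
    return strtr($input, '0123456789', 'abcdefghij');
}

function lettersToDigits(string $field): string
{
    return strtr($field, 'abcdefghij', '0123456789');
}

$input = '/dir/dir/hi-res-img.jpg,700,500';
$packed = digitsToLetters($input);
echo $packed, "\n"; // /dir/dir/hi-res-img.jpg,haa,faa

// Reverse: only the last two comma-separated fields hold mapped digits.
$fields = explode(',', $packed);
$height = lettersToDigits(array_pop($fields));
$width = lettersToDigits(array_pop($fields));
$restored = implode(',', $fields) . ',' . $width . ',' . $height;
echo $restored, "\n"; // /dir/dir/hi-res-img.jpg,700,500
```

The packed string only needs the 30-character set "abcdefghijklmnopqrstuvwxyz/-.,", so 31 input characters should fit into at most 25 characters of the 73-character output set.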
Heads up: there is no cryptography in this code. Anyone who guesses the character sets can decode the string easily. You can shuffle the character sets (once), which makes it a bit harder for an attacker, but not really safe. Maybe that's enough for your use case anyway.
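A minimal sketch of the "shuffle once" step, using PHP's str_shuffle(). The point is to run it a single time and paste the result into your code as a fixed string. str_shuffle() is not a cryptographic function, so this is obfuscation only.

```php
<?php
$outputCharacterSet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~-_.!*'(),$";

// Run this once and hard-code the printed string in both the encoder and the
// decoder. Shuffling on every request would make old strings undecodable.
$shuffledSet = str_shuffle($outputCharacterSet);
var_export($shuffledSet);
echo "\n";
```

var_export() prints the string with quotes and escaping, so it can be copied straight into PHP source.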