Algorithm to turn numeric IDs into short, different alphanumeric codes

Submitted by 天涯浪子 on 2020-01-15 03:56:13

Question


I have IDs from a database, and I want them to be short and easy to tell apart by eye (i.e., two close numbers should look different).

Like this:

13892359163211 -> ALO2WE7
13992351216421 -> 52NBEK3

Or something similar, generated algorithmically. So, kind of like a hash, except it needs to be reversible? An encryption algorithm like AES is almost ideal, except that its outputs are way too long (and it's overkill).

I'm using Python 3, although I don't think that should really matter.


Answer 1:


New answer with 'close' numbers looking different

You could use RSA to encrypt (and later decrypt) your numbers. This is definitely overkill, but here is an example. First install https://github.com/sybrenstuvel/python-rsa (pip install rsa):

import rsa
import rsa.core
# (pubkey, privkey) = rsa.newkeys(64) # Generate key pair
pubkey = rsa.PublicKey(n=9645943279888986023, e=65537)
privkey = rsa.PrivateKey(n=9645943279888986023, e=65537, d=7507666207464026273, p=9255782423, q=1042153201)

print("1st", rsa.core.encrypt_int(13892359163211, pubkey.e, pubkey.n))
print("2nd", rsa.core.encrypt_int(13992351216421, pubkey.e, pubkey.n))
print("1st", hex(rsa.core.encrypt_int(13892359163211, pubkey.e, pubkey.n))[2:])
print("2nd", hex(rsa.core.encrypt_int(13992351216421, pubkey.e, pubkey.n))[2:])

# If you want to compare a couple of numbers that are similar
for i in range(13892359163211, 13892359163251):
    encrypted = rsa.core.encrypt_int(i, pubkey.e, pubkey.n)
    # decrypted = rsa.core.decrypt_int(encrypted, privkey.d, privkey.n)
    print(i, hex(encrypted)[2:], encrypted)

Please note that you cannot encrypt numbers greater than or equal to pubkey.n. This is an inherent RSA limitation. You can get around it by generating a different key pair with a larger n. If you want all generated codes to have the same length, pad them with leading zeros. You could also make them uppercase for better readability. To make the displayed strings shorter, consider the base-62 encoding mentioned in my old answer below.

Output:

1st 5427392181794576250
2nd 7543432434424555966
1st 4b51f86f0c99177a
2nd 68afa7d5110929be

input          hex(encrypted)   encrypted
13892359163211 4b51f86f0c99177a 5427392181794576250
13892359163212 2039f9a3f5cf5d46 2322161565485194566
13892359163213 173997b57918a6c3 1673535542221383363
13892359163214 36644663653bbb4  244958435527080884
13892359163215 c2eeec0c054e633  877901489011746355
...
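The padding and uppercase suggestions can be sketched without the library. `rsa.core.encrypt_int` performs the modular exponentiation `pow(m, e, n)`, so the same round trip can be written with Python's built-in `pow`, reusing the 64-bit key from the example above. This is a sketch under a few assumptions. The helper names `encrypt_id` and `decrypt_id` are hypothetical. The private exponent is recomputed from p and q with `pow(e, -1, phi)` (Python 3.8+) instead of being copied. Like the answer's code, it is unpadded "textbook" RSA used only as a reversible scrambler.

```python
# Public modulus/exponent and primes copied from the key pair in the answer above.
N = 9645943279888986023
E = 65537
P, Q = 9255782423, 1042153201

# Recompute the private exponent from the primes rather than hard-coding it:
# pow(x, -1, m) is the modular inverse of x mod m (Python 3.8+).
D = pow(E, -1, (P - 1) * (Q - 1))

# N < 2**64, so every ciphertext fits in 16 hex digits.
WIDTH = 16


def encrypt_id(i: int) -> str:
    """Scramble an ID with textbook RSA; return fixed-width uppercase hex."""
    if not 0 <= i < N:
        raise ValueError("ID must satisfy 0 <= i < N")
    return format(pow(i, E, N), f"0{WIDTH}X")


def decrypt_id(code: str) -> int:
    """Undo encrypt_id: parse the hex digits and apply the private exponent."""
    return pow(int(code, 16), D, N)


if __name__ == "__main__":
    for i in range(13892359163211, 13892359163216):
        code = encrypt_id(i)
        assert decrypt_id(code) == i
        print(i, code)
```

Zero-padding matters for rows like 13892359163214 in the table, whose hex form has only 15 digits. With the format spec `016X`, every code is exactly 16 characters.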

Old answer, about displaying the numbers more compactly (written before I realized they should also look substantially different):

You want to change the base of your number from 10 to something larger so that fewer characters are needed. See https://stackoverflow.com/a/1119769 for an example with base 62 (a-zA-Z0-9).

Or, quick and dirty, use base 16 (hexadecimal, 0-9a-f):

hex(13892359163211)[2:] # -> 'ca291220d4b'
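Here is a minimal base-62 encoder and decoder in the spirit of the linked answer. The alphabet order (digits, then lowercase, then uppercase) is an assumption. Any 62 distinct characters work, as long as the encoder and decoder agree.

```python
import string

# Assumed alphabet: 0-9, a-z, A-Z (62 symbols). Order is arbitrary but must be fixed.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base-62 string, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


if __name__ == "__main__":
    print(base62_encode(13892359163211))
```

Base conversion only shortens the ID. Consecutive IDs still differ mainly in their last character.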



Answer 2:


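The problem is easier to state than it is to solve. One solution is to borrow some ideas from format-preserving encryption, simplified because security is not a goal. Using the Feistel cipher framework, you can write a very short, reversible "mixing" function and follow it with a short encoding function. Together they achieve something that appears to be what you want. Each Feistel round XORs one half of the number with a function of the other half. Applying the same rounds in reverse order therefore undoes the mixing, even though that function itself is not invertible.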

import hashlib
import string

# IDs are split into two 22-bit halves, so this handles IDs below 2**44.
mask = (1 << 22) - 1
alphabet = string.ascii_uppercase + string.digits  # 36 symbols


def func(x: int):
    return int.from_bytes(hashlib.sha256(x.to_bytes(3, 'big')).digest(), 'big') & mask


def mix(id_in: int):
    L, R = id_in >> 22, id_in & mask
    L ^= func(R)
    R ^= func(L)
    return (L << 22) | R


def unmix(mixed: int):
    L, R = mixed >> 22, mixed & mask
    R ^= func(L)
    L ^= func(R)
    return (L << 22) | R


def base_n_encode(value: int):
    # Always emit 9 digits: 36**9 > 2**44, so any 44-bit mixed value fits.
    digits = []
    for _ in range(9):
        value, rem = divmod(value, len(alphabet))
        digits.insert(0, rem)
    return ''.join(alphabet[digit] for digit in digits)


def base_n_decode(encoded: str):
    digits = [alphabet.index(ch) for ch in encoded]
    result = 0
    for digit in digits:
        result = result * len(alphabet) + digit
    return result


def encode(id_in: int):
    return base_n_encode(mix(id_in))


def decode(encoded: str):
    return unmix(base_n_decode(encoded))


if __name__ == '__main__':
    e1 = encode(13892359163211)
    e2 = encode(13992351216421)
    print('13892359163211 -> ' + e1)
    print('13992351216421 -> ' + e2)
    print(e1 + ' -> ' + str(decode(e1)))
    print(e2 + ' -> ' + str(decode(e2)))

Output is:

13892359163211 -> BC33VXN8A
13992351216421 -> D1UOW6SLL
BC33VXN8A -> 13892359163211
D1UOW6SLL -> 13992351216421

Note the use of sha256. It is slow and definitely overkill, but it has the advantage of being built into Python, which keeps func a one-liner. Unless you are converting millions of IDs, speed shouldn't be an issue. If it is, you can replace func with something much, much faster, perhaps Murmur3.
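One possibility for a faster `func` is to swap SHA-256 for the 32-bit finalizer of MurmurHash3 (`fmix32`). That is only Murmur3's final mixing step, not the full hash. The sketch below keeps the same 22/22-bit split and two rounds as `mix`/`unmix` above. The round key `KEY` and the name `fast_func` are made up for this example.

```python
MASK = (1 << 22) - 1
KEY = 0x1F2E3D4C  # arbitrary 32-bit round key (an assumption; pick your own)


def fmix32(h: int) -> int:
    """MurmurHash3's 32-bit finalizer: cheap xorshift/multiply mixing of a 32-bit int."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def fast_func(x: int, round_no: int) -> int:
    # A Feistel round function only needs to be deterministic, not invertible.
    return fmix32(x ^ KEY ^ round_no) & MASK


def mix(id_in: int) -> int:
    L, R = id_in >> 22, id_in & MASK
    L ^= fast_func(R, 0)
    R ^= fast_func(L, 1)
    return (L << 22) | R


def unmix(mixed: int) -> int:
    # Same steps in reverse order; each XOR cancels the one applied in mix().
    L, R = mixed >> 22, mixed & MASK
    R ^= fast_func(L, 1)
    L ^= fast_func(R, 0)
    return (L << 22) | R
```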

The code uses hard-coded constants to make it a little easier to see what's going on, but it can be generalized to IDs of arbitrary bit length and to arbitrary alphabets.

A more general version of this example is available on GitHub.
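Here is one way the generalization described above might look. It is a sketch under assumptions, not the GitHub version referred to above. The ID width in bits is a parameter, split into two halves that may differ by one bit. The round function is keyed BLAKE2b from `hashlib`, and the number of rounds is configurable. The output is padded to a fixed number of digits in any alphabet. The names `FeistelCodec`, `bits`, `rounds`, and `key` are hypothetical.

```python
import hashlib
import string


class FeistelCodec:
    """Sketch: reversibly scramble integers in [0, 2**bits) and encode them
    as fixed-length strings over an arbitrary alphabet."""

    def __init__(self, bits=44, alphabet=string.ascii_uppercase + string.digits,
                 rounds=4, key=b"example key"):
        if not 2 <= bits <= 128:
            raise ValueError("bits must be between 2 and 128")
        if len(alphabet) < 2 or len(set(alphabet)) != len(alphabet):
            raise ValueError("alphabet needs at least 2 distinct characters")
        if not 1 <= rounds <= 255:
            raise ValueError("rounds must be between 1 and 255")
        self.bits = bits
        self.right_bits = bits // 2            # halves may differ by one bit
        self.left_bits = bits - self.right_bits
        self.alphabet = alphabet
        self.rounds = rounds
        self.key = key                         # BLAKE2b accepts keys up to 64 bytes
        # Smallest digit count that can represent every value below 2**bits.
        self.width = 1
        while len(alphabet) ** self.width < 2 ** bits:
            self.width += 1

    def _f(self, x, round_no, out_bits):
        # Keyed BLAKE2b of (round number, half); only needs to be deterministic.
        data = bytes([round_no]) + x.to_bytes(16, "big")
        digest = hashlib.blake2b(data, digest_size=16, key=self.key).digest()
        return int.from_bytes(digest, "big") & ((1 << out_bits) - 1)

    def _rounds(self, value, order):
        left = value >> self.right_bits
        right = value & ((1 << self.right_bits) - 1)
        for r in order:
            # Even rounds update the left half, odd rounds the right half.
            # Each step XORs one half with a function of the other, so
            # applying the same step again undoes it.
            if r % 2 == 0:
                left ^= self._f(right, r, self.left_bits)
            else:
                right ^= self._f(left, r, self.right_bits)
        return (left << self.right_bits) | right

    def encode(self, n):
        if not 0 <= n < 2 ** self.bits:
            raise ValueError("value out of range")
        value = self._rounds(n, range(self.rounds))
        base = len(self.alphabet)
        digits = []
        for _ in range(self.width):
            value, rem = divmod(value, base)
            digits.append(self.alphabet[rem])
        return "".join(reversed(digits))

    def decode(self, code):
        base = len(self.alphabet)
        value = 0
        for ch in code:
            value = value * base + self.alphabet.index(ch)
        if len(code) != self.width or value >= 2 ** self.bits:
            raise ValueError("not a valid code")
        return self._rounds(value, reversed(range(self.rounds)))
```

With the defaults (44 bits, 36 symbols), codes are 9 characters long, the same as in the answer. Changing `key`, `rounds`, or `bits` changes every code, so these settings must stay fixed once codes have been handed out.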




Answer 3:


How about computing the CRC32 of the input and showing the result in hex?

>>> n = 13892359163211
>>> 
>>> import binascii
>>> hex(binascii.crc32(str(n).encode()))[2:]
'240a831a'
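Below is a sketch of the CRC32 idea with fixed-width uppercase output. CRC32 is a 32-bit checksum, not an encoding, so the ID cannot be recovered from the code. There are also only 2**32 possible codes, so across the full range of 44-bit IDs some distinct IDs must share a code. The hypothetical helper `find_collisions` checks a given set of IDs for such clashes.

```python
import binascii
from collections import defaultdict


def crc_code(n: int) -> str:
    """CRC32 of the ID's decimal string as 8 uppercase hex digits (one-way)."""
    return format(binascii.crc32(str(n).encode()), "08X")


def find_collisions(ids):
    """Return {code: [ids...]} for every code produced by more than one ID."""
    by_code = defaultdict(list)
    for i in ids:
        by_code[crc_code(i)].append(i)
    return {code: group for code, group in by_code.items() if len(group) > 1}


if __name__ == "__main__":
    print(crc_code(13892359163211), crc_code(13892359163212))
    print(find_collisions(range(13892359163211, 13892359163311)))
```

Because the code cannot be decoded, you would need to store it alongside the ID in order to look up the original row.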



Answer 4:


Convert the numeric IDs to binary form (cell 3) and use an encoder (cells 4 and 5).

In [1]: import struct, base64

In [2]: i = 13892359163211

In [3]: struct.pack('<Q', i)
Out[3]: b'K\r"\x91\xa2\x0c\x00\x00'

In [4]: base64.b85encode(struct.pack('<Q', i)).decode('ascii')
Out[4]: 'OAR8Cq6`24'

In [5]: base64.b64encode(struct.pack('<Q', i)).decode('ascii').rstrip('=')
Out[5]: 'Sw0ikaIMAAA'

Which encoder to use depends on which characters you want to allow.
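Below is a round-trip sketch of this approach. It assumes an explicit little-endian 8-byte layout (`'<Q'`), so the result does not depend on the platform's `unsigned long` size. It also uses URL-safe base64 with the `=` padding stripped. The helper names `to_code` and `from_code` are hypothetical. Close IDs still produce codes that share most of their characters, because only the low-order bytes change.

```python
import base64
import struct


def to_code(i: int) -> str:
    """Pack the ID as 8 little-endian bytes and encode as URL-safe base64 without padding."""
    raw = struct.pack("<Q", i)  # raises struct.error if i is outside 0..2**64-1
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def from_code(code: str) -> int:
    """Inverse of to_code: restore the padding, decode, and unpack."""
    padded = code + "=" * (-len(code) % 4)
    (i,) = struct.unpack("<Q", base64.urlsafe_b64decode(padded))
    return i


if __name__ == "__main__":
    code = to_code(13892359163211)
    print(code, from_code(code))
```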




Answer 5:


You can use CrypII's idea of converting the integer to base64. This gives the shortest result:

  • 13892359163211 is 4LWL and
  • 13992351216421 is 64yl
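Here is a sketch of the integer-to-base-64 conversion. Each base-64 digit carries 6 bits, so a 44-bit ID such as 13892359163211 needs 8 digits. The alphabet below follows the RFC 4648 "base64url" symbol order (A-Z, a-z, 0-9, '-', '_'). That choice is an assumption, because the answer doesn't say which alphabet it used. The helper names are hypothetical.

```python
import string

# Assumed alphabet: RFC 4648 base64url symbol order (A-Z, a-z, 0-9, '-', '_').
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def int_to_b64(n: int) -> str:
    """Write a non-negative integer in base 64, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        digits.append(B64[n & 63])  # the low 6 bits select the digit
        n >>= 6
        if n == 0:
            break
    return "".join(reversed(digits))


def b64_to_int(s: str) -> int:
    """Inverse of int_to_b64."""
    n = 0
    for ch in s:
        n = (n << 6) | B64.index(ch)
    return n


if __name__ == "__main__":
    print(int_to_b64(13892359163211), int_to_b64(13992351216421))
```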


Source: https://stackoverflow.com/questions/57624017/algorithm-to-turn-numeric-ids-in-to-short-different-alphanumeric-codes
