Hash function that produces short hashes?

后端未结

关注

 10  1688

Is there a way of encryption that can take a string of any length and produce a sub-10-character hash? I want to produce reasonably unique ID\'s but based on message content

相关标签:

10条回答

孤独总比滥情好

2020-12-08 00:14
Simply run this in a terminal (on MacOS or Linux):
```
crc32 <(echo "some string")
```
8 characters long.
0 讨论(0)
发布评论:

提交评论
- 加载中...
感情败类

2020-12-08 00:15

You could use an existing hash algorithm that produces something short, like MD5 (128 bits) or SHA1 (160). Then you can shorten that further by XORing sections of the digest with other sections. This will increase the chance of collisions, but not as bad as simply truncating the digest.

Also, you could include the length of the original data as part of the result to make it more unique. For example, XORing the first half of an MD5 digest with the second half would result in 64 bits. Add 32 bits for the length of the data (or lower if you know that length will always fit into fewer bits). That would result in a 96-bit (12-byte) result that you could then turn into a 24-character hex string. Alternately, you could use base 64 encoding to make it even shorter.

0 讨论(0)
发布评论:

提交评论
- 加载中...
隐瞒了意图╮

2020-12-08 00:18

If you don't need an algorithm that's strong against intentional modification, I've found an algorithm called adler32 that produces pretty short (~8 character) results. Choose it from the dropdown here to try it out:

http://www.sha1-online.com/

0 讨论(0)
发布评论:

提交评论
- 加载中...

悲哀的现实

2020-12-08 00:18

I needed something along the lines of a simple string reduction function recently. Basically, the code looked something like this (C/C++ code ahead):

size_t ReduceString(char *Dest, size_t DestSize, const char *Src, size_t SrcSize, bool Normalize)
{
    size_t x, x2 = 0, z = 0;

    memset(Dest, 0, DestSize);

    for (x = 0; x < SrcSize; x++)
    {
        Dest[x2] = (char)(((unsigned int)(unsigned char)Dest[x2]) * 37 + ((unsigned int)(unsigned char)Src[x]));
        x2++;

        if (x2 == DestSize - 1)
        {
            x2 = 0;
            z++;
        }
    }

    // Normalize the alphabet if it looped.
    if (z && Normalize)
    {
        unsigned char TempChr;
        y = (z > 1 ? DestSize - 1 : x2);
        for (x = 1; x < y; x++)
        {
            TempChr = ((unsigned char)Dest[x]) & 0x3F;

            if (TempChr < 10)  TempChr += '0';
            else if (TempChr < 36)  TempChr = TempChr - 10 + 'A';
            else if (TempChr < 62)  TempChr = TempChr - 36 + 'a';
            else if (TempChr == 62)  TempChr = '_';
            else  TempChr = '-';

            Dest[x] = (char)TempChr;
        }
    }

    return (SrcSize < DestSize ? SrcSize : DestSize);
}

It probably has more collisions than might be desired but it isn't intended for use as a cryptographic hash function. You might try various multipliers (i.e. change the 37 to another prime number) if you get too many collisions. One of the interesting features of this snippet is that when Src is shorter than Dest, Dest ends up with the input string as-is (0 * 37 + value = value). If you want something "readable" at the end of the process, Normalize will adjust the transformed bytes at the cost of increasing collisions.

Source:

https://github.com/cubiclesoft/cross-platform-cpp/blob/master/sync/sync_util.cpp

0 讨论(0)

忘掉有多难

2020-12-08 00:20
It is now 2019 and there are better options. Namely, xxhash.
```
~ echo test | xxhsum                                                           
2d7f1808da1fa63c  stdin
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
醉话见心

2020-12-08 00:21
Just summarizing an answer that was helpful to me (noting @erasmospunk's comment about using base-64 encoding). My goal was to have a short string that was mostly unique...

I'm no expert, so please correct this if it has any glaring errors (in Python again like the accepted answer):
```
import base64
import hashlib
import uuid

unique_id = uuid.uuid4()
# unique_id = UUID('8da617a7-0bd6-4cce-ae49-5d31f2a5a35f')

hash = hashlib.sha1(str(unique_id).encode("UTF-8"))
# hash.hexdigest() = '882efb0f24a03938e5898aa6b69df2038a2c3f0e'

result = base64.b64encode(hash.digest())
# result = b'iC77DySgOTjliYqmtp3yA4osPw4='
```
The result here is using more than just hex characters (what you'd get if you used hash.hexdigest()) so it's less likely to have a collision (that is, should be safer to truncate than a hex digest).

Note: Using UUID4 (random). See http://en.wikipedia.org/wiki/Universally_unique_identifier for the other types.
0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页