Hashing (hiding) strings in Python

南方客 2020-12-15 08:14

What I need is to hash a string. It doesn't have to be secure because it's just going to be a hidden phrase in the text file (it just doesn't have to be recognizable for

4 answers
  •  死守一世寂寞
    2020-12-15 08:44

    First off, let me say that you can't guarantee unique results. If you wanted unique results for all the strings in the universe, you're better off storing the string itself (or a compressed version).

    More on that in a second. Let's get some hashes first.

    hashlib way

    You can use any of the main cryptographic hashes to hash a string with a few steps:

    >>> import hashlib
    >>> # Python 3: hashlib needs bytes, so use a bytes literal (or "I am a cat".encode("utf-8"))
    >>> sha = hashlib.sha1(b"I am a cat")
    >>> sha.hexdigest()
    '576f38148ae68c924070538b45a8ef0f73ed8710'
    

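    You have a choice between SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 as far as the long-standing built-ins are concerned. Python 3.6+ also ships SHA-3 and BLAKE2 variants; hashlib.algorithms_guaranteed lists what your interpreter always provides.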
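If the phrase arrives as a regular Python 3 str rather than a bytes literal, it has to be encoded before hashing. Here is a minimal sketch of that step; the helper name hash_phrase and the choice of UTF-8 are assumptions for illustration, not part of the original answer.

```python
import hashlib

def hash_phrase(text, algorithm="sha1"):
    """Return the hex digest of a str (hypothetical helper)."""
    # hashlib only accepts bytes, so encode the str first (UTF-8 assumed).
    hasher = hashlib.new(algorithm)
    hasher.update(text.encode("utf-8"))
    return hasher.hexdigest()

print(hash_phrase("I am a cat"))            # SHA-1, 40 hex characters
print(hash_phrase("I am a cat", "sha256"))  # SHA-256, 64 hex characters
```

Whatever encoding you pick, use the same one when writing and when checking, otherwise non-ASCII phrases will produce different digests.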

    What's the difference between those hash algorithms?

    A hash function works by taking data of variable length and turning it into data of fixed length.

    The fixed length, in the case of each of the SHA algorithms built into hashlib, is the number of bits specified in the name (with the exception of sha1 which is 160 bits). If you want better certainty that two strings won't end up in the same bucket (same hash value), pick a hash with a bigger digest (the fixed length).

    In sorted order, these are the digest sizes you have to work with:

    Algorithm  Digest Size (in bits)
    md5        128
    sha1       160
    sha224     224
    sha256     256
    sha384     384
    sha512     512
    

    The bigger the digest the less likely you'll have a collision, provided your hash function is worth its salt.
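You don't have to take the digest sizes on faith: every hashlib object reports its own digest_size in bytes. A small sketch that reads the sizes back and also shows that input length doesn't change output length (md5 can be missing on some locked-down builds, so treat its presence as an assumption):

```python
import hashlib

names = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]

# digest_size is in bytes; multiply by 8 to get bits.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in names}
for name, bits in digest_bits.items():
    print(f"{name:<8} {bits}")

# Variable-length input, fixed-length output.
short = hashlib.sha256(b"a").hexdigest()
long_ = hashlib.sha256(b"a" * 10_000).hexdigest()
print(len(short), len(long_))
```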

    Wait, what about hash()?

    The built-in hash() function returns integers, which could also be easy to use for the purpose you outline. There are problems, though.

    >>> hash('moo')
    6387157653034356308
    
    1. If your program is going to run on different systems, you can't be sure that hash will return the same thing. In fact, I'm running on a 64-bit box using 64-bit Python. These values are going to be wildly different than for 32-bit Python.

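    2. For Python 3.3+, as @gnibbler pointed out, hash() is randomized between runs for str and bytes objects (the seed is random by default and can be pinned with the PYTHONHASHSEED environment variable). It will work for a single run, but almost definitely won't work across runs of your program (pulling from the text file you mentioned).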
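To see the second problem for yourself, you can start fresh interpreters with different hash seeds and compare. This is a rough sketch; the helper run_in_fresh_interpreter is made up for the demonstration, and it assumes Python 3.7+ (for capture_output) and that spawning subprocesses is allowed.

```python
import os
import subprocess
import sys

def run_in_fresh_interpreter(snippet, seed):
    """Run a one-line snippet in a new Python process with PYTHONHASHSEED set."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

builtin = "print(hash('moo'))"
stable = "import hashlib; print(hashlib.sha1(b'moo').hexdigest())"

# hash() changes with the seed; the hashlib digest does not.
print(run_in_fresh_interpreter(builtin, 1), run_in_fresh_interpreter(builtin, 2))
print(run_in_fresh_interpreter(stable, 1), run_in_fresh_interpreter(stable, 2))
```

Without PYTHONHASHSEED set, each run picks its own random seed, which is the situation your program would normally be in.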

    Why would hash() be built that way? Well, the built-in hash() is there for one specific reason: hash tables (dictionaries, lookup tables) in memory. It's meant for cheap lookups at runtime, not for cryptographic use.

    Don't use hash(), use hashlib.
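Putting it together for the text-file use case from the question, something along these lines should work. It's only a sketch: the function names, the file name hidden.txt, and the choice of SHA-256 are all assumptions, and an unsalted digest of a short phrase can still be guessed by brute force (which the question says is acceptable).

```python
import hashlib
import os
import tempfile

def digest_of(phrase):
    """Hex digest of the phrase (UTF-8 encoding and SHA-256 are assumptions)."""
    return hashlib.sha256(phrase.encode("utf-8")).hexdigest()

def save_hidden_phrase(path, phrase):
    """Write only the digest to the text file, never the phrase itself."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(digest_of(phrase) + "\n")

def is_hidden_phrase(path, candidate):
    """Check a candidate phrase against the stored digest."""
    with open(path, encoding="utf-8") as f:
        stored = f.read().strip()
    return stored == digest_of(candidate)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hidden.txt")
    save_hidden_phrase(path, "I am a cat")
    print(is_hidden_phrase(path, "I am a cat"))  # True
    print(is_hidden_phrase(path, "I am a dog"))  # False
```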
