PHP best way to MD5 multi-dimensional array?

别那么骄傲 2020-12-07 11:48

What is the best way to generate an MD5 (or any other hash) of a multi-dimensional array?

I could easily write a loop which would traverse through each level of the

13 answers
  • 2020-12-07 12:24
    md5(serialize($array));
    

    This works, but the hash depends on the order of the array's elements: the same keys and values in a different order produce a different hash (that might not matter for your use case, though).
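
    To make the order dependence concrete, here is a minimal sketch. It assumes string keys, and ksort_recursive() is a hypothetical helper name of mine, not a PHP built-in. It sorts keys at every level before hashing, so arrays holding the same key/value pairs hash the same:

    ```php
    <?php
    // Hypothetical helper (not a PHP built-in): returns a copy with keys sorted at every level.
    function ksort_recursive(array $array): array
    {
        foreach ($array as $key => $value) {
            if (is_array($value)) {
                $array[$key] = ksort_recursive($value);
            }
        }
        ksort($array);
        return $array;
    }

    $a = ['x' => 1, 'y' => ['p' => 1, 'q' => 2]];
    $b = ['y' => ['q' => 2, 'p' => 1], 'x' => 1];

    // Same data, different key order: the serialized strings (and so the hashes) differ.
    var_dump(md5(serialize($a)) === md5(serialize($b)));

    // After normalizing key order, the hashes match.
    var_dump(md5(serialize(ksort_recursive($a))) === md5(serialize(ksort_recursive($b))));
    ```

    The first comparison is false and the second is true. Note that this only normalizes key order; in a plain list such as array(1,2,3) the order of the values still changes the hash.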

  • 2020-12-07 12:24

    Important note about serialize()

    I don't recommend using it as part of a hashing function, because it can return different results for data that looks identical. Compare the two examples below.

    Simple example:

    $a = new \stdClass;
    $a->test = 'sample';
    
    $b = new \stdClass;
    $b->one = $a;
    $b->two = clone $a; // a separate object with equal contents
    
    echo serialize($b);
    

    Produces

    "O:8:"stdClass":2:{s:3:"one";O:8:"stdClass":1:{s:4:"test";s:6:"sample";}s:3:"two";O:8:"stdClass":1:{s:4:"test";s:6:"sample";}}"
    

    But the following code:

    <?php
    
    $a = new \stdClass;
    $a->test = 'sample';
    
    $b = new \stdClass;
    $b->one = $a;
    $b->two = $a; // the same instance as $b->one
    
    echo serialize($b);
    

    Output:

    "O:8:"stdClass":2:{s:3:"one";O:8:"stdClass":1:{s:4:"test";s:6:"sample";}s:3:"two";r:2;}"
    

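    So instead of serializing the second object again, PHP just emits a back-reference ("r:2;") to the first instance. That is a perfectly good and correct way to serialize data, but it means that two structures holding the same values can serialize differently, which can cause problems for your hashing function.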
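
    The effect on a hash can be checked directly. This is only a sketch (the variable names are mine): it builds both object graphs from above and compares the MD5 of serialize() with the MD5 of json_encode(), which writes out public property values and has no notion of references:

    ```php
    <?php
    $a = new stdClass;
    $a->test = 'sample';

    // Two distinct objects with equal contents
    $cloned = new stdClass;
    $cloned->one = $a;
    $cloned->two = clone $a;

    // The same instance referenced twice
    $shared = new stdClass;
    $shared->one = $a;
    $shared->two = $a;

    // serialize() encodes the shared instance as a back-reference, so the hashes differ
    var_dump(md5(serialize($cloned)) === md5(serialize($shared)));

    // json_encode() writes both properties out in full, so the hashes match
    var_dump(md5(json_encode($cloned)) === md5(json_encode($shared)));
    ```

    The first comparison is false and the second is true. Whether the two graphs should count as "the same" depends on what the hash is used for.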

  • 2020-12-07 12:25

    The answer depends heavily on the data types of the array values. For large strings, use:

    md5(serialize($array));
    

    For short strings and integers use:

    md5(json_encode($array));
    

    Four built-in PHP functions can turn an array into a string: serialize(), json_encode(), var_export(), and print_r(). (var_export() and print_r() return the string only when their second argument is true; otherwise they print it.)

    Note: json_encode() slows down when processing associative arrays with strings as values. In that case, consider using serialize() instead.
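
    For reference, here is roughly how each of the four functions plugs into md5(). This is a sketch with a made-up sample array; it is not the benchmark source:

    ```php
    <?php
    // Made-up sample data, for illustration only
    $array = ['id' => 7, 'tags' => ['a', 'b']];

    $hashes = [
        'serialize'   => md5(serialize($array)),
        'json_encode' => md5(json_encode($array)),
        // Passing true makes these two return the string instead of printing it
        'var_export'  => md5(var_export($array, true)),
        'print_r'     => md5(print_r($array, true)),
    ];

    foreach ($hashes as $name => $hash) {
        echo $name, ': ', $hash, PHP_EOL;
    }
    ```

    Each function produces a different string for the same array, so the four hashes are not interchangeable: pick one representation and use it everywhere you compare hashes.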

    Test results for a multi-dimensional array with MD5 hashes (32 characters) as keys and values. Negative percentages mean slower than the fastest function:

    Function        Repeats         Total time      Relative to fastest
    serialize       10000           0.761195 sec    +0.00%
    print_r         10000           1.669689 sec    -119.35%
    json_encode     10000           1.712214 sec    -124.94%
    var_export      10000           1.735023 sec    -127.93%
    

    Test results for a numeric multi-dimensional array:

    Function        Repeats         Total time      Relative to fastest
    json_encode     10000           1.040612 sec    +0.00%
    var_export      10000           1.753170 sec    -68.47%
    serialize       10000           1.947791 sec    -87.18%
    print_r         10000           9.084989 sec    -773.04%
    

    Associative array test source. Numeric array test source.

  • 2020-12-07 12:27

    (Copy-n-paste-able function at the bottom)

    As mentioned earlier, the following works:

    md5(serialize($array));
    

    However, it's worth noting that (ironically) json_encode performs noticeably faster:

    md5(json_encode($array));
    

    In fact, the speed increase is two-fold here as (1) json_encode alone performs faster than serialize, and (2) json_encode produces a smaller string and therefore less for md5 to handle.

    Edit: Here is evidence to support this claim:

    <?php //this is the array I'm using -- it's multidimensional.
    $array = unserialize('a:6:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}i:5;a:5:{i:0;a:0:{}i:1;a:4:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}i:3;a:6:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}i:5;a:5:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}}}');
    
    // The serialize test
    $b4_s = microtime(true);
    for ($i = 0; $i < 10000; $i++) {
        $serial = md5(serialize($array));
    }
    echo 'serialize() w/ md5() took: '.($sTime = microtime(true) - $b4_s).' sec<br/>';
    
    // The json test
    $b4_j = microtime(true);
    for ($i = 0; $i < 10000; $i++) {
        $serial = md5(json_encode($array));
    }
    echo 'json_encode() w/ md5() took: '.($jTime = microtime(true) - $b4_j).' sec<br/><br/>';
    // $sTime / $jTime is a speed ratio: 2.5 means json_encode ran 2.5x as fast as serialize
    echo 'json_encode is <strong>'.round($sTime / $jTime, 2).'x</strong> as fast, with a difference of <strong>'.($sTime - $jTime).' seconds</strong>';
    

    In this test, json_encode is consistently more than 2.5x as fast as serialize (often over 3x) -- this is not a trivial difference. You can see the results of the test with this live script:

    • http://nathanbrauer.com/playground/serialize-vs-json.php
    • http://nathanbrauer.com/playground/plain-text/serialize-vs-json.php

    Note that array(1,2,3) produces a different MD5 than array(3,2,1). If that is NOT what you want, try the following code:

    //Optionally make a copy of the array (if you want to preserve the original order)
    $original = $array;
    
    array_multisort($array);
    $hash = md5(json_encode($array));
    

    Edit: There's been some question as to whether reversing the order would produce the same results. So, I've done that (correctly) here:

    • http://nathanbrauer.com/playground/json-vs-serialize.php
    • http://nathanbrauer.com/playground/plain-text/json-vs-serialize.php

    As you can see, the results are exactly the same. Here's the (corrected) test originally created by someone related to Drupal:

    • http://nathanjbrauer.com/playground/drupal-calculation.php
    • http://nathanjbrauer.com/playground/plain-text/drupal-calculation.php

    And for good measure, here's a function/method you can copy and paste (tested in 5.3.3-1ubuntu9.5):

    function array_md5(array $array) {
        // $array is passed by value, so sorting this local copy
        // leaves the caller's array untouched; no extra copy is needed
        array_multisort($array);
        return md5(json_encode($array));
    }
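
    A quick usage sketch, written against a current PHP version rather than the 5.3 build mentioned above (the return type hint is my addition). It also shows one limit worth knowing: array_multisort() sorts only the top level, so the order inside nested arrays still affects the hash.

    ```php
    <?php
    function array_md5(array $array): string
    {
        array_multisort($array); // sorts the local copy only
        return md5(json_encode($array));
    }

    // Top-level order no longer matters
    var_dump(array_md5([1, 2, 3]) === array_md5([3, 2, 1]));

    // Order inside nested arrays still does: array_multisort() does not recurse
    var_dump(array_md5([[1, 2]]) === array_md5([[2, 1]]));
    ```

    The first comparison is true and the second is false. If nested order should be ignored too, each level would need to be sorted before hashing.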
    
  • 2020-12-07 12:27

    Aside from Brock's excellent answer (+1), any decent hashing library lets you update the hash incrementally, so you should be able to feed it each string in sequence instead of having to build up one giant string.

    See: hash_update
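
    A minimal sketch of that idea, assuming PHP 7.1 or later. hash_init(), hash_update() and hash_final() are the built-in functions; hash_array_incremental() and the way keys and values are framed (length prefixes, brackets around nested arrays) are my own assumptions, not a standard format:

    ```php
    <?php
    // Feeding the data in pieces gives the same digest as hashing it in one go
    $ctx = hash_init('md5');
    hash_update($ctx, 'hello ');
    hash_update($ctx, 'world');
    var_dump(hash_final($ctx) === md5('hello world'));

    // Hypothetical helper: walks the array and feeds each key and value to the hash context
    function hash_array_incremental(array $array, string $algo = 'md5'): string
    {
        $ctx = hash_init($algo);
        $walk = function (array $node) use (&$walk, $ctx): void {
            foreach ($node as $key => $value) {
                // Length prefixes keep adjacent keys and values from running together
                hash_update($ctx, 'k' . strlen((string) $key) . ':' . $key);
                if (is_array($value)) {
                    hash_update($ctx, '[');
                    $walk($value);
                    hash_update($ctx, ']');
                } else {
                    $s = serialize($value); // keeps the type, so 1 and "1" hash differently
                    hash_update($ctx, 'v' . strlen($s) . ':' . $s);
                }
            }
        };
        $walk($array);
        return hash_final($ctx);
    }

    echo hash_array_incremental(['a' => [1, 2], 'b' => 'x']), PHP_EOL;
    ```

    The first check is true. The benefit is memory: no single string holding the whole array is ever built. Like the other approaches, the result still depends on element order.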

  • 2020-12-07 12:30

    I think that this could be a good tip:

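    class hasharray {
    
        // Flatten nested arrays into a single level, keyed by the "/"-joined path to each value
        public function array_flat($in, $keys = array(), $out = array()) {
            foreach ($in as $k => $v) {
                $keys[] = $k;
                if (is_array($v)) {
                    $out = $this->array_flat($v, $keys, $out);
                } else {
                    $out[implode("/", $keys)] = $v;
                }
                array_pop($keys);
            }
            return $out;
        }
    
        // Sort the flattened paths so that key order at any depth does not affect the hash
        public function array_hash($in) {
            $a = $this->array_flat($in);
            ksort($a);
            return md5(json_encode($a));
        }
    
    }
    
    // $multi_dimensional_array is whatever array you want to hash
    $h = new hasharray;
    echo $h->array_hash($multi_dimensional_array);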
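
    To see what the flattening step produces, here is the same logic as standalone functions with made-up sample data. The names flatten_paths() and order_independent_md5() are mine; treat this as a sketch of the approach, not a drop-in replacement:

    ```php
    <?php
    // Same idea as array_flat() above, as a plain function (hypothetical name)
    function flatten_paths(array $in, array $keys = [], array $out = []): array
    {
        foreach ($in as $k => $v) {
            $keys[] = $k;
            if (is_array($v)) {
                $out = flatten_paths($v, $keys, $out);
            } else {
                $out[implode('/', $keys)] = $v;
            }
            array_pop($keys);
        }
        return $out;
    }

    // Same idea as array_hash() above (hypothetical name)
    function order_independent_md5(array $in): string
    {
        $flat = flatten_paths($in);
        ksort($flat);
        return md5(json_encode($flat));
    }

    $a = ['user' => ['name' => 'Ann', 'roles' => ['admin', 'dev']], 'active' => true];
    $b = ['active' => true, 'user' => ['roles' => ['admin', 'dev'], 'name' => 'Ann']];

    // Keys become paths such as "user/name" and "user/roles/0"
    print_r(flatten_paths($a));

    // Same data with keys in a different order at two levels: the hashes match
    var_dump(order_independent_md5($a) === order_independent_md5($b));
    ```

    The comparison is true. Two caveats follow from the flattening: empty nested arrays leave no entry at all, and a key that itself contains "/" can collide with a nested path, so this fits best when keys are simple identifiers.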
    