Best way to find differences between two large arrays in PHP

Happy的楠姐 2020-12-25 07:58

I have 2 very large arrays (of size ~2,500,000). I need to find the difference between these arrays. By difference I mean I need a resultant array with values that are in array

2 Answers
  •  梦谈多话
    2020-12-25 08:57

    Here is a simple algorithm.

    1. Flip the 1st array. Its values become keys, so repeated values are discarded.
    2. Flip the 2nd array (optional).
    3. For each element of the 2nd array, check whether it exists in the 1st array.
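
    To make the steps concrete, here is a minimal sketch on tiny, made-up arrays (the data and the variable names `$first`, `$second`, `$firstKeys`, and `$onlyInSecond` are illustrative, not from the original post). It skips the optional step 2 and checks membership with `isset()`:

    ```php
    <?php
    // Tiny stand-ins for the two large arrays (hypothetical data).
    $first  = ['apple', 'banana', 'apple', 'cherry'];
    $second = ['banana', 'durian', 'cherry', 'elder'];

    // Step 1: flip the first array so its values become keys.
    // The repeated 'apple' collapses into a single key.
    $firstKeys = array_flip($first);
    // $firstKeys is ['apple' => 2, 'banana' => 1, 'cherry' => 3]

    // Step 3: keep each element of the second array that is not a key of $firstKeys.
    $onlyInSecond = [];
    foreach ($second as $value) {
        if (!isset($firstKeys[$value])) {
            $onlyInSecond[] = $value;
        }
    }

    print_r($onlyInSecond); // 'durian' and 'elder'
    ```

    Note that `array_flip` only accepts integer and string values, so this approach assumes the arrays hold one of those types.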

    As you are working with very large arrays, it'll consume a lot of memory.
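
    To get a feel for the memory cost before running on the full data, something like the following rough sketch may help. It uses a synthetic array of 100,000 strings (an assumed size chosen for illustration) and the built-in `memory_get_usage()` and `memory_get_peak_usage()` functions; the actual numbers will vary with PHP version and data:

    ```php
    <?php
    // Synthetic stand-in for one input array: 100,000 short strings (assumed size).
    $values = [];
    for ($i = 0; $i < 100000; $i++) {
        $values[] = "line-$i\n";
    }

    $before  = memory_get_usage();
    $flipped = array_flip($values);   // the flipped copy lives alongside $values
    $after   = memory_get_usage();

    printf("array_flip added about %.1f MB\n", ($after - $before) / 1048576);
    printf("peak so far: %.1f MB\n", memory_get_peak_usage() / 1048576);
    ```

    Scaling the measured figure up to 2,500,000 elements gives only a ballpark estimate, but it can indicate whether a given `memory_limit` is likely to be enough.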

    Here is my implementation:

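    $a = file("l.a"); // l.a is a file containing 2,500,000 lines
    $b = file("l.b");

    // Returns the values of $b that do not appear in $a.
    function large_array_diff($b, $a){
        // Flip both arrays so their values become keys (duplicates collapse)
        $at = array_flip($a);
        $bt = array_flip($b);
        // Keep the keys of $bt that are not keys of $at
        $d = array_diff_key($bt, $at);

        return array_keys($d);
    }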
    

    I ran it with a 4G memory limit. 3G also works (just tested).

    $ time php -d memory_limit=4G diff_la.php
    

    It took about 11 seconds.

    real    0m10.612s
    user    0m8.940s
    sys     0m1.460s
    

    UPDATE

    The following code runs about 2x faster than the large_array_diff function above.

    function flip_isset_diff($b, $a) {
        // Flip only $a; isset() on its keys is a hash lookup
        $at = array_flip($a);
        $d = array();
        foreach ($b as $i) {
            if (!isset($at[$i])) {
                $d[] = $i;
            }
        }

        return $d;
    }
    

    It is faster because it calls array_flip only once and does not call array_diff_key or array_keys at all, which saves a lot of CPU cycles.
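
    One behavioral difference is worth keeping in mind: since flip_isset_diff never flips $b, repeated values in $b are kept, whereas large_array_diff collapses them. The sketch below repeats both functions so it runs on its own, shows the difference on a tiny made-up input, and adds a rough timing loop on synthetic integer arrays (the sizes are assumptions; timings depend on the machine and PHP version):

    ```php
    <?php
    function large_array_diff($b, $a) {
        $at = array_flip($a);
        $bt = array_flip($b);
        return array_keys(array_diff_key($bt, $at));
    }

    function flip_isset_diff($b, $a) {
        $at = array_flip($a);
        $d = array();
        foreach ($b as $i) {
            if (!isset($at[$i])) {
                $d[] = $i;
            }
        }
        return $d;
    }

    // Small hypothetical input: 'z' appears twice in the second array.
    $small_a = ['x', 'y'];
    $small_b = ['y', 'z', 'z', 'w'];

    print_r(large_array_diff($small_b, $small_a)); // 'z', 'w' (duplicate collapsed)
    print_r(flip_isset_diff($small_b, $small_a));  // 'z', 'z', 'w' (duplicate kept)

    // Rough timing on synthetic data: 200,000 integers each, half overlapping.
    $big_a = range(0, 199999);
    $big_b = range(100000, 299999);
    foreach (['large_array_diff', 'flip_isset_diff'] as $fn) {
        $start  = microtime(true);
        $result = $fn($big_b, $big_a);
        printf("%s: %d values in %.3f s\n", $fn, count($result), microtime(true) - $start);
    }
    ```

    If duplicates in the result are unwanted, passing the output of flip_isset_diff through `array_unique` is one option, at some extra cost.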
