I have 2 very large arrays (~2,500,000 elements each). I need to find the difference between these arrays. By difference I mean I need a resultant array with values that are in array
This is a simple algorithm, but because you are working with very large arrays, it will consume a lot of memory.
Here is my implementation:
$a = file("l.a"); // l.a is a file containing 2,500,000 lines
$b = file("l.b");
function large_array_diff($b, $a) {
    // Flip both arrays so that their values become keys
    $at = array_flip($a);
    $bt = array_flip($b);
    // Keep only the keys of $bt that are not present in $at
    $d = array_diff_key($bt, $at);
    return array_keys($d);
}
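To make the flipping step concrete, here is a minimal sketch of the same steps on two tiny in-memory arrays. The sample values and the `$onlyInB` variable name are made up for illustration; they only stand in for the lines of l.a and l.b.

```php
<?php
// Hypothetical sample data standing in for the lines of l.a and l.b.
$a = ["apple", "banana", "cherry"];
$b = ["banana", "date", "apple", "elderberry"];

// After flipping, each value becomes a key, so membership is a key lookup.
$at = array_flip($a); // ["apple" => 0, "banana" => 1, "cherry" => 2]
$bt = array_flip($b); // ["banana" => 0, "date" => 1, "apple" => 2, "elderberry" => 3]

// Keep the entries of $bt whose keys do not exist in $at, then read the keys back.
$onlyInB = array_keys(array_diff_key($bt, $at));

print_r($onlyInB); // date, elderberry
```

Each flipped copy is a second full-size array, which is where the extra memory goes.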
I ran it with a 4G memory limit. I just tested, and 3G also works.
$ time php -d memory_limit=4G diff_la.php
It took about 11 seconds:
real 0m10.612s
user 0m8.940s
sys 0m1.460s
UPDATE
The following code runs 2x faster than the large_array_diff function above.
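function flip_isset_diff($b, $a) {
    // Flip only $a so that its values become keys
    $at = array_flip($a);
    $d = array();
    // Collect every value of $b that is not a key of $at
    foreach ($b as $i) {
        if (!isset($at[$i])) {
            $d[] = $i;
        }
    }
    return $d;
}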
It is faster because it calls array_flip only once instead of twice and does not call array_diff_key or array_keys at all. This saves a lot of CPU cycles.
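One thing to be aware of when swapping one function for the other: they agree only when $b has no repeated values. A small sketch, with made-up sample data and hypothetical variable names `$viaDiffKey` and `$viaIsset`, shows the difference. Flipping $b collapses duplicates, while the isset loop keeps them.

```php
<?php
function large_array_diff($b, $a) {
    $at = array_flip($a);
    $bt = array_flip($b);
    $d = array_diff_key($bt, $at);
    return array_keys($d);
}

function flip_isset_diff($b, $a) {
    $at = array_flip($a);
    $d = array();
    foreach ($b as $i) {
        if (!isset($at[$i])) {
            $d[] = $i;
        }
    }
    return $d;
}

// Hypothetical sample data; "z" appears twice in $b on purpose.
$a = ["x", "y"];
$b = ["y", "z", "w", "z"];

$viaDiffKey = large_array_diff($b, $a); // duplicates in $b are collapsed
$viaIsset   = flip_isset_diff($b, $a);  // duplicates in $b are kept

print_r($viaDiffKey); // z, w
print_r($viaIsset);   // z, w, z
```

If you need unique values from the faster version, one option is to pass its result through array_unique, at the cost of some of the time saved.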