Trying to understand array_diff_uassoc optimization

Anonymous (unverified), submitted on 2019-12-03 02:30:02

Question:

It seems that the arrays are sorted before being compared with each other inside array_diff_uassoc.

What is the benefit of this approach?

Test script

function compare($a, $b)
{
    echo("$a : $b\n");
    return strcmp($a, $b);
}

// Case 1: completely different keys
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));

// Case 2: partially overlapping keys
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('d' => 1, 'e' => 2, 'f' => 3, 'g' => 4, 'h' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));

// Case 3: identical arrays
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
var_dump(array_diff_uassoc($a, $b, 'compare'));

// Case 4: same entries, reversed order
$a = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5);
$b = array('e' => 5, 'd' => 4, 'c' => 3, 'b' => 2, 'a' => 1);
var_dump(array_diff_uassoc($a, $b, 'compare'));

http://3v4l.org/DKgms#v526

P.S. It seems that the sorting algorithm changed in PHP 7.

Answer 1:

The sorting algorithm didn't change in PHP 7. The elements are just passed to the sorting algorithm in a different order, for some performance improvements.

Well, the benefit could be faster execution overall. You really hit the worst case when both arrays have completely different keys.

The worst-case complexity is sorting both arrays and then comparing each key of one array against each key of the other: O(n*m + n * log(n) + m * log(m))

The best case is sorting both arrays and then making just as many comparisons as there are elements in the smaller array: O(min(m, n) + n * log(n) + m * log(m))

In case of a match, you wouldn't have to compare against the full array again, but only from the key after the match on.
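One way to check these estimates on a particular PHP version is to replace the echo in the test script with a counter. The following is only a minimal sketch: countKeyComparisons is a hypothetical helper name, and the number of callback calls it reports depends on the PHP version and its internal sorting, so no specific counts are claimed here.

```php
<?php
// Hypothetical helper (not part of PHP): runs array_diff_uassoc and reports
// how many times the key comparison callback was invoked.
function countKeyComparisons(array $a, array $b): array
{
    $calls = 0;
    $diff = array_diff_uassoc($a, $b, function ($x, $y) use (&$calls) {
        $calls++;
        // Keys may be ints or strings, so cast before strcmp.
        return strcmp((string) $x, (string) $y);
    });
    return ['diff' => $diff, 'calls' => $calls];
}

$a        = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5];
$disjoint = ['v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5];
$same     = $a;

$r1 = countKeyComparisons($a, $disjoint);
$r2 = countKeyComparisons($a, $same);

// The counts include the calls made while sorting, not only the diff itself.
printf("completely different keys: %d callback calls\n", $r1['calls']);
printf("identical keys:            %d callback calls\n", $r2['calls']);
```

Comparing the two counts for growing array sizes gives a rough picture of how far apart the best and worst cases are in practice.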

But in the current implementation, the sorting is simply redundant. I think the implementation in php-src needs some improvement. There's no outright bug, but the implementation is just bad. If you understand some C: http://lxr.php.net/xref/PHP_TRUNK/ext/standard/array.c#php_array_diff (Note that this function is called via php_array_diff(INTERNAL_FUNCTION_PARAM_PASSTHRU, DIFF_ASSOC, DIFF_COMP_DATA_INTERNAL, DIFF_COMP_KEY_USER); from array_diff_uassoc.)



Answer 2:

Theory

Sorting allows for a few shortcuts to be made; for instance:

A      | B
-------+------
1,2,3  | 4,5,6

Each element of A will only be compared against B[0], because the other elements are known to be at least as big.

Another example:

A      | B
-------+-------
4,5,6  | 1,2,6

In this case, A[0] is compared against all elements of B, but A[1] and A[2] are compared against B[2] only.

If any element of A is bigger than all elements in B you will get the worst performance.
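The shortcut from the two tables can be sketched as a scan over two sorted lists. This is only an illustration of the idea under my own assumptions, not the php-src code: sortedDiffSketch is a hypothetical name, and it compares plain values with the spaceship operator.

```php
<?php
// Hypothetical sketch: diff of two value lists using a forward-only scan
// over sorted copies, counting how many comparisons were made.
function sortedDiffSketch(array $a, array $b): array
{
    sort($a);
    sort($b);

    $diff = [];
    $comparisons = 0;
    $j = 0;               // position in $b; it never moves backwards
    $m = count($b);

    foreach ($a as $x) {
        $found = false;
        while ($j < $m) {
            $comparisons++;
            $c = $x <=> $b[$j];
            if ($c > 0) {
                // $b[$j] is smaller than $x, so it cannot match any
                // later (larger) element of $a either: skip it for good.
                $j++;
                continue;
            }
            // $b[$j] is equal to or bigger than $x: stop scanning here.
            $found = ($c === 0);
            break;
        }
        if (!$found) {
            $diff[] = $x;
        }
    }

    return ['diff' => $diff, 'comparisons' => $comparisons];
}

$first  = sortedDiffSketch([1, 2, 3], [4, 5, 6]);
$second = sortedDiffSketch([4, 5, 6], [1, 2, 6]);

// First table: every element of A is only compared against B[0].
printf("%s with %d comparisons\n", implode(',', $first['diff']), $first['comparisons']);   // 1,2,3 with 3 comparisons
// Second table: A[0] walks all of B, A[1] and A[2] only look at B[2].
printf("%s with %d comparisons\n", implode(',', $second['diff']), $second['comparisons']); // 4,5 with 5 comparisons
```

Because the position in B only moves forward, the scan itself is linear in the combined length of both lists once they are sorted.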

Practice

While the above works well for the standard array_diff() or array_udiff(), once a key comparison function is used it falls back to O(n * m) performance because of a change made while trying to fix a bug.
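To make the O(n * m) behaviour concrete, here is a minimal sketch of a nested-loop diff that uses a user-supplied key comparison. It is a hypothetical illustration (naiveDiffUassoc is my own name), not the actual php-src implementation; values are compared as strings, which is how the PHP manual describes the value check for the assoc diff functions.

```php
<?php
// Hypothetical illustration: each entry of $a is checked against the entries
// of $b one by one until a match is found, with no help from sorting.
function naiveDiffUassoc(array $a, array $b, callable $keyCompare): array
{
    $diff = [];
    foreach ($a as $ka => $va) {
        $matched = false;
        foreach ($b as $kb => $vb) {
            // Keys go through the user callback, values are compared as strings.
            if ($keyCompare($ka, $kb) === 0 && (string) $va === (string) $vb) {
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $diff[$ka] = $va;
        }
    }
    return $diff;
}

$calls = 0;
$compare = function ($x, $y) use (&$calls) {
    $calls++;
    return strcmp((string) $x, (string) $y);
};

$a = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5];
$b = ['v' => 1, 'w' => 2, 'x' => 3, 'y' => 4, 'z' => 5];

$diff = naiveDiffUassoc($a, $b, $compare);
printf("%d callback calls\n", $calls); // 25 callback calls (5 * 5, no key ever matches)
```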

The aforementioned bug describes how custom key comparison functions can cause unexpected results when used with arrays that have mixed keys (i.e. numeric and string key values). I personally feel that this should've been addressed via the documentation, because you would get equally strange results with ksort().


