What conditions cause CHECKSUM_AGG to return 0?

好久不见. 提交于 2019-12-13 17:30:28

问题


It seems that there are a number of conditions that cause CHECKSUM_AGG to return 0 that I wouldn't have expected. I've only been able to find one discussed, which is that duplicate values will cause it. That can be solved via a DISTINCT or GROUP BY.

I've also found a couple more scenarios that make less sense to me. One was provided by my boss and one I found MSDN. These I don't know how to explain. Here is some SQL that demonstrates the scenarios:

SELECT
    CHECKSUM_AGG(T.Number) AS ChecksumAgggregate
FROM
(
    VALUES
        (2)
        , (3)
        , (4)
        , (5)
)AS T(Number)

DECLARE @t TABLE 
(
    Category VARCHAR(15),
    Value VARCHAR(10)
)

INSERT @t 
(
    Category
    , Value
)
VALUES 
    ('OneCharacter','a')
    ,('OneCharacter','b')
    ,('OneCharacter','c')
    ,('OneCharacter','d')

    ,('TwoCharacters','aa')
    ,('TwoCharacters','bb')
    ,('TwoCharacters','cc')
    ,('TwoCharacters','dd')

    ,('ThreeCharacters','aaa')
    ,('ThreeCharacters','bbb')
    ,('ThreeCharacters','ccc')
    ,('ThreeCharacters','ddd')

    ,('SixCharacters','aaaaaa')
    ,('SixCharacters','bbbbbb')
    ,('SixCharacters','cccccc')
    ,('SixCharacters','dddddd')

    ,('AllValues','a')
    ,('AllValues','b')
    ,('AllValues','c')
    ,('AllValues','d')
    ,('AllValues','aa')
    ,('AllValues','bb')
    ,('AllValues','cc')
    ,('AllValues','dd')
    ,('AllValues','aaa')
    ,('AllValues','bbb')
    ,('AllValues','ccc')
    ,('AllValues','ddd')
    ,('AllValues','aaaaaa')
    ,('AllValues','bbbbbb')
    ,('AllValues','cccccc')
    ,('AllValues','dddddd')

select 
    Category, CHECKSUM_AGG(CHECKSUM(Value)) 
from @t 
group by Category

select Category, Value, CHECKSUM(Value) ValueChecksum
from @t

It's nothing but 0's in these examples from CHECKSUM_AGG from these queries. The last query shows that none of the CHECKSUM values that are getting input into the CHECKSUM_AGG call are duplicated.

I'm hoping that whatever answer describes what causes CHECKSUM_AGG to return 0 will explain these situations as well.


回答1:


This can happen if, for example, there is an even number of rows, or if the values sum up to certain values (e.g. 14). These both yield 0:

SELECT CHECKSUM_AGG(N)
FROM (VALUES (1),(2),(3),(4),(4)) AS T(N);

SELECT CHECKSUM_AGG(N)
FROM (VALUES (2),(3),(4),(5)) AS T(N);

But these do not:

SELECT CHECKSUM_AGG(N)
FROM (VALUES (1),(2),(3),(4),(5)) AS T(N);

SELECT CHECKSUM_AGG(N)
FROM (VALUES (2),(3),(4),(6)) AS T(N);

14 may just be a coincidence (and it doesn't happen for that value across 6 rows), I mention it only because it's the only pattern I noticed here. I don't know that any of these cases are documented.

Also you are aware that CHECKSUM_AGG is officially documented as being not reliable, i.e. it may not accurately reflect a data change, right? The doc doesn't go into details on this, it just states that sometimes it won't tell you there's been a change even if there has. (The actual wording is "However, there is a small chance that the checksum will not change.")

Anyway Michael Swart seems to have discovered that CHECKSUM_AGG is just XOR (here is his proof):

CREATE TABLE #f(a FLOAT);
GO

INSERT #f VALUES (RAND());
GO 20

DECLARE @i INT = 0;
SELECT @i = @i ^ CHECKSUM(a) FROM #f;
SELECT @i, CHECKSUM_AGG(CHECKSUM(a)) FROM #f;
GO

DROP TABLE #f;
GO

As he goes on to explain, this can lead to misleading results when the range of values is very small, since the odds of generating a reliable checksum diminishes rapidly.



来源:https://stackoverflow.com/questions/11994430/what-conditions-cause-checksum-agg-to-return-0

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!