How evenly spread are the first four bytes of a Guid created in .NET?

Posted by 你说的曾经没有我的故事 on 2019-12-18 07:03:12

Question


There is a good deal of information about GUIDs on the net and on StackOverflow, including endless questions about uniqueness. This is not a question about 2^128 uniqueness.

My question is how random the first section of a GUID, specifically its first four bytes, is in .NET. Based on my research, it is supposedly the least significant 32 bits of a timestamp. But how is the timestamp converted? Just how random is the result?

Does anybody know how .NET constructs the first section, and whether it is truly evenly spread across the 4 bytes?

How is the timestamp used to construct the first 32 bits?

How does clock precision affect it?

Did Microsoft make any attempt to ensure that the first 4 bytes tend towards random?

WHY: High-volume GUID use has two main business cases for good random GUIDs in the first 4 bytes.

The first is table partitioning. If each new GUID is evenly spread, you can partition a table on the first 1, 2, 3 or 4 bytes, depending on how many partitions you need. I have seen a 2 billion row table with 10 million inserts a day, with 128 partitions using the first 2 bytes as the partition key. NOTE: under DB2 the first part of the key had to be used, according to the DB2 DBA. This greatly improved throughput on the DB.

The second is parallel key allocation for batch jobs. If you know a batch task has approximately N rows, you can allocate key ranges to parallel jobs. Without a homogeneous split, the dispatcher must first calculate the from and to keys for each job. If that means reading 100 million keys and managing them in memory just to dispatch work, the first x minutes are lost to job dispatch. In the example I have seen it was around 15 minutes.

So there are 2 excellent reasons to want evenly spread GUIDs.
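The partitioning idea can be sketched as follows. This is only an illustration, not how DB2 or any particular database assigns partitions. The class `GuidPartitioning` and the method `PartitionOf` are hypothetical names. It assumes the key bytes are taken in the order returned by `Guid.ToByteArray()`, which differs from the order shown in the string form of the GUID.

```csharp
using System;

public static class GuidPartitioning
{
    // Hypothetical helper: maps a Guid to a partition index in [0, partitionCount)
    // using the first two bytes returned by Guid.ToByteArray().
    // Note: ToByteArray() returns the first 4-byte group in little-endian order,
    // so b[0] is the LAST byte pair of the first group in the string form.
    public static int PartitionOf(Guid id, int partitionCount)
    {
        if (partitionCount < 1 || partitionCount > 65536)
            throw new ArgumentOutOfRangeException("partitionCount");

        byte[] b = id.ToByteArray();
        int prefix = (b[0] << 8) | b[1];              // value in 0..65535

        // Scale the 16-bit prefix onto the partitions; long avoids int overflow.
        return (int)((long)prefix * partitionCount / 65536);
    }
}
```

For example, `GuidPartitioning.PartitionOf(Guid.NewGuid(), 128)` returns a value from 0 to 127. The partitions only fill evenly if the two prefix bytes are themselves evenly spread, which is exactly what the question is asking about.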

The SAP Banking system actually introduced a custom GUID routine to resolve the lack of randomness in the first section of the GUID. For those with access to an SAP banking system, the function is BANK_DISTRIBUTED_ID_CREATE; the comments in the code explain why they did it. For those with access to SAP support, note 496904 explains why they considered it necessary to fix GUIDs.

Prior to the custom routine there were clear skews in the GUIDs under AIX (C++ kernel). Unique, yes; but random, especially in the first section, clearly not.

Update: I decided to write a program to investigate. Environment: .NET 4 on Windows XP, Dell Intel Core 2 Duo.

I have included the test program results in case they are of interest. The GUIDs were generated using

var G = Guid.NewGuid();

The results look OK on a sample of 100,000,000 GUIDs (a larger set is still running). For my purposes, that looks evenly spread enough to assume it is OK.

Byte 0: with Value 6A was least frequent : 389140 times
Byte 0: with Value 58 was most  frequent : 392241 times
Byte 1: with Value 25 was least frequent : 388905 times
Byte 1: with Value B3 was most  frequent : 392552 times
Byte 2: with Value D2 was least frequent : 389114 times
Byte 2: with Value CC was most  frequent : 391984 times
Byte 3: with Value 66 was least frequent : 388744 times
Byte 3: with Value 16 was most  frequent : 392838 times
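
The test program itself was not posted. A minimal sketch of how such a count could be done is shown below. The class `GuidByteHistogram` and its methods are hypothetical names, and this is not the original program. It counts the bytes in the order returned by `Guid.ToByteArray()`.

```csharp
using System;

public static class GuidByteHistogram
{
    // Counts how often each byte value (0..255) occurs at each of the
    // first four byte positions of freshly generated Guids.
    public static long[,] Count(long samples)
    {
        var counts = new long[4, 256];
        for (long i = 0; i < samples; i++)
        {
            byte[] b = Guid.NewGuid().ToByteArray();
            for (int pos = 0; pos < 4; pos++)
                counts[pos, b[pos]]++;
        }
        return counts;
    }

    // Prints the least and most frequent value per byte position,
    // in the same layout as the results listed above.
    public static void Report(long samples)
    {
        long[,] counts = Count(samples);
        for (int pos = 0; pos < 4; pos++)
        {
            int minV = 0, maxV = 0;
            for (int v = 1; v < 256; v++)
            {
                if (counts[pos, v] < counts[pos, minV]) minV = v;
                if (counts[pos, v] > counts[pos, maxV]) maxV = v;
            }
            Console.WriteLine("Byte {0}: with Value {1:X2} was least frequent : {2} times",
                pos, minV, counts[pos, minV]);
            Console.WriteLine("Byte {0}: with Value {1:X2} was most  frequent : {2} times",
                pos, maxV, counts[pos, maxV]);
        }
    }
}
```

Calling `GuidByteHistogram.Report(100000000)` would repeat the experiment at the same sample size. As a rough yardstick, with 100,000,000 samples a perfectly uniform byte gives an expected count of 390,625 per value with a standard deviation of about 624, so the extremes listed above sit roughly 3 to 3.6 standard deviations from the mean. That does not look unusual for the extremes of 1,024 counters.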

Edit: background research added based on comments.

I have seen samples of GUIDs on an AIX system. We have over 2 billion already. They are NOT evenly spread. There are noticeable skews in the first 2 bytes. As a result a special routine was introduced to generate homogeneous GUIDs. I was wondering whether .NET had a similar skew.


Answer 1:


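The GUIDs appear to be evenly spread. Tests on 1 billion GUIDs look good when considering the first 4 bytes. This means they are useful for partitioning, and key ranges can be roughly deduced rather than read from the DB. That fits with Guid.NewGuid() on Windows returning version 4 (random) GUIDs rather than timestamp-based ones.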
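
Deducing key ranges without reading the keys could look like the sketch below. It assumes the first four bytes are treated as an unsigned 32-bit prefix and that the prefix is uniformly distributed. `GuidRanges` and `LowerBounds` are hypothetical names, and how a prefix maps onto the database's key ordering depends on how the GUID is stored.

```csharp
using System;

public static class GuidRanges
{
    // Hypothetical helper: splits the 32-bit prefix space [0, 2^32) into
    // jobCount contiguous ranges of near-equal width and returns the
    // inclusive lower bound of each range. Job i covers
    // [bounds[i], bounds[i + 1]), and the last job runs to 0xFFFFFFFF.
    public static uint[] LowerBounds(int jobCount)
    {
        if (jobCount < 1)
            throw new ArgumentOutOfRangeException("jobCount");

        var bounds = new uint[jobCount];
        for (int i = 0; i < jobCount; i++)
            bounds[i] = (uint)(((ulong)i << 32) / (ulong)jobCount);
        return bounds;
    }
}
```

With 4 jobs this gives the lower bounds 0x00000000, 0x40000000, 0x80000000 and 0xC0000000. If the prefixes are evenly spread, each job then receives approximately a quarter of the rows.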



Source: https://stackoverflow.com/questions/13149139/how-evenly-spread-are-the-first-four-bytes-of-a-guid-created-in-net
