Is there a .NET equivalent to SQL Server's newsequentialid()

前端 未结 10 2172
孤城傲影
孤城傲影 2020-12-02 10:47

We use GUIDs for primary key, which you know is clustered by default.

When inserting a new row into a table it is inserted at a random page in the table (because GUID

相关标签:
10条回答
  • 2020-12-02 11:22

    It's important to note that the UUIDs generated by UuidCreateSequential will not be sequential when ordered by SQL Server.

    • SQL Server follows the RFC when it comes to sorting UUIDs
    • the RFC got it wrong
    • UuidCreateSequential did it right
    • but UuidCreateSequential creates something different from what SQL Server expects

    Background

    The Type 1 UUIDs created by UuidCreateSequential don't sort in SQL Server.

    SQL Server's NewSequentialID uses UuidCreateSequential, with some byte shuffling applied. From the Books Online:

    NEWSEQUENTIALID (Transact-SQL)

    NEWSEQUENTIALID is a wrapper over the Windows UuidCreateSequential function, with some byte shuffling applied

    which then references an MSDN blog post:

    How to Generate Sequential GUIDs for SQL Server in .NET (archive)

    public static Guid NewSequentialId()
    {
       Guid guid;
       UuidCreateSequential(out guid);
       var s = guid.ToByteArray();
       var t = new byte[16];
    
       t[3] = s[0];
       t[2] = s[1];
       t[1] = s[2];
       t[0] = s[3];
    
       t[5] = s[4];
       t[4] = s[5];
       t[7] = s[6];
       t[6] = s[7];
       t[8] = s[8];
       t[9] = s[9];
       t[10] = s[10];
       t[11] = s[11];
       t[12] = s[12];
       t[13] = s[13];
       t[14] = s[14];
       t[15] = s[15];
    
       return new Guid(t);
    }
    

    It all starts with the number of ticks since 1582-10-15 00:00:00 (October 15, 1592, the date of Gregorian reform to the Christian calendar). Ticks is the number of 100 ns intervals.

    For example:

    • 12/6/2017 4:09:39 PM UTC
    • = 137,318,693,794,503,714 ticks
    • = 0x01E7DA9FDCA45C22 ticks

    The RFC says that we should split this value into three chunks:

    • UInt32 low (4 bytes)
    • Uint16 mid (2 bytes)
    • UInt32 hi (2 bytes)

    So we split it up:

    0x01E7DA9FDCA45C22
    
    |   Hi   |   Mid  |    Low     |
    |--------|--------|------------|
    | 0x01E7 | 0xDA9F | 0xDCA45C22 |
    

    And then the RFC says that these three integers should be written out in the order of:

    • Low: 0xDCA45C22
    • Mid: 0xDA9F
    • High: 0x01E7

    If you follow the RFC, these values must be written in big-endian (aka "network byte order"):

    DC A4 5C 22 DA 9F x1 E7 xx xx xx xx xx xx xx xx
    

    This was a bad design, because you cannot take the first 8 bytes of the UUID and treat them either as a big-endian UInt64, nor as a little-endian UInt64. It's a totally dumb encoding.

    UuidCreateSequential gets it right

    Microsoft followed all the same rules so far:

    • Low: 0xDCA45C22
    • Mid: 0xDA9F
    • High: 0x1E7

    But they write it out in Intel little-endian order:

    22 5C A4 DC 9F DA E7 x1 xx xx xx xx xx xx xx xx
    

    If you look at that, you've just written out a little-endian Int64:

    225CA4DC9FDAE701
    

    Meaning:

    • if you wanted to extract the timestamp
    • or sort by the timestamp

    it's trivial; just treat the first 8 bytes as a UInt64.

    With the RFC, you have no choice but to perform all kinds of bit fiddling. Even on big-endian machines, you can't treat the 64-bit timestamp as a 64-bit timestamp.

    How to reverse it

    Given a little-endian guid from UuidCreateSequential:

    DCA45C22-DA9F-11E7-DDDD-FFFFFFFFFFFF
    

    with the raw bytes of:

    22 5C A4 DC 9F DA E7 11 DD DD FF FF FF FF FF FF
    

    This decodes into:

    Low      Mid  Version High
    -------- ---- ------- ---- -----------------
    DCA45C22-DA9F-1       1E7 -DDDD-FFFFFFFFFFFF
    
    • Low: 0xDCA45C22
    • Mid: 0xDA9F
    • High: 0x1E7
    • Version: 1 (type 1)

    We can write this back out in RFC big-endian order:

    DC A4 5C 22 DA 9F 11 E7 DD DD FF FF FF FF FF FF
    

    Short version

                   |   Swap      | Swap  | Swap  | Copy as-is
    Start index    |  0  1  2  3 |  4  5 |  6  7 | 
    End index      |  3  2  1  0 |  5  4 |  7  6 | 
    ---------------|-------------|-------|-------|------------------------ 
    Little-endian: | 22 5C A4 DC | 9F DA | E7 11 | DD DD FF FF FF FF FF FF
    Big-endian:    | DC A4 5C 22 | DA 9F | 11 E7 | DD DD FF FF FF FF FF FF
    
    0 讨论(0)
  • 2020-12-02 11:24

    Perhaps a simple way to determine the order in which rows have been added would be to add an IDENTITY column to the table, avoiding the need to keep your GUIDS in order and hence avoiding the performance hit of maintaining a clustered index on the GUID column.

    I can't help but wonder how keeping these rows in order helps you when debugging. Could you expand that a bit?

    0 讨论(0)
  • 2020-12-02 11:25

    Unfortunatley, no there isn't a .NET equivalent to newsequentialid(). You could continue using a Comb. I actually have a C# implementation of a Comb somewhere...I'll see if I can dig it up.

    0 讨论(0)
  • 2020-12-02 11:25

    I've been lead to believe that random Guids can be beneficial to performance in some use cases. Apparently inserting to random pages can avoid contention that would otherwise occur in the end page when multiple people are trying to insert at the same time.

    John's PInvoke suggestions is probably the closest to SQL's version but the UUidCreateSequential docs state that you shouldn't use it to identify an object that it's strictly local to the machine generating the Guid.

    I'd be measuring the actual use case performance hit with realistic data in realistic quantities before I investigated sequential Guid generation any further.

    0 讨论(0)
提交回复
热议问题