When I retrieve a list of items from a database including some children (via .Include), and order the randomly, EF gives me an unexpected result.. I creates/clones addition
tl;dr: There's a leaky abstraction here. To us, Include
is a simple instruction to stick a collection of things onto each single returned Person
row. But EF's implementation of Include
is done by returning a whole row for each Person-Address
combo, and reassembling at the client. Ordering by a volatile value causes those rows to become shuffled, breaking apart the Person
groups that EF is relying on.
When we have a look at ToTraceString()
for this LINQ:
var people = c.People.Include("Addresses");
// Note: no OrderBy in sight!
we see
SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[Id1] AS [Id1],
[Project1].[Data] AS [Data],
[Project1].[PersonId] AS [PersonId]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Data] AS [Data],
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [Person] AS [Extent1]
LEFT OUTER JOIN [Address] AS [Extent2] ON [Extent1].[Id] = [Extent2].[PersonId]
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC
So we get n
rows for each A
, plus 1
row for each P
without any A
s.
Adding an OrderBy
clause, however, puts the thing-to-order-by at the start of the ordered columns:
var people = c.People.Include("Addresses").OrderBy(p => Guid.NewGuid());
gives
SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[C2] AS [C1],
[Project1].[Id1] AS [Id1],
[Project1].[Data] AS [Data],
[Project1].[PersonId] AS [PersonId]
FROM ( SELECT
NEWID() AS [C1],
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent2].[PersonId] AS [PersonId],
[Extent2].[Data] AS [Data],
CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
FROM [Person] AS [Extent1]
LEFT OUTER JOIN [Address] AS [Extent2] ON [Extent1].[Id] = [Extent2].[PersonId]
) AS [Project1]
ORDER BY [Project1].[C1] ASC, [Project1].[Id] ASC, [Project1].[C2] ASC
So in your case, where the ordered-by-thing is not a property of a P
, but is instead volatile, and therefore can be different for different P-A
records of the same P
, the whole thing falls apart.
I'm not sure where on the working-as-intended ~~~ cast-iron bug
continuum this behaviour falls. But at least now we know about it.