问题
I have a Linq query that basically counts how many entries were created on a particular day, which is done by grouping by year, month, day. The problem is that because some days won't have any entries I need to back fill those missing "calendar days" with an entry of 0 count. My guess is that this can probably be done with a Union or something, or maybe even some simple for loop to process the records after the query.
Here is the query:
from l in context.LoginToken
where l.CreatedOn >= start && l.CreatedOn <= finish
group l by
new{l.CreatedOn.Year, l.CreatedOn.Month, l.CreatedOn.Day} into groups
orderby groups.Key.Year , groups.Key.Month , groups.Key.Day
select new StatsDateWithCount {
Count = groups.Count(),
Year = groups.Key.Year,
Month = groups.Key.Month,
Day = groups.Key.Day
}));
If I have data for 12/1 - 12/4/2009 like (simplified):
12/1/2009 20
12/2/2009 15
12/4/2009 16
I want an entry with 12/3/2009 0 added by code.
I know that in general this should be done in the DB using a denormalized table that you either populate with data or join to a calendar table, but my question is how would I accomplish this in code?
Can it be done in Linq? Should it be done in Linq?
回答1:
I just did this today. I gathered the complete data from the database and then generated a "sample empty" table. Finally, I did an outer join of the empty table with the real data and used the DefaultIfEmpty() construct to deal with knowing when a row was missing from the database to fill it in with defaults.
Here's my code:
int days = 30;
// Gather the data we have in the database, which will be incomplete for the graph (i.e. missing dates/subsystems).
var dataQuery =
from tr in SourceDataTable
where (DateTime.UtcNow - tr.CreatedTime).Days < 30
group tr by new { tr.CreatedTime.Date, tr.Subsystem } into g
orderby g.Key.Date ascending, g.Key.SubSystem ascending
select new MyResults()
{
Date = g.Key.Date,
SubSystem = g.Key.SubSystem,
Count = g.Count()
};
// Generate the list of subsystems we want.
var subsystems = new[] { SubSystem.Foo, SubSystem.Bar }.AsQueryable();
// Generate the list of Dates we want.
var datetimes = new List<DateTime>();
for (int i = 0; i < days; i++)
{
datetimes.Add(DateTime.UtcNow.AddDays(-i).Date);
}
// Generate the empty table, which is the shape of the output we want but without counts.
var emptyTableQuery =
from dt in datetimes
from subsys in subsystems
select new MyResults()
{
Date = dt.Date,
SubSystem = subsys,
Count = 0
};
// Perform an outer join of the empty table with the real data and use the magic DefaultIfEmpty
// to handle the "there's no data from the database case".
var finalQuery =
from e in emptyTableQuery
join realData in dataQuery on
new { e.Date, e.SubSystem } equals
new { realData.Date, realData.SubSystem } into g
from realDataJoin in g.DefaultIfEmpty()
select new MyResults()
{
Date = e.Date,
SubSystem = e.SubSystem,
Count = realDataJoin == null ? 0 : realDataJoin.Count
};
return finalQuery.OrderBy(x => x.Date).AsEnumerable();
回答2:
Essentially what I ended up doing here is creating a list of the same type with all the dates in the range and 0 value for the count. Then union the results from my original query with this list. The major hurdle was simply creating a custom IEqualityComparer. For more details here: click here
回答3:
You can generate the list of dates starting from "start" and ending at "finish", a then step by step check the number of count for each date separately
回答4:
I made a helper function which is designed to be used with anonymous types, and reused in as generic way as possible.
Let's say this is your query to get a list of orders for each date.
var orders = db.Orders
.GroupBy(o => o.OrderDate)
.Select(o => new
{
OrderDate = o.Key,
OrderCount = o.Count(),
Sales = o.Sum(i => i.SubTotal)
}
.OrderBy(o => o.OrderDate);
For my function to work please note this list must be ordered by date. If we had a day with no sales there would be a hole in the list.
Now for the function that will fill in the blanks with a default value (instance of anonymous type).
private static IEnumerable<T> FillInEmptyDates<T>(IEnumerable<DateTime> allDates, IEnumerable<T> sourceData, Func<T, DateTime> dateSelector, Func<DateTime, T> defaultItemFactory)
{
// iterate through the source collection
var iterator = sourceData.GetEnumerator();
iterator.MoveNext();
// for each date in the desired list
foreach (var desiredDate in allDates)
{
// check if the current item exists and is the 'desired' date
if (iterator.Current != null &&
dateSelector(iterator.Current) == desiredDate)
{
// if so then return it and move to the next item
yield return iterator.Current;
iterator.MoveNext();
// if source data is now exhausted then continue
if (iterator.Current == null)
{
continue;
}
// ensure next item is not a duplicate
if (dateSelector(iterator.Current) == desiredDate)
{
throw new Exception("More than one item found in source collection with date " + desiredDate);
}
}
else
{
// if the current 'desired' item doesn't exist then
// create a dummy item using the provided factory
yield return defaultItemFactory(desiredDate);
}
}
}
The usage is as follows:
// first you must determine your desired list of dates which must be in order
// determine this however you want
var desiredDates = ....;
// fill in any holes
var ordersByDate = FillInEmptyDates(desiredDates,
// Source list (with holes)
orders,
// How do we get a date from an order
(order) => order.OrderDate,
// How do we create an 'empty' item
(date) => new
{
OrderDate = date,
OrderCount = 0,
Sales = 0
});
- Must make sure there are no duplicates in the desired dates list
- Both
desiredDates
andsourceData
must be in order - Because the method is generic if you are using an anonymous type then the compiler will automatically tell you if your 'default' item is not the same 'shape' as a regular item.
- Right now I include a check for duplicate items in
sourceData
but there is no such check indesiredDates
- If you want to ensure the lists are ordered by date you will need to add extra code
来源:https://stackoverflow.com/questions/1468637/filling-in-missing-dates-using-a-linq-group-by-date-query