Group by clause vs. Distinct()

Question

My ASP.NET custom list control gets its values from a database view. The method that retrieves the data ultimately returns an object of type List<Triplet>, which serves as the DataSource for the control.

I came up with three possible implementations of that method, all of which seem to work and return the same results. Now I'm not sure which one should be preferred.

The point is that I need the unique strings from the query in alphabetical order, and there are many duplicates in the database. So I could fetch them all and then perform a Distinct() to get the unique values...

public override object GetData()
{
    return 
    (
        from name in
        (
            from job in DBConnection.NewDataContext.vJobs
            where job.name != null 
            select job.name

        ).Distinct().OrderBy(s => s) 

        select new Triplet(name, name, ListType)

    ).ToList();
}

...or I could use a group by clause and only select the keys:

public override object GetData()
{
    return 
    (
        from job in DBConnection.NewDataContext.vJobs 
        where job.name != null
        group job by job.name into names 
        orderby names.Key 
        select new Triplet(names.Key, names.Key, ListType)

    ).ToList();
}
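To check that the first two variants agree, here is a minimal LINQ-to-Objects sketch. It assumes an in-memory string array stands in for the name column of vJobs (no DataContext is involved), and it leaves out the Triplet projection. It runs the grouping query, which selects only the group keys, next to the Distinct() query and compares the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySketch
{
    // Mirrors the second GetData(): group by the name itself, sort by the
    // group key and keep only the keys. Each key is one unique name.
    public static List<string> UniqueNamesByGrouping(IEnumerable<string> names)
    {
        return
        (
            from n in names
            where n != null
            group n by n into g
            orderby g.Key
            select g.Key
        ).ToList();
    }

    // Mirrors the first GetData(): filter nulls, Distinct(), then sort.
    public static List<string> UniqueNamesByDistinct(IEnumerable<string> names)
    {
        return names.Where(n => n != null).Distinct().OrderBy(s => s).ToList();
    }

    public static void Main()
    {
        // Hypothetical in-memory stand-in for the name column of vJobs.
        var names = new[] { "Welder", null, "Baker", "Welder", "Baker", "Clerk" };
        var grouped = UniqueNamesByGrouping(names);
        var distinct = UniqueNamesByDistinct(names);
        Console.WriteLine(string.Join(", ", grouped));      // Baker, Clerk, Welder
        Console.WriteLine(grouped.SequenceEqual(distinct)); // True
    }
}
```

When LINQ to SQL translates either form, SQL Server commonly arrives at the same execution plan for a DISTINCT and for a GROUP BY that selects only the grouping column, so the choice is mostly about readability; comparing the actual plans for your view is the way to be sure.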

I also came up with the following, which uses a custom equality comparer for the Triplets. Actually, it was my first approach, but I didn't really like it:

public override object GetData()
{
    return
    (
        from job in DBConnection.NewDataContext.vJobs 
        where job.name != null
        select new Triplet(job.name, job.name, ListType)

    ).ToList().Distinct(new TripletComparer()).OrderBy(t => (string)t.First).ToList();
}
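The question does not show TripletComparer, so the following is a hypothetical version: two Triplets count as duplicates when their First values are equal. The Triplet class here is a stand-in for System.Web.UI.Triplet with the same three public object fields (First, Second, Third), so the sketch compiles outside ASP.NET; "Job" is a placeholder for ListType.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for System.Web.UI.Triplet (assumption, so the sketch runs anywhere).
public class Triplet
{
    public object First, Second, Third;
    public Triplet(object first, object second, object third)
    {
        First = first; Second = second; Third = third;
    }
}

// Hypothetical TripletComparer: Triplets are equal when their First values are.
public class TripletComparer : IEqualityComparer<Triplet>
{
    public bool Equals(Triplet x, Triplet y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return object.Equals(x.First, y.First);
    }

    // Must agree with Equals: equal First values give equal hash codes.
    public int GetHashCode(Triplet t)
    {
        return t?.First?.GetHashCode() ?? 0;
    }
}

public static class ComparerSketch
{
    // Same chain as the third GetData() after its first ToList().
    public static List<Triplet> Dedupe(IEnumerable<Triplet> rows) =>
        rows.Distinct(new TripletComparer()).OrderBy(t => (string)t.First).ToList();

    public static void Main()
    {
        var rows = new[] { "Welder", "Baker", "Welder" }
            .Select(n => new Triplet(n, n, "Job")) // "Job" stands in for ListType
            .ToList();
        foreach (var t in Dedupe(rows))
            Console.WriteLine(t.First);            // Baker, then Welder
    }
}
```

As far as I know, LINQ to SQL cannot translate a custom IEqualityComparer into SQL, which is why this variant has to call ToList() first and then de-duplicate every fetched row in memory. That also means it compares names with object.Equals (case-sensitive for strings), while the first two variants de-duplicate in SQL Server under the column's collation, which is often case-insensitive; with mixed-case data the results could differ.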

I think the group-by solution leaves most of the work to the database (MS SQL Server), which might be an advantage or a disadvantage; I don't really know. Maybe the Distinct() solution suffers from having to push too much unnecessary data from the db to my method?

Any ideas which one should be implemented? It seems I just can't see the forest for the trees...

Answer 1:

Until you actually need to worry about performance (that is, do not micro-optimize), you should probably opt for the most readable solution, which here is calling Distinct, since that states your intent most directly.

If you are truly concerned about the performance, then I suggest you perform some concrete benchmarks using a profiler.
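As a crude complement to a profiler, a Stopwatch harness like the following could time each GetData() variant. The AverageMilliseconds helper is hypothetical, and the in-memory data in Main is only a placeholder, so the numbers it prints say nothing about database cost; a profiler or SQL Server's execution plans remain the better tool for that.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Calls getData once untimed (warm-up: JIT and, for real database calls,
    // connection setup), then times `iterations` calls and returns the mean.
    public static double AverageMilliseconds(Func<object> getData, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));
        getData(); // warm-up, not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            getData();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    public static void Main()
    {
        // Placeholder data with many duplicates; against the real view you
        // would pass each GetData() variant instead of these lambdas.
        var names = Enumerable.Range(0, 100_000).Select(i => "name" + (i % 500)).ToArray();

        double distinctMs = AverageMilliseconds(
            () => names.Distinct().OrderBy(s => s).ToList(), 20);
        double groupMs = AverageMilliseconds(
            () => names.GroupBy(s => s).OrderBy(g => g.Key).Select(g => g.Key).ToList(), 20);

        Console.WriteLine($"Distinct: {distinctMs:F2} ms, GroupBy: {groupMs:F2} ms");
    }
}
```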

Answer 2:

Since there are many duplicates, applying the Distinct in the database is best.

LINQ to SQL uses deferred execution: the query is not sent to the database until it is enumerated. Calling ToList() enumerates it, so the query is executed in the database at that point, and everything chained after it runs in memory.

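Therefore the first one is probably the best: its Distinct() comes before ToList(), so the duplicates are removed in the database, whereas the third one pulls every row, duplicates included, into memory before calling Distinct().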
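A minimal in-memory analogue of the deferred-execution point, assuming a hypothetical iterator FakeView that counts each row it yields as one row "read from the database". It shows that defining the query reads nothing, ToList() reads every row including duplicates, and a Distinct() chained after ToList() works only on the materialized list. With a real LINQ to SQL IQueryable, the extra benefit of placing Distinct() before ToList() is that it becomes part of the generated SQL; this LINQ-to-Objects sketch cannot show that part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    public static int RowsRead;

    // Hypothetical stand-in for the database: each yielded item counts as
    // one fetched row. LINQ to SQL itself is not involved here.
    public static IEnumerable<string> FakeView(IEnumerable<string> rows)
    {
        foreach (var r in rows)
        {
            RowsRead++;
            yield return r;
        }
    }

    public static void Main()
    {
        var rows = new[] { "Welder", "Baker", "Welder", "Baker", "Welder" };

        RowsRead = 0;
        var query = FakeView(rows).Where(n => n != null);    // nothing runs yet
        Console.WriteLine(RowsRead);                          // 0

        var all = query.ToList();                             // executes: every row is read
        Console.WriteLine(RowsRead);                          // 5

        var unique = all.Distinct().OrderBy(s => s).ToList(); // in memory, no new reads
        Console.WriteLine(RowsRead);                          // 5
        Console.WriteLine(string.Join(", ", unique));         // Baker, Welder
    }
}
```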

Source: https://stackoverflow.com/questions/24869683/group-by-clause-vs-distinct

Tags

.net

linq

distinct