LinqToExcel: Distinct values in an Excel column

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-08 20:49:35
Paul

You can use LinqToExcel to get the distinct values in a column:

var excel = new ExcelQueryFactory("worksheetFileName");
var distinctNames = (from row in excel.Worksheet()
                     select row["ColB"]).Distinct();

EDIT:

To use Distinct in LinqToExcel, you have to use a class that corresponds to the row data.

public class WorksheetRow
{
    public string ColA { get; set; }
    public string ColB { get; set; }
}

var excel = new ExcelQueryFactory("worksheetFileName");
var distinctNames = (from row in excel.Worksheet<WorksheetRow>()
                     select row.ColB).Distinct();

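LinqToExcel's built-in Distinct() supports only a single property. To get distinct rows across more than one column, I do the following:

  1. Move the rows into memory with .AsEnumerable().
  2. Use a struct (C#), not a class. A struct is a value type, so its default Equals compares field values; a class compares by reference unless it overrides Equals and GetHashCode.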

public struct RowStruct  
{
    public string C1 {get; set;}
    public string C2 {get; set;}
    public int C3 {get; set;}
}

public class RowClass // class is NOT distinct friendly
{
    public string C1 {get; set;}
    public string C2 {get; set;}
    public int C3 {get; set;}
}

void Main()
{
    var excel = new ExcelQueryFactory(@"C:\Temp\a.xlsx");
    var qs = from c in excel.Worksheet<RowStruct>("Sheet1") select c;
    Console.WriteLine ("struct distinct is:{0}", 
         qs.AsEnumerable().Distinct().Count());

    var qc = from c in excel.Worksheet<RowClass>("Sheet1") select c;
    Console.WriteLine ("class distinct is:{0}", 
         qc.AsEnumerable().Distinct().Count());
}

My a.xlsx contains duplicate rows. Here is my result:

struct distinct is:235
class distinct is:329
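
If the row type has to stay a class, one common alternative is to give it value equality, so that the in-memory Distinct() treats rows with the same column values as duplicates. The following is a minimal sketch and not part of the original answer: RowWithEquality is a hypothetical type, and the list stands in for rows that were already loaded with .AsEnumerable(), so LinqToExcel itself is not involved here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type for illustration: same shape as RowClass,
// but with value equality so Distinct() can compare rows by content.
public sealed class RowWithEquality : IEquatable<RowWithEquality>
{
    public string C1 { get; set; }
    public string C2 { get; set; }
    public int C3 { get; set; }

    // Two rows are equal when all three column values match.
    public bool Equals(RowWithEquality other)
    {
        return other != null && C1 == other.C1 && C2 == other.C2 && C3 == other.C3;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as RowWithEquality);
    }

    // Equal rows must produce equal hash codes, because Distinct() hashes first.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (C1 == null ? 0 : C1.GetHashCode());
            hash = hash * 31 + (C2 == null ? 0 : C2.GetHashCode());
            hash = hash * 31 + C3;
            return hash;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for rows already read from the worksheet.
        var rows = new List<RowWithEquality>
        {
            new RowWithEquality { C1 = "a", C2 = "x", C3 = 1 },
            new RowWithEquality { C1 = "a", C2 = "x", C3 = 1 }, // duplicate row
            new RowWithEquality { C1 = "b", C2 = "y", C3 = 2 },
        };

        // Uses the IEquatable<T> implementation above.
        Console.WriteLine("class with equality distinct is:{0}", rows.Distinct().Count());

        // Anonymous types also compare by value, which is handy for a subset of columns.
        var distinctPairs = rows.Select(r => new { r.C1, r.C2 }).Distinct().Count();
        Console.WriteLine("anonymous type distinct is:{0}", distinctPairs);
    }
}
```

With the three sample rows above, both counts are 2. Projecting to an anonymous type is the lighter option when only a few columns matter; implementing IEquatable<T> is worth it when the row type is reused elsewhere.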

In Excel, select the column, then go to Data > Remove Duplicates.

This leaves you with the unique values.
