I have a data table I've loaded from a CSV file. I need to determine which rows are duplicates based on two columns (product_id and owner_org_id) in that table.
You could use LINQ to DataSet and Enumerable.Except/Intersect:
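```csharp
using System.Data;   // DataTable, AsEnumerable(), Field<T>(), CopyToDataTable()
using System.Linq;

// Project each row onto an anonymous type holding the two key columns.
// Anonymous types compare by value, so Except/Intersect match on both fields.
// Adjust <int> to the columns' actual DataType (e.g. string if the CSV was loaded as text).
var tbl1ID = tbl1.AsEnumerable()
    .Select(r => new
    {
        product_id = r.Field<int>("product_id"),
        owner_org_id = r.Field<int>("owner_org_id")
    });
var tbl2ID = tbl2.AsEnumerable()
    .Select(r => new
    {
        product_id = r.Field<int>("product_id"),
        owner_org_id = r.Field<int>("owner_org_id")
    });

var unique = tbl1ID.Except(tbl2ID);    // key pairs in tbl1 but not in tbl2
var both = tbl1ID.Intersect(tbl2ID);   // key pairs present in both tables

// Join the keys back to tbl1 to get the full rows.
// Note: CopyToDataTable throws InvalidOperationException if the query returns no rows.
var tblUnique = (from uniqueRow in unique
                 join row in tbl1.AsEnumerable()
                 on uniqueRow equals new
                 {
                     product_id = row.Field<int>("product_id"),
                     owner_org_id = row.Field<int>("owner_org_id")
                 }
                 select row).CopyToDataTable();
var tblBoth = (from bothRow in both
               join row in tbl1.AsEnumerable()
               on bothRow equals new
               {
                   product_id = row.Field<int>("product_id"),
                   owner_org_id = row.Field<int>("owner_org_id")
               }
               select row).CopyToDataTable();
```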
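To see the Except/Intersect approach run end to end, here is a self-contained sketch. It assumes both key columns hold int values; the class name ExceptIntersectSketch and the MakeTable helper are hypothetical, for illustration only. It collects the rows into List<DataRow> instead of calling CopyToDataTable, because CopyToDataTable throws InvalidOperationException on an empty sequence.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ExceptIntersectSketch
{
    // Hypothetical helper: builds a table with the two key columns (int is an assumption).
    public static DataTable MakeTable(params (int ProductId, int OwnerOrgId)[] rows)
    {
        var table = new DataTable();
        table.Columns.Add("product_id", typeof(int));
        table.Columns.Add("owner_org_id", typeof(int));
        foreach (var row in rows)
            table.Rows.Add(row.ProductId, row.OwnerOrgId);
        return table;
    }

    // Unique: rows of tbl1 whose key pair is absent from tbl2.
    // Both: rows of tbl1 whose key pair also occurs in tbl2.
    public static (List<DataRow> Unique, List<DataRow> Both) Compare(DataTable tbl1, DataTable tbl2)
    {
        // Anonymous types with the same property names and types compare by value,
        // so Except/Intersect match on both columns together.
        var tbl1ID = tbl1.AsEnumerable().Select(r => new
        {
            product_id = r.Field<int>("product_id"),
            owner_org_id = r.Field<int>("owner_org_id")
        });
        var tbl2ID = tbl2.AsEnumerable().Select(r => new
        {
            product_id = r.Field<int>("product_id"),
            owner_org_id = r.Field<int>("owner_org_id")
        });

        var unique = tbl1ID.Except(tbl2ID);
        var both = tbl1ID.Intersect(tbl2ID);

        // Join the distinct keys back to tbl1 to recover the full DataRows.
        var uniqueRows = (from key in unique
                          join row in tbl1.AsEnumerable()
                          on key equals new
                          {
                              product_id = row.Field<int>("product_id"),
                              owner_org_id = row.Field<int>("owner_org_id")
                          }
                          select row).ToList();
        var bothRows = (from key in both
                        join row in tbl1.AsEnumerable()
                        on key equals new
                        {
                            product_id = row.Field<int>("product_id"),
                            owner_org_id = row.Field<int>("owner_org_id")
                        }
                        select row).ToList();

        return (uniqueRows, bothRows);
    }
}
```

For example, with tbl1 holding (1, 10), (2, 20), (3, 30) and tbl2 holding (2, 20), (3, 31), Unique contains the rows (1, 10) and (3, 30) and Both contains (2, 20). The row (3, 30) counts as unique because its owner_org_id differs, which is the point of keying on both columns.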
Edit: I misunderstood your requirement a little. Since you only have one DataTable and want to get all unique rows and all duplicate rows, it's even more straightforward: use Enumerable.GroupBy with an anonymous type containing both fields:
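```csharp
using System.Data;
using System.Linq;

// Group rows by the (product_id, owner_org_id) pair; adjust <int> to the columns' actual type.
var groups = tbl1.AsEnumerable()
    .GroupBy(r => new
    {
        product_id = r.Field<int>("product_id"),
        owner_org_id = r.Field<int>("owner_org_id")
    });

// Groups with exactly one row: the key pair occurs only once.
var tblUniques = groups
    .Where(grp => grp.Count() == 1)
    .Select(grp => grp.Single())
    .CopyToDataTable();

// Groups with more than one row: keep every row that shares a duplicated key pair.
var tblDuplicates = groups
    .Where(grp => grp.Count() > 1)
    .SelectMany(grp => grp)
    .CopyToDataTable();
```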
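A self-contained sketch of the GroupBy approach, again assuming int key columns; GroupBySketch and ToTableOrEmpty are hypothetical names. It materializes the groups once with ToList so the two filters don't regroup the table, and it returns an empty clone of the source schema instead of calling CopyToDataTable on an empty sequence, which would throw.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class GroupBySketch
{
    // CopyToDataTable throws InvalidOperationException for an empty sequence,
    // so fall back to an empty table with the same columns (source.Clone()).
    public static DataTable ToTableOrEmpty(IEnumerable<DataRow> rows, DataTable source)
    {
        var list = rows.ToList();
        return list.Count > 0 ? list.CopyToDataTable() : source.Clone();
    }

    public static (DataTable Uniques, DataTable Duplicates) Split(DataTable tbl1)
    {
        // Group once by the (product_id, owner_org_id) pair; int is an assumption.
        var groups = tbl1.AsEnumerable()
            .GroupBy(r => new
            {
                product_id = r.Field<int>("product_id"),
                owner_org_id = r.Field<int>("owner_org_id")
            })
            .ToList();

        var uniques = ToTableOrEmpty(
            groups.Where(g => g.Count() == 1).Select(g => g.Single()), tbl1);
        var duplicates = ToTableOrEmpty(
            groups.Where(g => g.Count() > 1).SelectMany(g => g), tbl1);
        return (uniques, duplicates);
    }
}
```

With rows (1, 10), (1, 10), (2, 20), (1, 11), Uniques holds (2, 20) and (1, 11) while Duplicates holds both (1, 10) rows. A table with no repeated pairs yields an empty Duplicates table that still has the product_id and owner_org_id columns.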