C#: Compare two ArrayList of custom class and find duplicates

Posted by 穿精又带淫゛_ on 2019-12-11 05:07:25

Question


I have two ArrayList instances holding objects of a custom class, plus a third for the results.

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;
}

ArrayList products1 = new ArrayList();
ArrayList products2 = new ArrayList();
ArrayList duplicateProducts = new ArrayList();

What I want is to get all the products (with all the fields of the ProductDetails class) whose description appears in both products1 and products2.

I could run two nested for/while loops in the traditional way, but that would be very slow, especially with over 10k elements in each list.

So probably something can be done with LINQ.


Answer 1:


If you want to use LINQ, you need to write your own equality comparer: a class that implements IEqualityComparer<ProductDetails> and provides both Equals and GetHashCode.

public class ProductDetails
{
    public string id { get; set; }
    public string description { get; set; }
    public float rate { get; set; }
}

public class ProductComparer : IEqualityComparer<ProductDetails>
{

    public bool Equals(ProductDetails x, ProductDetails y)
    {
        //Check whether the objects are the same object. 
        if (Object.ReferenceEquals(x, y)) return true;

        //Check whether the products' properties are equal. 
        //string.Equals(a, b) is null-safe, unlike a.Equals(b). 
        return x != null && y != null && string.Equals(x.id, y.id) && string.Equals(x.description, y.description);
    }

    public int GetHashCode(ProductDetails obj)
    {
        //Get hash code for the description field if it is not null. 
        int hashProductDesc = obj.description == null ? 0 : obj.description.GetHashCode();

        //Get hash code for the id field if it is not null. 
        int hashProductId = obj.id == null ? 0 : obj.id.GetHashCode();

        //Calculate the hash code for the product. 
        return hashProductDesc ^ hashProductId;
    }
}

Now, suppose you have these objects:

ProductDetails[] items1 = { new ProductDetails { description = "aa", id = "9", rate = 2.0f },
                            new ProductDetails { description = "b", id = "4", rate = 2.0f } };

ProductDetails[] items2 = { new ProductDetails { description = "aa", id = "9", rate = 1.0f },
                            new ProductDetails { description = "c", id = "12", rate = 2.0f } };


IEnumerable<ProductDetails> duplicates =
    items1.Intersect(items2, new ProductComparer());
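
The comparer above matches on both id and description, and the sample data uses arrays. The question asks for matches on description alone, with the data held in ArrayList. A minimal sketch of that variant follows; DescriptionComparer, ArrayListDuplicates and FindByDescription are hypothetical names introduced here for illustration, not part of the original answer. Because ArrayList is non-generic, Cast<ProductDetails>() is needed before any LINQ operator can be applied.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;
}

// Hypothetical comparer: two products are equal when their descriptions match.
public class DescriptionComparer : IEqualityComparer<ProductDetails>
{
    public bool Equals(ProductDetails x, ProductDetails y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.description, y.description, StringComparison.Ordinal);
    }

    public int GetHashCode(ProductDetails obj)
    {
        // A null product or a null description hashes to 0.
        return obj?.description == null ? 0 : obj.description.GetHashCode();
    }
}

public static class ArrayListDuplicates
{
    // Returns the products from 'first' whose description also occurs in 'second'.
    public static List<ProductDetails> FindByDescription(ArrayList first, ArrayList second)
    {
        return first.Cast<ProductDetails>()
                    .Intersect(second.Cast<ProductDetails>(), new DescriptionComparer())
                    .ToList();
    }
}
```

Note that Intersect returns a distinct set of elements taken from the first sequence only, so if the matching products from the second list are needed as well, a different approach is required.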



Answer 2:


Consider overriding the System.Object.Equals method.

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;

    public override bool Equals(object obj)
    {
        // Reject null and objects of any other type.
        if (!(obj is ProductDetails))
            return false;

        if (ReferenceEquals(obj, this))
            return true;

        // Products count as equal when their descriptions match.
        ProductDetails p = (ProductDetails)obj;
        return description == p.description;
    }
}

Filtering would then be as simple as:

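// ArrayList is non-generic, so Cast<ProductDetails>() is needed before LINQ operators (requires using System.Linq).
var result = products1.Cast<ProductDetails>().Where(product => products2.Contains(product));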

EDIT:

Note that this implementation is not optimal: ArrayList.Contains is a linear search, so the filter as a whole is O(n × m).

Moreover, it was suggested in the comments on your question that you use a database.
Performance would then depend on the database implementation,
and in any case the overhead would not be yours to manage.

However, you can optimize this code by using a Dictionary or a HashSet.
First override the System.Object.GetHashCode method:

public override int GetHashCode()
{
  // Guard against a null description, which would otherwise throw.
  return description == null ? 0 : description.GetHashCode();
}

You can now do this:

var hashSet = new HashSet<ProductDetails>(products1.Cast<ProductDetails>());
var result = products2.Cast<ProductDetails>().Where(product => hashSet.Contains(product));

This improves performance because each HashSet lookup takes constant time on average instead of a linear scan.
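
The Dictionary option mentioned above is not shown in the answer. One possible sketch, assuming the goal is to collect the matching products from both lists grouped by description, is below. SharedDescriptionIndex and GroupShared are hypothetical names, and skipping null descriptions is an assumption made here because Dictionary keys cannot be null.

```csharp
using System.Collections.Generic;

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;
}

public static class SharedDescriptionIndex
{
    // Maps each description found in both inputs to every product carrying it.
    public static Dictionary<string, List<ProductDetails>> GroupShared(
        IEnumerable<ProductDetails> first, IEnumerable<ProductDetails> second)
    {
        // Index the first input by description.
        var index = new Dictionary<string, List<ProductDetails>>();
        foreach (var p in first)
        {
            if (p.description == null) continue; // assumption: ignore null descriptions
            if (!index.TryGetValue(p.description, out var bucket))
                index[p.description] = bucket = new List<ProductDetails>();
            bucket.Add(p);
        }

        // Walk the second input and keep only descriptions present in the index.
        var shared = new Dictionary<string, List<ProductDetails>>();
        foreach (var p in second)
        {
            if (p.description == null) continue;
            if (!index.TryGetValue(p.description, out var fromFirst)) continue;
            if (!shared.TryGetValue(p.description, out var group))
                shared[p.description] = group = new List<ProductDetails>(fromFirst);
            group.Add(p);
        }
        return shared;
    }
}
```

This needs no Equals or GetHashCode override on ProductDetails, since the dictionary is keyed on the description string itself. With ArrayList inputs, pass products1.Cast<ProductDetails>() and products2.Cast<ProductDetails>().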




Answer 3:


10k elements is nothing; however, make sure you use proper collection types. ArrayList is a legacy non-generic collection that is not recommended for new code; use List<ProductDetails> instead.

Next step is implementing proper Equals and GetHashCode overrides for your class. The assumption here is that description is the key since that's what you care about from a duplication point of view:

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;

    public override bool Equals(object obj)
    {
        var p = obj as ProductDetails;
        // Compare against the cast variable p; obj is typed as object and has no description member.
        return !ReferenceEquals(p, null) && description == p.description;
    }

    public override int GetHashCode() => description == null ? 0 : description.GetHashCode();
}

Now we have options. One easy and efficient way of doing this is using a hash set:

var set = new HashSet<ProductDetails>();
var products1 = new List<ProductDetails>();  // fill it
var products2 = new List<ProductDetails>();  // fill it

// add everything from the first list to the set
foreach(var item in products1)
    set.Add(item);

// and simply test the elements of the second list
foreach(var item in products2)
    if(set.Contains(item))
    {
        // item.description was already used in products1, handle it here
    }

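This gives you linear expected time complexity, O(n + m) for lists of n and m elements, which is the best you can get since every element has to be examined at least once.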
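
The loop above only flags matches found in the second list. If the matching products from both lists are wanted in one flat list, as the question describes, one possible sketch is to intersect the description strings first and then filter both lists. DuplicateFinder and FindDuplicates are hypothetical names used for illustration. This version does not rely on the Equals and GetHashCode overrides.

```csharp
using System.Collections.Generic;
using System.Linq;

public class ProductDetails
{
    public string id;
    public string description;
    public float rate;
}

public static class DuplicateFinder
{
    // Returns every product, from either list, whose description occurs in both lists.
    public static List<ProductDetails> FindDuplicates(
        List<ProductDetails> products1, List<ProductDetails> products2)
    {
        // Descriptions present in the first list...
        var shared = new HashSet<string>(products1.Select(p => p.description));
        // ...narrowed down to those also present in the second list.
        shared.IntersectWith(products2.Select(p => p.description));

        var duplicates = new List<ProductDetails>();
        duplicates.AddRange(products1.Where(p => shared.Contains(p.description)));
        duplicates.AddRange(products2.Where(p => shared.Contains(p.description)));
        return duplicates;
    }
}
```

Each list is traversed twice and every set operation is constant time on average, so the expected cost stays linear in the combined size of the two lists. HashSet<string> accepts null, so products with a null description in both lists would be reported as duplicates of each other; filter them out beforehand if that is not wanted.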



Source: https://stackoverflow.com/questions/39457194/c-compare-two-arraylist-of-custom-class-and-find-duplicates
