How to remove duplicate items from a queue within a time frame?

我怕爱的太早我们不能终老 提交于 2019-12-11 17:48:12

问题


I would like to remove duplicate entries from a queue in an efficient way. The queue has a custom class with DateTime and FullPath and a few other things

private Queue<MyCustomClass> SharedQueue;

The DateTime in the class is the timestamp when inserted into the queue. The logic I would like to use is as following: Remove duplicates from the queue if the FullPath is identical within a 4 second window (i.e. if added to queue within 4 seconds of a duplicate fullpath). I have the events that I want to watch but a few duplicates will still arrive and that is OK.

I am using c# 2.0 and the FileSystemWatcher class and a worker queue.

There are a bunch of ways to do this: Trim the queue each time an item is added to it, or when I am working on the queue skip the processing of the current duplicate item.

Or should I use a 'global private' variable Dictionary< String, DateTime> ? So I can quickly search it? or a local copy of the queue ? Perhaps it is best to limit the local queue to 100 items in case of many file events? Though in my case it 'should be' only a relatively few files to monitor in a folder... but things always change...

Thanks for any help.

:Edit: Feb 10 8:54 EST: So I decided to implement a good simple solution as far as I can tell. I don't think I am holding on to the Dict keys too long...

:Edit: Feb 10 9:53 EST: Updated as my Dictionary cannot contain duplicate values.

   public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
// start the worker thread when program starts.
// call Terminate.Set() in the programs exit routine or close handler etc.
{
  // lock shared queue
  lock (SharedQueue)
  {
    if (!IsDuplicateQueueInput(rcd))  // only add unique values to queue
    {
      SharedQueue.Enqueue(rcd);
      SomethingToDo.Set();
    }
  }
} // public void QueueInput

private bool IsDuplicateQueueInput(HotSynchUnit.RcdFSWFile rcd)
/* Return true if the object is a duplicate object.
 * Pseudo Code:
 * 
 * isDuplicate = false
 * Lock Dictionary
 * -If lastTimeStamp > 4 seconds ago then       // Optimization: save lastTimeStamp
 *    if Dict.Count > 0 then clear Dictionary
 *    return isDuplicate
 * -If not Dict.TryGetValue(sPath, dtTimeStamp) then
 *    Dict.AddKey()
 * -Else
 *    Compare key timestamp to Currenttime
 *    if key timestamp is <= 4 seconds ago then
 *       IsDuplicate = True
 *
 *    Dict.RemoveKey()
 *    Dict.AddKey()
 * 
 * return isDuplicate
*/
{
  // put real code here
}

回答1:


I just thought about using any collection similar to a generic hashtable... Something like this:

Dictionary<string, YourClass> dict = new Dictionary<string, YourClass>();

/// just let's assume you want to add/check for "c:\demo.txt"

if (!dict.ContainsKey(@"c:\demo.txt"))
{
   /// add items to dict by passing fullPath as key and your objects as value
   dict.add(@"c:\demo.txt", obj1);
} 
else if (dict[@"c:\demo.txt"].CheckForIntervall())
{
   /// replace current object in dictionary with new object - in case you want to..
   /// or just do what you want to 
}

edit - your custom class may have some functionality like this:

class YOURCUSTOMCLASS
{
    private DateTime creationTime;

    public DateTime CreationTime
    { get { return creationTime; } }

    public YOURCUSTOMCLASS(parametersGoesHere xyz)
    {
          creationTime = DateTime.Now;
    }

    /// in this case this method will return true
    /// if the timeSpan between this object and otherObject
    /// is greater than 4 seconds
    public bool CheckForInterval(YOURCUSTOMCLASS otherObject)
    {
         TimeSpan diff = otherObj.CreationTime.Subtract(creationTime);

         /// you may replace 4 through any other digit, or even better take
         /// a const/global var/static ...
         return diff.TotalSeconds > 4;
    }

    /// all the other stuff you need ...
}

Of course you will loose the functionality of a queue - but you will get an massive increase in runtime if your queue containts many elements.

hth




回答2:


I would make a subclass:

class MyDeduplicatedQueue : Queue<MyCustomObject> {
    /// etc
}

Then you can put all the appropriate filtering logic into the Enqueue method.




回答3:


I would make a wrapper class and not extend from queue, as users of the base type Queue expect different behavior. (Data Contracts in .NET 4.0 might even complain when you do so.)

Internally you can have a actual queue to which to redirect the required calls. Every Queue() call you could add the new element to a Dictionary when it is not contained already. Before doing so, you could empty all elements that are older than x seconds from this dictionary, and add them to the inner queue in order.

When dequeuing, you will have to check whether the inner queue contains elements, and otherwise pick the earliest element from the dictionary.

This ofcourse is just one possible implementation. When a lot of different elements might get queued quickly, the dictionary will fill up quickly and additional logic might have to be added to resolve that.




回答4:


Why not just reject inserts if they have duplicate paths? All you have to do is a linear search starting from the tail of the queue and stop when you either find a duplicate (and reject the insert) or when the timestamp exceeds your time limit (and insert the record)? Seems a lot simpler than keeping another data structure around and all the associated logic.



来源:https://stackoverflow.com/questions/4947869/how-to-remove-duplicate-items-from-a-queue-within-a-time-frame

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!