Remove duplicate key value pairs with tolerance by keeping the ones with largest value

Posted by 落花浮王杯 on 2020-01-15 09:30:51

Question


I am trying to remove duplicates with tolerance from a set of keys and values using the following rule:

Assume the following set:

keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];

Plotted, the data would look like this:

Now I would like to remove those pairs whose keys are very close together, as indicated in the plot by the red circle. The key-value pair that I would like to keep is the one with the largest value (in the example, the middle one, [3.1; 1.3]), so that the resulting set would be:

keys = [1 2 3.1 4 5];
vals = [0.8 1 1.3 1 1.1];

I tried to use Matlab's diff function to get this behavior by doing

keys_new = keys(~(diff(keys) < 0.5));
vals_new = vals(~(diff(keys) < 0.5));
[M,I] = max(vals(diff(keys) < 0.5));

This gives keys_new and vals_new as a new set that contains only the last of the duplicate pairs and is also missing the very last pair (diff returns one element fewer than keys, so the mask never reaches the final entry):

keys_new = [1 2 3.15 4]
vals_new = [0.8 1 1.2 1]

The last line returns the index of the maximum value among the duplicate pairs, I=2 (an index into the masked subset, not into vals). Unfortunately, the mask does not include the last of the three duplicate pairs, [3.15; 1.2], so it is more of a coincidence that the result is correct here.
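The missing last pair can be traced to the length of the mask: diff returns one element fewer than its input, so a logical index built from diff(keys) never addresses the final element. A minimal sketch of that check, using the keys from above (tol, isClose and keepMask are names introduced here only for illustration):

```matlab
keys = [1 2 3 3.1 3.15 4 5];
tol  = 0.5;                    % tolerance used in the attempt above

isClose = diff(keys) < tol;    % isClose(i) compares keys(i) with keys(i+1)
disp(numel(keys))              % 7
disp(numel(isClose))           % 6, so keys(7) is never addressed by the mask

% Padding the mask to full length keeps the last pair
keepMask = [~isClose true];
disp(keys(keepMask))           % 1 2 3.15 4 5
```

Padding only restores the final pair; it still keeps the last pair of each close group rather than the one with the largest value.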

I feel like there should be a much smarter way to do this, but can't really get my head around it.


Answer 1:


Here is my solution:

Step 1. Find all non-max points in the current keys and vals, i.e. the points that have a larger neighbor directly before or directly after them, and collect their indices in a set called Nind.

Step 2. Build another set called Cind, which contains the index of every point that has a close neighbor (key difference below 0.5) and therefore needs to be considered.

Step 3. Intersect Nind and Cind, and delete the resulting indices from both keys and vals.

Step 4. If the intersection of the two sets is empty, go to Step 5. Otherwise, go back to Step 1.

Step 5. Done.
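As an illustration of Step 2 on its own, the set of points with a close neighbor can also be built from two logical masks. This is only a sketch of an equivalent formulation, not the code used in this answer; tol, closeToPrev and closeToNext are names assumed here for illustration:

```matlab
keys = [1 2 3 3.1 3.15 4 5];
tol  = 0.5;                               % same threshold as in the question

closeToPrev = [false, diff(keys) < tol];  % key i lies within tol of key i-1
closeToNext = [diff(keys) < tol, false];  % key i lies within tol of key i+1
Cind = find(closeToPrev | closeToNext);   % indices with at least one close neighbor
disp(Cind)                                % 3 4 5 for this data
```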

Note that the while loop is there to handle awkward input in which a group of close keys contains more than one local maximum, so a single pass may leave several close points behind.

My code:

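%% Input
clc; clear;
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];


%% Dealing
ind=-1;   % dummy non-empty value so that the loop runs at least once
while(~isempty(ind))
  %find the non-max points: Max marks points strictly larger than both neighbors
  %(padding with -Inf lets the first and last point qualify as maxima too)
  Max=([diff(vals) -Inf]<0 & [-Inf -diff(vals)]<0);
  Nind=1:length(vals);
  Nind(Max)=[];

  %determine the range of points: indices that have a neighbor closer than 0.5
  Cind=[0 diff(keys)<0.5];
  Cind(find(Cind)-1)=1;
  vec=1:length(Cind);
  Cind=Cind.*vec;
  Cind(Cind == 0)=[];

  %check through & back: delete close non-max points, repeat until none are left
  ind=intersect(Cind,Nind);
  keys(ind)=[];
  vals(ind)=[];
end

%% Output
[keys;vals]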

The output of the code is:

ans =

    1.0000    2.0000    3.1000    4.0000    5.0000
    0.8000    1.0000    1.3000    1.0000    1.1000
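For comparison, here is a sketch of a different, single-pass technique (not the method of this answer): group consecutive keys whose gap is below the tolerance, then keep the pair with the largest value from each group. It assumes that a chain of keys, each within tol of the next, counts as one group even if its first and last key are further apart than tol, so its result can differ from the loop above on long chains. The names tol, groupId, keys_out and vals_out are introduced here for illustration.

```matlab
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
tol  = 0.5;                              % assumed tolerance, as in the question

% Sort by key so that close keys are adjacent (a no-op for this data)
[keys, order] = sort(keys);
vals = vals(order);

% A new group starts wherever the gap to the previous key is at least tol
groupId = cumsum([true, diff(keys) >= tol]);

% For each group, keep the pair with the largest value
nGroups  = groupId(end);
keys_out = zeros(1, nGroups);
vals_out = zeros(1, nGroups);
for g = 1:nGroups
    idx = find(groupId == g);            % members of group g
    [vals_out(g), k] = max(vals(idx));   % on ties, max returns the first maximum
    keys_out(g) = keys(idx(k));
end

disp([keys_out; vals_out])
```

For the example data this keeps [3.1; 1.3] from the close group and leaves the other pairs untouched, matching the output shown above.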


Source: https://stackoverflow.com/questions/51280156/remove-duplicate-key-value-pairs-with-tolerance-by-keeping-the-ones-with-largest
