Question
I am trying to remove duplicates with tolerance from a set of keys and values using the following rule:
Assume the following set:
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
Plotted this would look like this:
Now I would like to remove those pairs whose keys are very close together, as indicated in the plot by the red circle. The key-value pair that I would like to keep is the one with the largest value (in the example, the middle one, [3.1; 1.3]), so that the resulting set would be:
keys = [1 2 3.1 4 5];
vals = [0.8 1 1.3 1 1.1];
I tried to use Matlab's diff function to get this behavior by doing
keys_new = keys(~(diff(keys) < 0.5));
vals_new = vals(~(diff(keys) < 0.5));
[M,I] = max(vals(diff(keys) < 0.5));
This gives vals_new and keys_new as a new set that only includes the last of the duplicate pairs, but is also lacking the very last value:
keys_new = [1 2 3.15 4]
vals_new = [0.8 1 1.2 1]
The last line returns the index of the maximum value among the duplicate pairs, I=2. Unfortunately, it does not consider the last of the three duplicate pairs, [3.15; 1.2], so it is more of a coincidence that the result is correct here.
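Both symptoms above appear to come from the same cause: diff returns one element fewer than its input, and each entry describes the gap between a key and its successor. A minimal sketch of this, where the names d, close_to_next and in_group are illustrative and not part of the original attempt:

```matlab
keys = [1 2 3 3.1 3.15 4 5];

d = diff(keys);               % 6 gaps for 7 keys
close_to_next = d < 0.5;      % [0 0 1 1 0 0]: marks only keys 3 and 3.1

% Padding the mask on both sides marks every key that is close to its
% successor OR its predecessor, so the last member of the group (3.15)
% is included and the mask has one entry per key.
in_group = [close_to_next false] | [false close_to_next];   % [0 0 1 1 1 0 0]

group_keys = keys(in_group);  % [3 3.1 3.15]
```

With a mask of full length, the final key (5) is no longer dropped by the indexing, and the maximum can be searched over all three close pairs.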
I feel like there should be a much smarter way to do this, but can't really get my head around it.
Answer 1:
Here is my solution:
Step 1. Find all non-maximum points in the current keys and vals, i.e. points that have a larger neighbor directly before or after them, and collect their indices in a set called Nind.
Step 2. Build another set called Cind, which contains every point in the current keys and vals that has a close neighbor and therefore needs to be considered.
Step 3. Intersect Nind and Cind, and delete the points in the intersection from keys and vals.
Step 4. If the intersection of the two sets is empty, go to Step 5. Otherwise, go back to Step 1.
Step 5. Done.
Note that the while loop is there to deal with awkward input in which a group of close keys contains multiple local maxima.
My code:
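%% Input
clc; clear;
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
%% Processing
ind = -1;   % dummy non-empty value so the loop body runs at least once
while ~isempty(ind)
    % Local maxima: strictly larger than both neighbors. Because of the
    % zero padding, the first and last points are never flagged.
    Max = ([diff(vals) 0]<0 & [0 -diff(vals)]<0);
    % Nind: indices of all points that are not local maxima
    Nind = 1:length(vals);
    Nind(Max) = [];
    % Cind: indices of all points with a neighbor closer than 0.5
    Cind = [0 diff(keys)<0.5];    % point is close to its predecessor
    Cind(find(Cind)-1) = 1;       % also mark that predecessor
    vec = 1:length(Cind);
    Cind = Cind.*vec;             % turn the 0/1 mask into indices
    Cind(Cind == 0) = [];
    % Delete the close points that are not local maxima
    ind = intersect(Cind, Nind);
    keys(ind) = [];
    vals(ind) = [];
end
%% Output
[keys; vals]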
The output of the code is:
ans =
1.0000 2.0000 3.1000 4.0000 5.0000
0.8000 1.0000 1.3000 1.0000 1.1000
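For comparison, here is a minimal single-pass sketch of the same idea. It is not the code from the answer above; it assumes that keys is sorted in ascending order and that consecutive keys closer than the tolerance belong to one group, so a chain of close keys is merged into a single group. The names tol, grp, keep, keys_new and vals_new are chosen for illustration only.

```matlab
keys = [1 2 3 3.1 3.15 4 5];
vals = [0.8 1 1.1 1.3 1.2 1 1.1];
tol  = 0.5;                                % assumed tolerance, as in the question

% Start a new group whenever the gap to the previous key is at least tol.
grp = cumsum([1, diff(keys) >= tol]);      % [1 2 3 3 3 4 5]

% For every group, keep the index of the pair with the largest value.
nGroups = grp(end);
keep = zeros(1, nGroups);
for g = 1:nGroups
    members = find(grp == g);              % indices of the pairs in group g
    [~, k] = max(vals(members));           % first maximum wins on ties
    keep(g) = members(k);
end

keys_new = keys(keep);                     % [1 2 3.1 4 5]
vals_new = vals(keep);                     % [0.8 1 1.3 1 1.1]
```

Because max returns the first maximum, exactly one pair survives per group even when values tie, and the first and last pairs need no special handling.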
Source: https://stackoverflow.com/questions/51280156/remove-duplicate-key-value-pairs-with-tolerance-by-keeping-the-ones-with-largest