I have a large number of sets of numbers. Each set contains 10 numbers, and I need to remove every set that shares 5 or more numbers (in any order) with any other set.
I don't think there's a nice and beautiful way to do it. Most other answers will have you compare most pairs of sets x, y, which would be O(N^2). You can do it faster.
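Algorithm: keep an array of all 5-tuples. For each new set, split it into all possible 5-tuples (each written in sorted order, so that the order of the numbers doesn't matter) and add them to that array. At the end, sort the array and check for duplicates: a 5-tuple that appears twice means two sets share at least 5 numbers.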
There are C(10, 5) = 10*9*8*7*6/120 = 9*4*7 = 252, roughly 250, subsets of size 5 in a set of size 10. So you're keeping a table with roughly 250 times as many entries as your data, but you perform just O(250*N) insertions. The final sort adds a log factor, O(250*N * log(250*N)), which a hash table would avoid. That should work practically and I suspect that's the best theoretically as well.
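Here's a rough sketch of the idea in Python, in case it helps. It is not a tuned implementation. It swaps the sort-and-scan step for a dictionary keyed by the sorted 5-tuple, which finds the same duplicates. The function name `sets_to_remove` and the parameter `k` are made up for illustration, and I'm assuming each input really is a set (no repeated numbers within it).

```python
from collections import defaultdict
from itertools import combinations


def sets_to_remove(sets, k=5):
    """Return the indices of sets that share at least k numbers with another set.

    Hypothetical helper: builds a table of every sorted k-subset and flags
    all sets that contribute to a k-subset seen more than once.
    """
    owners = defaultdict(list)  # sorted k-tuple -> indices of sets containing it
    for idx, numbers in enumerate(sets):
        # Sorting first gives each k-subset one canonical form, so order is ignored.
        for combo in combinations(sorted(set(numbers)), k):
            owners[combo].append(idx)

    flagged = set()
    for indices in owners.values():
        if len(indices) > 1:  # the same k-subset occurs in two different sets
            flagged.update(indices)
    return flagged


data = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [1, 2, 3, 4, 5, 11, 12, 13, 14, 15],      # shares 5 numbers with the first set
    [1, 2, 3, 4, 21, 22, 23, 24, 25, 26],     # shares only 4 with either of the above
    [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
]
bad = sets_to_remove(data)
kept = [s for i, s in enumerate(data) if i not in bad]
```

Keep in mind the table holds 252 tuples per input set, so memory is the thing to watch. If it doesn't fit, the sort-based version above can be done on disk instead.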