Question
I use a recursive merge sort to sort a linked list, and I would like to delete duplicates during the merge sort. Does anyone have insight into how to accomplish this?
I am using C.
Answer 1:
In merge sort you take two (or more) already-sorted lists and repeatedly apply the following rules:
- find the lesser/least of the items of the top of each of the input lists, choosing any of the lowest items if there is a tie
- remove that item from its list
- add it to your output list
To remove duplicates, you simply modify the rules very slightly:
- find the lesser/least of the items of the top of each of the input lists, choosing any of the lowest items if there is a tie
- remove that item from its list
- if it is the same as the last item you added to your output list, throw it away
- otherwise, add it to your output list
This will ensure that no two consecutive items on your output list are the same, and that the items in it are in order, which is what you were after.
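The modified rules above can be sketched in C for a singly linked list. This is only an illustration under assumptions: the question does not show its list type, so `struct node`, `push_front` and `merge_unique` are hypothetical names, and every node is assumed to have been allocated with `malloc()` so that discarded duplicates can be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the question does not show its own definition. */
struct node {
    int value;
    struct node *next;
};

/* Helper for building lists: prepend a heap-allocated node. */
struct node *push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Merge two sorted lists into one sorted list without duplicates.
 * Nodes are relinked, not copied. Discarded duplicates are freed,
 * so all nodes are assumed to come from malloc().
 */
struct node *merge_unique(struct node *a, struct node *b)
{
    struct node *head = NULL;   /* first node of the output list */
    struct node *tail = NULL;   /* last node added to the output list */

    while (a != NULL || b != NULL) {
        /* Find the lesser head; on a tie either one will do. */
        struct node **src;
        if (b == NULL || (a != NULL && a->value <= b->value))
            src = &a;
        else
            src = &b;

        /* Remove that item from its list. */
        struct node *item = *src;
        *src = item->next;
        item->next = NULL;

        if (tail != NULL && tail->value == item->value) {
            free(item);             /* same as the last output item: throw it away */
        } else if (tail == NULL) {
            head = tail = item;     /* first output item */
        } else {
            tail->next = item;      /* otherwise, add it to the output list */
            tail = item;
        }
    }
    return head;
}
```

Because each item is compared with the last item already in the output, this version also drops duplicates that sit inside a single input list, not only those shared between the two lists.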
Answer 2:
To use merge sort to remove duplicates, skip any element that repeats one you have already seen during the merge step.
Answer 3:
Consider the merge function within mergesort.
During the merge process, you are of course comparing elements with one another.
Convince yourself that, if you're merging 2 sorted lists A and B, and if both lists contain the same value x, then it will happen that the two identical elements will be compared with one another. If you want a proof, my approach would be to show that if there is a case where two identical elements are not compared, then one or both of the lists are in fact unsorted. Proof by contradiction, baby!
So you can easily detect cases whereby there are two identical elements in two lists being merged.
Next, convince yourself that if there are two identical elements in two lists that are not being merged just now, then eventually they will be merged together and the identical elements will be detected. That's basically a proof by induction: if nothing else, clearly the very last merge (merging sorted lists of length n/2 and n/2 into the final list of length n) will bring those elements together.
Lastly, convince yourself that no single list can contain two of the same element, if you recurse down to the n = 1 or n = 0 case. This is again inductive-ish, because of course any larger list will first have to survive the "filtering" process described in the first big paragraph.
If you are convinced of those three things, then it will be apparent that Steven's or Tim's solutions are quite suitable.
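The three steps of that argument can be seen in a complete recursive sort. The following is a minimal sketch, not anyone's posted solution: `struct node`, `push_front`, `merge_drop_equal` and `sort_unique` are hypothetical names, and nodes are assumed to come from `malloc()`. The merge relies on both inputs already being duplicate-free, which the base case (n = 0 or n = 1) and the recursion guarantee.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type, assumed for illustration. */
struct node {
    int value;
    struct node *next;
};

/* Helper for building lists: prepend a heap-allocated node. */
struct node *push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = head;
    return n;
}

/*
 * Merge two sorted lists that each contain no duplicates.
 * Equal elements from the two lists always meet as the two heads,
 * so comparing the heads is enough to detect them.
 */
struct node *merge_drop_equal(struct node *a, struct node *b)
{
    struct node dummy = { 0, NULL };    /* placeholder in front of the output */
    struct node *tail = &dummy;

    while (a != NULL && b != NULL) {
        if (a->value == b->value) {
            struct node *dup = b;       /* duplicate detected: drop b's copy */
            b = b->next;
            free(dup);
        } else if (a->value < b->value) {
            tail->next = a;
            tail = a;
            a = a->next;
        } else {
            tail->next = b;
            tail = b;
            b = b->next;
        }
    }
    tail->next = (a != NULL) ? a : b;   /* append whatever is left */
    return dummy.next;
}

/* Recursive merge sort that returns a sorted list without duplicates. */
struct node *sort_unique(struct node *head)
{
    /* n = 0 or n = 1: such a list cannot contain a duplicate. */
    if (head == NULL || head->next == NULL)
        return head;

    /* Split the list in half with a slow and a fast pointer. */
    struct node *slow = head;
    struct node *fast = head->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    struct node *second = slow->next;
    slow->next = NULL;

    struct node *left = sort_unique(head);
    struct node *right = sort_unique(second);
    return merge_drop_equal(left, right);
}
```

A caller would replace its list head with the return value, for example `list = sort_unique(list);`.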
Answer 4:
This uses C++, but for C you can use plain arrays instead of vectors.
#include <iostream>
#include <vector>

// merge the sorted, duplicate-free runs v[left, left+leftLen) and
// v[rightStart, rightStart+rightLen) using a temp array;
// returns the number of unique elements, which end up at v[left...]
int merge(std::vector<int>& v, std::vector<int>& tmpArray,
          int left, int leftLen, int rightStart, int rightLen)
{
    int leftPos = left;
    int leftEnd = left + leftLen;            // one past the end of the left run
    int rightPos = rightStart;
    int rightEnd = rightStart + rightLen;    // one past the end of the right run
    int tmpPos = left;

    // finger matching algo left and right
    while (leftPos < leftEnd && rightPos < rightEnd)
    {
        // this first if block for equals is what does the duplicate removal:
        // keep one copy and advance both fingers
        if (v[leftPos] == v[rightPos])
        {
            tmpArray[tmpPos++] = v[leftPos++];
            ++rightPos;
        }
        else if (v[leftPos] < v[rightPos])
            tmpArray[tmpPos++] = v[leftPos++];
        else
            tmpArray[tmpPos++] = v[rightPos++];
    }
    // copy rest of left
    while (leftPos < leftEnd)
        tmpArray[tmpPos++] = v[leftPos++];
    // copy rest of right
    while (rightPos < rightEnd)
        tmpArray[tmpPos++] = v[rightPos++];
    // copy back only the elements that were actually written;
    // copying the full right - left + 1 elements would pull in stale values
    for (int i = left; i < tmpPos; ++i)
        v[i] = tmpArray[i];
    return tmpPos - left;
}

// sorts v[left..right]; returns the count of unique elements, stored at v[left...]
int mergeSort(std::vector<int>& v, std::vector<int>& tmpArray, int left, int right)
{
    if (left > right)
        return 0;
    if (left == right)
        return 1;
    auto center = left + (right - left) / 2;
    int leftLen = mergeSort(v, tmpArray, left, center);
    int rightLen = mergeSort(v, tmpArray, center + 1, right);
    return merge(v, tmpArray, left, leftLen, center + 1, rightLen);
}

void mergeSort(std::vector<int>& v)
{
    int sz = static_cast<int>(v.size());
    std::vector<int> tmpArray(sz, 0);
    int uniqueCount = mergeSort(v, tmpArray, 0, sz - 1);
    v.resize(uniqueCount);    // drop the slots freed by removed duplicates
}

int main()
{
    std::vector<int> v { 7,8,6,5,4,3,9,12,14,17,21,1,-2,-3,-3,-3,-9,10,11 };
    mergeSort(v);
    for (auto& i : v)
        std::cout << i << " ";
    std::cout << std::endl;
}
Answer 5:
An update to my original answer: more generic code that uses collection iterators instead of just vectors.
#include <algorithm>
#include <iterator>

// merge the adjacent sorted ranges [first, mid) and [mid, last) of a collection
template<typename CollectionT>
void mergeCollection(CollectionT & collection, CollectionT & tmpCollection,
                     typename CollectionT::iterator first, typename CollectionT::iterator mid,
                     typename CollectionT::iterator last) {
    using IteratorType = typename CollectionT::iterator;
    IteratorType left = first;
    IteratorType leftEnd = mid;
    // start writing at the position in tmpCollection that corresponds to first
    IteratorType tempBegin = tmpCollection.begin();
    auto const distance = std::distance(collection.begin(), first);
    std::advance(tempBegin, distance);
    IteratorType temp = tempBegin;
    IteratorType right = mid;
    IteratorType rightEnd = last;
    // finger matching algo left and right
    while (left != leftEnd && right != rightEnd) {
        if (*left == *right) {
            // equal elements: this version copies both, so it sorts but keeps duplicates.
            // To drop the duplicate, replace the second line with a plain ++right;
            // the collection must then also be shrunk by the number of skipped elements.
            *temp++ = *left++;
            *temp++ = *right++;
        }
        else if (*left < *right) {
            *temp++ = *left++;
        }
        else {
            *temp++ = *right++;
        }
    }
    // copy rest of left
    while (left != leftEnd) {
        *temp++ = *left++;
    }
    // copy rest of right
    while (right != rightEnd) {
        *temp++ = *right++;
    }
    // copy only the merged range back; assigning the whole collection
    // (collection = tmpCollection) may invalidate the iterators held by the callers
    std::copy(tempBegin, temp, first);
}

template<typename CollectionT>
void mergeSortCollection(CollectionT & collection, CollectionT & tmpCollection,
                         typename CollectionT::iterator first, typename CollectionT::iterator last) {
    auto const distance = std::distance(first, last);
    if (distance > 1) {
        // get mid iterator
        auto mid = first;
        std::advance(mid, distance / 2);
        mergeSortCollection(collection, tmpCollection, first, mid);
        mergeSortCollection(collection, tmpCollection, mid, last);
        mergeCollection(collection, tmpCollection, first, mid, last);
    }
}

template<typename CollectionT>
void mergeSortCollection(CollectionT & collection) {
    CollectionT tmpCollection(collection);
    mergeSortCollection(collection, tmpCollection, collection.begin(), collection.end());
}
Some test code (using GoogleTest):
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <list>
#include <string>
#include <vector>
#include <gtest/gtest.h>

namespace {
    // print the range [begin, end) after an optional message
    template<typename It>
    auto printCollection =
        [](std::ostream& out, It const begin, It const end, std::string const & message = "") {
            using ValueType = typename std::iterator_traits<It>::value_type;
            out << message;
            std::copy(begin, end, std::ostream_iterator<ValueType>(out, ", "));
            out << std::endl;
        };
}

TEST(Sort, MergeSortCollectionVector) {
    std::vector<int32_t> before = { 83, 86, 77, 15, 93, 35, 86, 92, 49, 21 };
    std::vector<int32_t> original = before;
    // duplicates (86, 86) are kept by the code as posted
    std::vector<int32_t> after = { 15, 21, 35, 49, 77, 83, 86, 86, 92, 93 };
    printCollection<decltype(before.begin())>(std::cout, before.begin(), before.end(), "BEFORE sort: ");
    mergeSortCollection(before);
    printCollection<decltype(before.begin())>(std::cout, before.begin(), before.end(), "AFTER sort: ");
    EXPECT_EQ(after, before);
    EXPECT_NE(original, before);
}

TEST(Sort, MergeSortCollectionList) {
    std::list<int32_t> before = { 83, 86, 77, 15, 93, 35, 86, 92, 49, 21 };
    std::list<int32_t> original = before;
    // duplicates (86, 86) are kept by the code as posted
    std::list<int32_t> after = { 15, 21, 35, 49, 77, 83, 86, 86, 92, 93 };
    printCollection<decltype(before.begin())>(std::cout, before.begin(), before.end(), "BEFORE sort: ");
    mergeSortCollection(before);
    printCollection<decltype(before.begin())>(std::cout, before.begin(), before.end(), "AFTER sort: ");
    EXPECT_EQ(after, before);
    EXPECT_NE(original, before);
}
As others pointed out, you will need to modify the merge step to fit your need. Below is merge() pseudocode, modified accordingly, for your reference:
function merge(left,right)
    var list result
    while length(left) > 0 and length(right) > 0
        if first(left) < first(right)       // <--- change from <= to <
            append first(left) to result
            left = rest(left)
        else if first(left) > first(right)
            append first(right) to result
            right = rest(right)
        else                                // <----- added case to remove duplicated items
            append first(right) to result
            left = rest(left)
            right = rest(right)
        end
    end while
    if length(left) > 0
        append left to result
    else
        append right to result
    return result
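Since the question is about C, here is one way the pseudocode above might translate to plain arrays. It is a sketch under assumptions: `merge_arrays_unique` is a hypothetical name, both inputs are assumed to be sorted and already free of duplicates, and the caller is assumed to supply a non-NULL `result` buffer with room for `nleft + nright` elements.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Merge the sorted, duplicate-free arrays left[0..nleft) and
 * right[0..nright) into result. Returns the number of elements
 * written, which can be smaller than nleft + nright.
 */
size_t merge_arrays_unique(const int *left, size_t nleft,
                           const int *right, size_t nright,
                           int *result)
{
    assert(result != NULL);
    size_t i = 0, j = 0, k = 0;

    while (i < nleft && j < nright) {
        if (left[i] < right[j]) {
            result[k++] = left[i++];
        } else if (left[i] > right[j]) {
            result[k++] = right[j++];
        } else {
            /* equal: keep one copy and advance both inputs */
            result[k++] = right[j++];
            i++;
        }
    }
    /* at most one of these loops still has elements to copy */
    while (i < nleft)
        result[k++] = left[i++];
    while (j < nright)
        result[k++] = right[j++];
    return k;
}
```

The returned count matters: a recursive sort built on this merge has to pass the shortened length up to its caller instead of assuming each half still holds its original number of elements.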
Answer 6:
Or just use any sorting algorithm and, when it completes, scan over the sorted list and remove the duplicated elements (they will naturally be next to each other).
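A minimal sketch of that two-pass approach, shown on an array with the standard library's `qsort` standing in for "any sorting". The names `compare_ints` and `sort_then_unique` are hypothetical; the same adjacent-element scan would apply to a sorted linked list by unlinking each node whose value equals its predecessor's.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort; avoids the overflow of a - b. */
int compare_ints(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}

/*
 * Sort values[0..n) and then remove duplicates in place.
 * Returns the number of unique elements, stored at the front.
 */
size_t sort_then_unique(int *values, size_t n)
{
    assert(values != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(values, n, sizeof values[0], compare_ints);

    /* After sorting, duplicates are adjacent: keep an element only
       if it differs from the last element kept. */
    size_t kept = 1;
    for (size_t i = 1; i < n; ++i) {
        if (values[i] != values[kept - 1])
            values[kept++] = values[i];
    }
    return kept;
}
```

The extra scan is linear, so it does not change the overall cost of the sort, and it keeps the sorting code itself unmodified.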
Source: https://stackoverflow.com/questions/1738658/how-do-i-use-merge-sort-to-delete-duplicates