Sorting in linear time and in place

寵の児 提交于 2019-12-18 11:28:39

问题


Suppose that n records have keys in the range from 1 to k.

  • Write an algorithm to sort the records in place in O(n+k) time.
  • You may use O(k) storage outside the input array.
  • Is your algorithm stable?

if we use counting sort to we can do it in O(n+k) time and is stable but its not in place.
if k=2 it can be done in place but its not stable (using two variables to maintain the indexes in the array for k=0 and k=1)
but for k>2 i couldnt think of any good algo


回答1:


First, let's rehash how counting sort works:

  • Count how often every key exists in the array to be sorted. These counts are written to an array of size k.
  • Compute the partial sums of the key counts. This gives the starting position for every bin of equal keys in the sorted array.
  • Move the items in the array to their final position incrementing the starting position of the corresponding bin for every item.

Now the question is how to perform the final step in-place. The standard approach for an in-place permutation is to select the first element and swap it with the element that takes its correct position. This step is repeated with the swapped element until we hit a element that belongs in the first position (a cycle has been completed). Then the whole procedure is repeated for the elements at the second, third, etc. position until the whole array has been processed.

The problem with counting sort is that the final positions are not readily available but are computed by incrementing every bin's starting position in the final loop. In order to never increment the starting position twice for an element, we have to find a way to determine if an element at a certain position has been moved there already. This can be done by keeping track of the original starting position for every bin. If an element lies between the original starting position and the position for the next element of a bin, it has been already touched.

Here's an implementation in C99 that runs in O(n+k) and requires only two arrays of size k as extra storage. The final permutation step is not stable.

#include <stdlib.h>

void in_place_counting_sort(int *a, int n, int k)
{
    int *start = (int *)calloc(k + 1, sizeof(int));
    int *end   = (int *)malloc(k * sizeof(int));

    // Count.
    for (int i = 0; i < n; ++i) {
        ++start[a[i]];
    }

    // Compute partial sums.
    for (int bin = 0, sum = 0; bin < k; ++bin) {
        int tmp = start[bin];
        start[bin] = sum;
        end[bin]   = sum;
        sum += tmp;
    }
    start[k] = n;

    // Move elements.
    for (int i = 0, cur_bin = 0; i < n; ++i) {
        while (i >= start[cur_bin+1]) { ++cur_bin; }
        if (i < end[cur_bin]) {
            // Element has already been processed.
            continue;
        }

        int bin = a[i];
        while (bin != cur_bin) {
            int j = end[bin]++;
            // Swap bin and a[j]
            int tmp = a[j];
            a[j] = bin;
            bin = tmp;
        }
        a[i] = bin;
        ++end[cur_bin];
    }

    free(start);
    free(end);
}

Edit: Here's another version using only a single array of size k based on Mohit Bhura's approach.

#include <stdlib.h>

void in_place_counting_sort(int *a, int n, int k)
{
    int *counts = (int *)calloc(k, sizeof(int));

    // Count.
    for (int i = 0; i < n; ++i) {
        ++counts[a[i]];
    }

    // Compute partial sums.
    for (int val = 0, sum = 0; val < k; ++val) {
        int tmp = counts[val];
        counts[val] = sum;
        sum += tmp;
    }

    // Move elements.
    for (int i = n - 1; i >= 0; --i) {
        int val = a[i];
        int j   = counts[val];

        if (j < i) {
            // Process a fresh cycle. Since the index 'i' moves
            // downward and the counts move upward, it is
            // guaranteed that a value is never moved twice.

            do {
                ++counts[val];

                // Swap val and a[j].
                int tmp = val;
                val  = a[j];
                a[j] = tmp;

                j = counts[val];
            } while (j < i);

            // Move final value into place.
            a[i] = val;
        }
    }

    free(counts);
}



回答2:


Here is my code that runs in O(n+k) time and uses only 1 extra array of size k ( apart from the main array of size n)

#include <stdio.h>
#include <string.h>

#include <stdlib.h>


int main(int argc, char const *argv[])
{
int n = atoi(argv[1]);
int k = atoi(argv[2]);

printf("%d\t%d",n,k);

int *a,*c;
int num,index,tmp,i;
a = (int*)malloc(n*sizeof(int));
c = (int*)calloc(k,sizeof(int));

srand(time(NULL));

for(i=0;i<n;i++)
{
    num =  (rand() % (k));
    a[i] = num;
    c[num]++;
}

printf("\n\nArray is : \n");
for(i=0;i<n;i++)
{
    printf("\t%d",a[i]);
    if(i%8==7)
        printf("\n");
}

printf("\n\nCount Array is : \n");
for(i=0;i<k;i++)
{
    printf("\t%d(%d)",c[i],i);
    if(i%8==7)
        printf("\n");
}

//Indexing count Array
c[0]--;
for(i=1;i<k;i++)
{
    c[i] = c[i-1] + c[i];       
}

printf("\n\nCount Array After Indexing is : \n");
for(i=0;i<k;i++)
{
    printf("\t%d(%d)",c[i],i);
    if(i%8==7)
        printf("\n");
} 

// Swapping Elements in Array
for(i=0;i<n;i++)
{
    index = c[a[i]];
    //printf("\na[%d] = %d, going to position %d",i,a[i],index);
    c[a[i]]--;
    if(index > i)
    {
        tmp = a[i];
        a[i] = a[index];
        a[index] = tmp;
        i--;
    }
}

printf("\n\n\tFinal Sorted Array is : \n\n");
for(i=0;i<n;i++)
{
    printf("\t%d",a[i]);
    if(i%8==7)
        printf("\n");
}

printf("\n\n");

return 0;
}

Even this algo is not stable. All elements are in their reverse order.

P.s : keys are in the range 0 to (k-1)




回答3:


An example for integer valued sequences. The sort is unstable. While it is not as concise as the answer provided by Mohit it is marginally faster (for the common case where k << n) by skipping elements already in their correct bins (time is asymptotically the same). In practice I prefer Mohit's sort for its tighter, simpler loop.

def sort_inplace(seq):
    min_ = min(seq)
    max_ = max(seq)
    k = max_ - min_ + 1
    stop = [0] * k
    for i in seq:
        stop[i - min_] += 1
    for j in range(1, k):
        stop[j] += stop[j - 1]
    insert = [0] + stop[:k - 1]
    for j in range(k):
        while insert[j] < stop[j] and seq[insert[j]] == j + min_:
            insert[j] += 1
    tmp = None
    for j in range(k):
        while insert[j] < stop[j]:
            tmp, seq[insert[j]] = seq[insert[j]], tmp
            while tmp is not None:
                bin_ = tmp - min_
                tmp, seq[insert[bin_]] = seq[insert[bin_]], tmp
                while insert[bin_] < stop[bin_] and seq[insert[bin_]] == bin_ + min_:
                    insert[bin_] += 1

With a tighter loop but still skipping already-relocated elements:

def dave_sort(seq):
    min_ = min(seq)
    max_ = max(seq)
    k = max_ - min_ + 1
    stop = [0] * k

    for i in seq:
        stop[i - min_] += 1

    for i in range(1, k):
        stop[i] += stop[i-1]
    insert = [0] + stop[:k - 1]

    for meh in range(0, k - 1):
        i = insert[meh]
        while i < stop[meh]:
            bin_ = seq[i] - min_
            if insert[bin_] > i:
                tmp = seq[insert[bin_]]
                seq[insert[bin_]] = seq[i]
                seq[i] = tmp
                insert[bin_] += 1
            else:
                i += 1

Edit: Mohit's approach in Python with extra bits to verify the effect on stability of the sort.

from collections import namedtuple
from random import randrange

KV = namedtuple("KV", "k v")

def mohit_sort(seq, key):
    f = lambda v: getattr(v, key)
    keys = map(f, seq)
    min_ = min(keys)
    max_ = max(keys)
    k = max_ - min_ + 1
    insert = [0] * k

    for i in keys:
        insert[i - min_] += 1

    insert[0] -= 1
    for i in range(1, k):
        insert[i] += insert[i-1]

    i = 0
    n = len(seq)
    while i < n:
        bin_ = f(seq[i])
        if insert[bin_] > i:
            seq[i], seq[insert[bin_]] = seq[insert[bin_]], seq[i]
            i -= 1
        insert[bin_] -= 1
        i += 1


def test(n, k):
    seq = []
    vals = [0] * k
    for _ in range(n):
        key = randrange(k)
        seq.append(KV(key, vals[key]))
        vals[key] += 1
    print(seq)
    mohit_sort(seq, "k")
    print(seq)


if __name__ == "__main__":
    test(20, 3)


来源:https://stackoverflow.com/questions/15682100/sorting-in-linear-time-and-in-place

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!