how to find complete sorting of elements by frequency?

二次信任 提交于 2021-01-29 05:20:59

问题


Here is the problem:

Given an array of integers, sort the array according to frequency of elements. For example, if the input array is {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12}, then modify the array to {3, 3, 3, 3, 2, 2, 2, 12, 12, 4, 5}. if 2 numbers have same frequency then print the one which came 1st.

I know how to do it partially. Here is my approcach.

I will create a struct which will be like:

typedef struct node
{
  int index; // for storing the position of the number in the array.
  int count; // for storing the number of times the number appears
  int value; // for storing the actual value
} a[50];

I will create an array of these structs, I will then sort it by a sorting algorithm on the basis of their count. However, how can I ensure that if the frequency of two elements are same, then that number should appear which has a lesser index value?


回答1:


#include <stdlib.h> // qsort, malloc, free
#include <stddef.h> // size_t
#include <stdio.h>  // printf

struct number
{
    const int * value;
    int         num_occurrences;
};

static void cmp_by_val(const struct number * a, const struct number * b)
{
    if (*a->value < *b->value)
        return -1;
    else if (*b->value < *a->value)
        return 1;
    else
        return 0;
}

static void cmp_by_occurrence_stable(const struct number * a, const struct number * b)
{
    if (a->num_occurrences < b->num_occurrences)
        return -1;
    else if (b->num_occurrences < a->num_occurrences)
        return 1;
    else if (a->value < b->value)
        return -1;
    else if (b->value < a->value)
        return 1;
    else
        return 0;
}

static struct number * sort_by_occurrence(const int * arr, size_t N)
{
    //
    // STEP 1: Convert the input
    //
    struct number * sort_arr = (struct number *)malloc(N * sizeof(struct number));
    if (! sort_arr) return NULL;
    for (int k = 0; k < N; ++k)
    {
        sort_arr[k].value = &arr[k];
        sort_arr[k].num_occurrences = 0;
    }
    //
    // STEP 2: Sort the input based on value
    //
    qsort(sort_arr, N, sizeof(struct number), cmp_by_val);
    //
    // STEP 3: Count occurrences
    //
    if (0 < N)
    {
        int cur_value = *sort_arr[0].value;
        int i = 0;
        for (j = 1; j < N; ++j)
        {
            if (*sort_arr[j].value != *sort_arr[i].value)
            {
                for (int k = i; k < j; ++k)
                    sort_arr[k].num_occurrences = j - i;
                i = j;
            }
        }
        for (; i < N; ++i)
            sort_arr[i].num_occurrences = N - i;
    }
    //
    // STEP 4: Sort based on occurrence count
    //
    qsort(sort_arr, N, sizeof(struct number), cmp_by_occurrence_stable);
    //
    // DONE
    //
    return sort_arr;
}

static void print_arr(const struct number * arr, size_t N)
{
    if (0 < N)
    {
        printf("%d", arr[0]->value);
        for (int k = 1; k < N; ++k)
            printf(", %d", arr[k]->value);
    }
    printf("\n");
}

int main(int argc, char ** argv)
{
    const int EXAMPLE_INPUT[11] = { 2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12 }; 
    struct number * sort_arr = sort_by_occurrence(EXAMPLE_INPUT, 11);
    if (sort_arr)
    {
        print_arr(sort_arr, 11);
        free(sort_arr);
    }
};



回答2:


You could create an array which stores the frequency of your input array (i.e. frequency[i] is the frequency of the input[i] element). After that it is easy to order the frequency array (using an stable algorithm) and make the same changes (swaps?) to the input array.

For creating the frequency array you can use several approaches, a simple and inefficient one is just counting each element with two nested loops. I left more efficient alternatives to your imagination.

Note: the frequency array has the same function as the count field in your struct node, but in a separated memory. If you will not need the frequencies in the future, I recommend you to use the separated memory, as you can release it.




回答3:


It seems that the problem is using unstable sort algorithm on the frequency of array elements.

  1. Do a qsort on the array based on freq
  2. Again do a qsort on the resulted array based on the indexes of the element with the same freq only.

    • This should give you a correct answer in O(nLog)

I minimized the code. The obvious parts are left out.

struct node
{
    int *val;
    int freq;
    // int index; <- we can do this by comparing &a->val with &b->val
};

int compare_byfreq(const int* a, const int* b)
{
    return a->freq - b->freq;
}
int compare_index(const int* a, const int* b)
{
    if( a->freq == b->freq)
    {
        return  a->val - b->val; //this can never be zero
    }
    //else we have different freq don't move elem
    return 0;
}

int main()
{
    int arr[] = {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12};
    node *narray = (struct node*)malloc(sizeof(arr) * sizeof(node));

    // build the nodes-array
    for(int i =0; i < sizeof(arr); i++)
    {
        /* buid narray here, make sure you store the pointer to val and not the actual values */
    }

    qsort(narray, sizeof(arr), compare_byfreq);
    qsort(narray, sizeof(arr), compare_index);

    /*print narray*/

    return 0;
}

Edit: @0xbe5077ed got an interesting idea. Instead of comparing indexes compare addresses of your values! - I just re-edited the code for that




回答4:


I was trying to learn Java nowadays, realized that this could be a good exercise. Tried and solved this problem over there in Eclipse. Java is horrible, I went back to C to solve it, here's a solution that I'll explain right after showing it:

#include <stdio.h>
#include <malloc.h>

typedef struct numbergroup {
    int firstencounteridx;
    int count;
    int thenumber;
} Numbergroup;

int firstoneissuperior( Numbergroup gr1, Numbergroup gr2 ) {
    return gr1.count > gr2.count ||   // don't mind the line-break, it's just to fit
    ( gr1.count == gr2.count && gr1.firstencounteridx < gr2.firstencounteridx );
}

void sortgroups( Numbergroup groups[], int amount ) {
    for ( int i = 1; i < amount; i++ ) {
        for ( int j = 0; j < amount - i; j++ ) {
            if ( firstoneissuperior( groups[j + 1], groups[j] ) ) {
                Numbergroup temp = groups[j + 1];
                groups[j + 1] = groups[j];
                groups[j] = temp;
            }
        }
    }
}

int main( ) {
    int input[] = { 2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12 };
    Numbergroup * groups = NULL;
    int amountofgroups = 0;

    for ( int i = 0; i < ( sizeof input / sizeof * input ); i++ ) {
        int uniqueencounter = 1;
        for ( int j = 0; j < amountofgroups; j++ ) {
            if ( groups[j].thenumber == input[i] ) {
                uniqueencounter = 0;
                groups[j].count++;
                break;
            }
        }
        if ( uniqueencounter ) {
            groups = realloc( groups, ( amountofgroups + 1 ) * sizeof * groups );
            groups[amountofgroups].firstencounteridx = i;
            groups[amountofgroups].count = 1;
            groups[amountofgroups].thenumber = input[i];
            amountofgroups++;
        }
    }

    sortgroups( groups, amountofgroups );

    for ( int i = 0; i < amountofgroups; i++ )
        for ( int j = 0; j < groups[i].count; j++ )
            printf( "%d ", groups[i].thenumber );

    free( groups );

    putchar( 10 );
    return 0;
}

Let me explain the structure first, as well as its functionality: It is for each unique number. In your example, it is for 2s, 3s, 4s, 5s and the 12s, one for each, 5 in total. Each one is to store:

  • the index of the first encounter of that number
  • the amount of encounter of that number
  • the value of that number

For example, for 12s, it shall store:

  • firstencounteridx as 5, that is the index of the first 12
  • count as 2
  • thenumber as 12

The first loop generally does that. It expands the group of Numbergroups whenever a unique number is encountered, stores its index as well; increases the count in case a number that already has a group has been encountered.

Then a sort is issued, which simply is a bubble sort. Might be different than the conventional one, I don't have any memorized.

Sorting criteria function simply checks if the count field of the first group is greater than the other; otherwise it checks whether they are the same and the firstencounter of the first group is earlier than the other; in which cases it returns 1 as true. Those are the only possible ways for the first group to be considered superior than the second one.

That's one method, there can be others. This is just a suggestion, I hope it helps you, not just for this case, but in general.




回答5:


Created a map and sort the map by value. O(nlogn) time, and O(n) space.

import java.util.*;

public class SortByFrequency {
    static void sortByFreq( int[] A ) {

        // 1. create map<number, its count>
        Map<Integer, Integer> map = new HashMap<>();

        for(int i = 0; i < A.length; i++) {
            int key = A[i];

            if( map.containsKey(key) ) {
                Integer count = map.get(key);
                count++;
                map.put(key, count);
            }
            else {
                map.put(key, 1);
            }
        }

        // 2. sort map by value in desc. order 
        // used modified (for desc. order) MapUtil in http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java
        Map<Integer, Integer> map2= MapUtil.sortByValue(map);


        for(Map.Entry<Integer, Integer> entry : map2.entrySet() ) {
            int num = entry.getKey();
            int count = entry.getValue();

            for(int i = 0; i < count; i++ ) {
                System.out.print( num + " ");
            }
        }
        System.out.println();
    }

    public static void main(String[] args ) {
        int[] A1 = {2, 3, 2, 4, 5, 12, 2, 3, 3, 3, 12};
        sortByFreq(A1);
    }
}


来源:https://stackoverflow.com/questions/24251349/how-to-find-complete-sorting-of-elements-by-frequency

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!