Most efficient way to order an array of Strings by frequency

Posted by 孤人 on 2019-12-04 06:26:33

You can solve this in two steps:

  1. Create a counter object: a Map<String, Integer> that records, for each string, the number of times it appears in the input. In other words, a frequency map. This is O(n), since you only need to traverse the input once to build the map.

  2. From that map, create a list of its keys, sorted by the frequency of each item (the values in the map). This is O(n log n) in the worst case; you can call Collections.sort() with a Comparator that compares strings by their frequency.

This is what I mean:

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

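// Step 1: build the frequency map in a single pass over the input
final Map<String, Integer> counter = new HashMap<String, Integer>();
for (String str : stringArray)
    counter.put(str, 1 + (counter.containsKey(str) ? counter.get(str) : 0));

// Step 2: sort the distinct strings by descending frequency
List<String> list = new ArrayList<String>(counter.keySet());
Collections.sort(list, new Comparator<String>() {
    @Override
    public int compare(String x, String y) {
        // comparing y against x (not x against y) gives descending order
        return Integer.compare(counter.get(y), counter.get(x));
    }
});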

After the above code executes, the variable list will contain the following values (the order between elements of the same frequency is unspecified):

[x, y, a, z]
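If the unspecified order among equal frequencies is a problem, one option is to add a tie-breaker to the comparator. The following is a minimal sketch, assuming Java 8 or later; the class name FrequencyOrder and the method byFrequency are made up for illustration, and alphabetical order is just one possible tie-breaking rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FrequencyOrder {
    // Hypothetical helper: distinct strings, most frequent first,
    // with ties broken alphabetically so the result is deterministic.
    static List<String> byFrequency(String[] input) {
        Map<String, Integer> counter = new HashMap<>();
        for (String s : input) {
            counter.merge(s, 1, Integer::sum); // start at 1 or add 1
        }
        List<String> keys = new ArrayList<>(counter.keySet());
        keys.sort(Comparator.<String>comparingInt(counter::get).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return keys;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        System.out.println(byFrequency(stringArray)); // prints [x, y, a, z]
    }
}
```

With this comparator, "a" always comes before "z" because both appear once and "a" sorts first alphabetically.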

It's trivial to convert the list to an array:

list.toArray(new String[list.size()])

And if you need to find out the frequency of each string, just iterate over the sorted keys:

for (String str : list) {
    int frequency = counter.get(str);
    System.out.print(str + ":" + frequency + ", ");
}

Use a HashMap<String,Integer> to maintain your counts. This will be the most efficient way to process an arbitrary list of strings.

Create an ArrayList<Map.Entry<String,Integer>> from the map's entrySet().

Sort this list using Collections.sort() and a custom comparator.

Don't get hung up on micro-optimizations.
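The entry-list approach described in this answer could look roughly like the sketch below. The class name EntrySort and the method sortedEntries are hypothetical, and the order among entries with equal counts is left unspecified here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntrySort {
    // Hypothetical helper: counts each string, then returns the map
    // entries sorted by descending count.
    static List<Map.Entry<String, Integer>> sortedEntries(String[] input) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String s : input) {
            Integer old = counts.get(s);
            counts.put(s, old == null ? 1 : old + 1);
        }
        // Copy the entry set into a list so it can be sorted.
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return Integer.compare(b.getValue(), a.getValue()); // highest count first
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        for (Map.Entry<String, Integer> e : sortedEntries(stringArray)) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

Sorting entries rather than keys means the comparator reads each count directly from the entry, without a second map lookup per comparison.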

If third-party libraries are fair game, the following one-liner with Guava is asymptotically optimal:

Multisets.copyHighestCountFirst(ImmutableMultiset.copyOf(array))
   .elementSet().toArray(new String[0]);

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

List<String> list = Arrays.asList(stringArray); // fixed-size view backed by stringArray
Collections.sort(list); // note: this also reorders stringArray itself

HashMap<String, Integer> map = new HashMap<String, Integer>();

for(int i = 0; i < list.size();) {

    String s = list.get(i); //get the string to count

    int count = list.lastIndexOf(s) - list.indexOf(s) + 1; //count it

    map.put(s, count); // add it

    i = list.lastIndexOf(s) + 1; // skip to the next string

}

I would consider this an elegant solution, but I don't know how well it performs. If you want it sorted, use a TreeMap, but that is really slow.
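On the performance question: indexOf and lastIndexOf each scan the list linearly, so the loop above repeats work for every distinct string. Once the data is sorted, equal strings sit next to each other, so a single pass that counts each run should be enough. This is a sketch of that variant, not a measured benchmark; the class name RunCount and the method countRuns are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class RunCount {
    // Hypothetical helper: sorts a copy of the input, then counts each
    // run of equal neighbours in one pass.
    static Map<String, Integer> countRuns(String[] input) {
        String[] sorted = input.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // LinkedHashMap keeps insertion order, which is alphabetical here.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            while (j < sorted.length && sorted[j].equals(sorted[i])) {
                j++; // advance to the end of the current run
            }
            counts.put(sorted[i], j - i); // run length is the count
            i = j; // continue with the next distinct string
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        System.out.println(countRuns(stringArray)); // prints {a=1, x=3, y=2, z=1}
    }
}
```

The sort still costs O(n log n), but the counting pass afterwards visits each element only once.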

You can sort it by key afterwards like this (unsortedMap stands for the map built above):

TreeMap<String, Integer> sortedMap = new TreeMap<String, Integer>(unsortedMap);

But note that using the Integer count as the key does not work! Keys are unique, so if, for example, a and b each appear once, a will be overwritten.

Printed result: 1) strings with different occurrence counts are sorted by count in descending order; 2) strings with the same occurrence count are sorted alphabetically in ascending order.

 public static void sortStringByOccurance(String[] stringArray) {
    // O(n)
    Map<String, Integer> map = new HashMap<>();
    for (String str : stringArray) {
        map.put(str, map.containsKey(str)? map.get(str)+1 : 1);
    }

    // O(k log k) for k distinct strings: each TreeMap/TreeSet insertion costs O(log k)
    TreeMap<Integer, TreeSet<String>> treemap = new TreeMap<>();
    for (String key : map.keySet()) {
        if (treemap.containsKey(map.get(key))) {
            treemap.get(map.get(key)).add(key);
        }
        else {
            TreeSet<String> set = new TreeSet<>();
            set.add(key);
            treemap.put(map.get(key), set);
        }
    }

    // O(n)
    Map<Integer, TreeSet<String>> result = treemap.descendingMap();
    for (int count : result.keySet()) {
        TreeSet<String> set = result.get(count);
        for (String word : set) {
            System.out.println(word + ":" + count);
        }
    }
}