sorting

Sorting big file (10G)

萝らか妹 submitted on 2020-01-12 08:52:03
Question: I'm trying to sort a big table stored in a file. The format of the file is (ID, intValue). The data is sorted by ID, but I need it sorted by intValue, in descending order. For example, from this table:
ID | IntValue
1 | 3
2 | 24
3 | 44
4 | 2
to this table:
ID | IntValue
3 | 44
2 | 24
1 | 3
4 | 2
How can I use the Linux sort command to do the operation? Or do you recommend another way?
Answer 1: How can I use the Linux sort command to do the operation? Or do you recommend another way? As others
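
For the "another way" half of the question above, one possibility is an external merge sort: sort chunks that fit in memory, write each to a temporary file, then merge the sorted chunks. The sketch below assumes one `ID | IntValue` record per line with no header row; the names `parse_value`, `external_sort_desc` and `chunk_lines` are illustrative and do not come from the original post.

```python
import heapq
import os
import tempfile
from itertools import islice


def parse_value(line):
    """Return the integer after the '|' separator (assumed line format)."""
    return int(line.split("|")[1])


def external_sort_desc(in_path, out_path, chunk_lines=100_000):
    """Sort a large file by IntValue, descending, using bounded memory."""
    chunk_paths = []
    try:
        with open(in_path) as src:
            while True:
                # Read at most chunk_lines lines; an empty list means EOF.
                chunk = list(islice(src, chunk_lines))
                if not chunk:
                    break
                # Make sure every record ends with a newline before merging.
                chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
                chunk.sort(key=parse_value, reverse=True)
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "w") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        files = [open(p) for p in chunk_paths]
        try:
            with open(out_path, "w") as dst:
                # heapq.merge streams the already-sorted chunks lazily.
                merged = heapq.merge(*files, key=parse_value, reverse=True)
                dst.writelines(merged)
        finally:
            for f in files:
                f.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

The chunk size is the only tuning knob here: larger chunks mean fewer temporary files to merge, at the cost of more memory per chunk.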

How to find collocations in text, python

让人想犯罪 __ submitted on 2020-01-12 08:37:08
Question: How do you find collocations in text? A collocation is a sequence of words that occurs together unusually often. Python has a built-in function bigrams that returns word pairs. >>> bigrams(['more', 'is', 'said', 'than', 'done']) [('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')] >>> What's left is to find the bigrams that occur more often than the frequencies of their individual words would suggest. Any ideas how to put that into code? Answer 1: Try NLTK. You will mostly be interested in nltk.collocations
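
The idea in the question, scoring each bigram against the frequencies of its individual words, can be sketched with the standard library alone. The example below uses pointwise mutual information (PMI) as the score, which is one common choice and not necessarily what the answer recommends. `bigrams` is defined locally because it is not part of the Python standard library (the function the question quotes comes from NLTK); `pmi_collocations` and `min_count` are illustrative names.

```python
import math
from collections import Counter


def bigrams(words):
    """Return adjacent word pairs, like the function quoted in the question."""
    return list(zip(words, words[1:]))


def pmi_collocations(words, min_count=2):
    """Rank bigrams by PMI: log2(P(w1, w2) / (P(w1) * P(w2)))."""
    word_counts = Counter(words)
    pair_counts = Counter(bigrams(words))
    n_words = len(words)
    n_pairs = max(len(words) - 1, 1)

    scored = []
    for (w1, w2), count in pair_counts.items():
        # Rare pairs get inflated PMI scores, so drop them early.
        if count < min_count:
            continue
        p_pair = count / n_pairs
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))

    # Highest score first: pairs that co-occur far more often than chance.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The `min_count` cutoff matters in practice, because a pair of words that each appear once and happen to be adjacent would otherwise get the highest possible score.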

A more elegant secondary sort for arrays [closed]

自古美人都是妖i submitted on 2020-01-12 08:15:33
Question: Closed. This question is opinion-based. It is not currently accepting answers. Want to improve this question? Update the question so it can be answered with facts and citations by editing this post. Closed 4 years ago. I need to perform a sort on an array, and if two elements are equal I then need to perform a secondary sort on a different key within those elements. Having a look at the Mozilla Developer Network docs for array.sort, there is a nice snippet of code at the bottom to handle the
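
The question above concerns JavaScript's array.sort; purely as an illustration of the primary-then-secondary idea, here is a minimal Python sketch that compares on a tuple of keys, so the second key only matters when the first is equal. The field names `last` and `first` and the helper `sort_with_tiebreak` are hypothetical.

```python
from operator import itemgetter


def sort_with_tiebreak(items, primary, secondary):
    """Sort dicts by `primary`, breaking ties with `secondary`."""
    # Tuples compare element by element, so `secondary` is only
    # consulted when two items have the same `primary` value.
    return sorted(items, key=itemgetter(primary, secondary))


people = [
    {"last": "Smith", "first": "Zoe"},
    {"last": "Jones", "first": "Bob"},
    {"last": "Smith", "first": "Al"},
]
ordered = sort_with_tiebreak(people, "last", "first")
```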

A more elegant secondary sort for arrays [closed]

烈酒焚心 submitted on 2020-01-12 08:14:07
Question: Closed. This question is opinion-based. It is not currently accepting answers. Want to improve this question? Update the question so it can be answered with facts and citations by editing this post. Closed 4 years ago. I need to perform a sort on an array, and if two elements are equal I then need to perform a secondary sort on a different key within those elements. Having a look at the Mozilla Developer Network docs for array.sort, there is a nice snippet of code at the bottom to handle the

Multi criteria sorting of a list of objects with Guava Ordering

喜欢而已 submitted on 2020-01-12 07:01:16
Question: I have a class WHICH CANNOT implement Comparable, but needs to be sorted based on 2 fields. How can I achieve this with Guava? Let's say the class is: class X { String stringValue; java.util.Date dateValue; } And I have a list of these: List<X> lotsOfX; I want to sort them based on the value field first and then based on dateValue descending within each 'group' of 'value' fields. What I have been doing so far is: List<X> sortedList = ImmutableList.copyOf(Ordering.natural().onResultOf

Which general purpose sorting algorithm does Swift use? It does not perform well on sorted data

孤街醉人 submitted on 2020-01-12 07:01:13
Question: I have been picking and probing at the Swift standard library's sort() function for its Array type. To my surprise, I have noticed it performs poorly on already-sorted data. Sorting a shuffled array of Int seems to be 5x faster than sorting that very same array when it is already sorted. Sorting an array of shuffled objects is about 4x faster than sorting the very same one already in sorted order (sorting an object array vs. an Int array uses different algorithms, I am sure, so I sorted both to

Java: How do I sort multiple ArrayList by their size?

天大地大妈咪最大 submitted on 2020-01-12 04:44:45
Question: I have 9 different ArrayLists and I want to have a list of the top 5. I'm thinking of sorting those ArrayLists by their sizes. Is it possible to do that? If so, how can I achieve that? After a few tries I finally got it working, and I just want to share it with everyone. It is better to get the size of each ArrayList and add it to the big ArrayList: // creates an ArrayList that holds ArrayLists List allTheLists = new ArrayList(); allTheLists.add(pbaustraliaList.size()); allTheLists.add(pbotherList
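
As an illustration of "top 5 by size", here is a minimal Python sketch rather than the poster's Java. It ranks the lists themselves by length instead of storing only their sizes, so the contents stay attached to the ranking; that is a deliberate difference from the snippet above, and the helper `largest_lists` and the sample data are hypothetical.

```python
import heapq


def largest_lists(lists, n=5):
    """Return the n longest lists, longest first."""
    # nlargest with key=len avoids fully sorting when n is small.
    return heapq.nlargest(n, lists, key=len)


# Nine lists with lengths 0 through 8.
groups = [list(range(size)) for size in range(9)]
top_five = largest_lists(groups)
```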

strcmp for python or how to sort substrings efficiently (without copy) when building a suffix array

爱⌒轻易说出口 submitted on 2020-01-12 03:21:47
Question: Here's a very simple way to build a suffix array from a string in Python: def sort_offsets(a, b): return cmp(content[a:], content[b:]) content = "foobar baz foo" suffix_array.sort(cmp=sort_offsets) print suffix_array [6, 10, 4, 8, 3, 7, 11, 0, 13, 2, 12, 1, 5, 9] However, "content[a:]" makes a copy of content, which becomes very inefficient when content gets large. So I wonder if there's a way to compare the two substrings without having to copy them. I've tried to use the buffer builtin,
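
One way to avoid the slice copies described above is to compare the two suffixes character by character from their start offsets. The sketch below is written for Python 3, where `cmp=` no longer exists and `functools.cmp_to_key` adapts a comparison function instead. It is not taken from any answer to the question, and `build_suffix_array` is an illustrative name. It trades memory for speed: a pure-Python loop per comparison is slow, but it allocates no substrings.

```python
from functools import cmp_to_key


def build_suffix_array(content):
    """Return suffix start offsets in lexicographic order of the suffixes."""
    n = len(content)

    def compare(a, b):
        # Walk both suffixes in step; no slice of `content` is created.
        while a < n and b < n:
            if content[a] != content[b]:
                return -1 if content[a] < content[b] else 1
            a += 1
            b += 1
        # One suffix is a prefix of the other: the one that ran out
        # first is shorter and therefore sorts first.
        return (a < n) - (b < n)

    return sorted(range(n), key=cmp_to_key(compare))


suffix_array = build_suffix_array("foobar baz foo")
# suffix_array == [6, 10, 4, 8, 3, 7, 11, 0, 13, 2, 12, 1, 5, 9]
```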

How is counting sort a stable sort?

一曲冷凌霜 submitted on 2020-01-12 03:16:07
Question: Suppose my input is (a, b and c distinguish between equal keys) 1 6a 8 3 6b 0 6c 4. My counting sort will save it as (discarding the a, b and c info!!) 0(1) 1(1) 3(1) 4(1) 6(3) 8(1), which will give me the result 0 1 3 4 6 6 6 8. So, how is this a stable sort? I am not sure how it is "maintaining the relative order of records with equal keys." Please explain. Answer 1: Simple, really: instead of a simple counter for each 'bucket', it's a linked list. That is, instead of 0(1) 1(1) 3(1) 4(1) 6(3) 8(1)
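
The answer's point, that each bucket holds the records themselves rather than a bare count, can be sketched as follows. Python lists stand in for the linked lists the answer mentions; `stable_bucket_sort`, `key` and `max_key` are illustrative names, and the records are written as strings such as "6a" to mirror the question's example.

```python
def stable_bucket_sort(records, key, max_key):
    """Counting-style sort that keeps whole records, so ties stay in order."""
    # One bucket per possible key value, 0..max_key.
    buckets = [[] for _ in range(max_key + 1)]
    for record in records:
        # Appending preserves input order among records with equal keys.
        buckets[key(record)].append(record)
    # Concatenate the buckets in key order.
    return [record for bucket in buckets for record in bucket]


data = ["1", "6a", "8", "3", "6b", "0", "6c", "4"]
result = stable_bucket_sort(data, key=lambda r: int(r[0]), max_key=8)
# result == ['0', '1', '3', '4', '6a', '6b', '6c', '8']
```

Because records are appended to a bucket in input order and buckets are read out front to back, 6a, 6b and 6c keep their original relative order.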

Force items at beginning and end of list

半城伤御伤魂 submitted on 2020-01-11 18:50:08
Question: How can I modify this list so that all p's appear at the beginning, the q's at the end, and the values in between are sorted alphabetically? l = ['f','g','p','a','p','c','b','q','z','n','d','t','q'] So I would like to have: ['p','p','a','b','c','d','f','g','n','t','z','q','q'] Answer 1: You can use sorted with the following key: sorted(l, key = lambda s: (s!='p', s=='q', s)) ['p', 'p', 'a', 'b', 'c', 'd', 'f', 'g', 'n', 't', 'z', 'q', 'q'] Explanation: To get a better idea of how this is working,
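
To make the key from the answer concrete, the sketch below names it `edge_key` (an illustrative name) and shows the tuples it produces. Python compares tuples element by element, and `False` sorts before `True`.

```python
l = ['f', 'g', 'p', 'a', 'p', 'c', 'b', 'q', 'z', 'n', 'd', 't', 'q']


def edge_key(s):
    """Same key as the lambda in the answer, written as a named function."""
    return (s != 'p', s == 'q', s)


# edge_key('p') -> (False, False, 'p')  first element False: sorts first
# edge_key('a') -> (True, False, 'a')   middle group, ordered by the letter
# edge_key('q') -> (True, True, 'q')    second element True: sorts last
result = sorted(l, key=edge_key)
```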