Merge n lists and sort data maintaining original order/constraint

Question

I encountered the problem below in a coding competition. I tried a lot, but a private test case always failed with a wrong answer, and I am unable to figure out why my approach below would fail. I have no naive solution to generate stress test cases and compare against. Also, no editorial will be published. So I am looking for someone to point out the flaw in my approach, if possible.

Following is a detailed description of the problem and what I've tried till now.

Problem: There are multiple regions and you are given marks for students for each region as per their rank in the respective region. For example:

Region1:
StudentName, Score
A,           50.0
B,           60.0  
C,           40.0 

Region2:
StudentName, Score
D,           30.0
E,           10.0
F,           20.0

In the above data, student A has rank 1 in Region1, B has rank 2 and C has rank 3. Similarly, D has rank 1 in Region2, E has rank 2 and F has rank 3.

A student can have a lower score and still have a better rank within the same region. For example, A ranks above B in Region1 despite scoring lower, i.e. we should assume that each region's data is already sorted by rank.

Our task is to merge all regions' data into one global list ranked by score. The constraint is that a student who had a lower rank in their region still cannot be placed above any student from the same region who had a better rank in the original region data.

For example:

Region1:
A, 50.0
B, 60.0
C, 40.0 

Region2:
D, 30.0
E, 10.0
F, 20.0

will be merged into:

A, 50.0
B, 60.0
C, 40.0 
D, 30.0
E, 10.0
F, 20.0

The order did not change according to score because B will always be lower than A and F will always be lower than E as per their region's constraints.

Other test cases:

Region1:
A, 50.0
B, 60.0
C, 70.0 

Region2:
D, 30.0
E, 20.0
F, 10.0

again results in the order of A,B,C,D,E,F

Region1:
A, 60.0
B, 80.0
C, 100.0 

Region2:
D, 70.0
E, 90.0
F, 110.0

will result in: D, E, F, A, B, C

But,

Region1:
A, 11.5
B, 8.5
C, 10.0 

Region2:
D, 12.0
E, 9.0
F, 9.5

will result in:

D, 12.0
A, 11.5
E, 9.0 
B, 8.5
C, 10.0
F, 9.5

Constraints:
1<=number of regions<=6
score can have up to 7 decimal places

My approach is to add all the input data to one list and maintain a stable sort i.e. if the zone is same for two students, compare their rank in the zone, otherwise compare scores.

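import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// enclosing class name is assumed; the original post shows only the nested class
public class Main
{
    static class Student implements Comparable<Student>
    {
        String name;
        double score;
        int zone;
        int rank;

        //constructor

        @Override
        public int compareTo(Student o)
        {
            if(this.zone == o.zone)
            {
                //lower i.e. better rank
                return Integer.compare(this.rank, o.rank);
            }
            //higher i.e. better score
            return Double.compare(o.score, this.score);
        }
    }

    public static void main(String[] args)
    {
        //read data from console input into an ArrayList<Student> students
        List<Student> students = new ArrayList<>();
        Collections.sort(students);
        //print each student from students
    }
}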

The question does not mention if score could be equal for two students in different zones. I've tried breaking the tie in that case using their respective ranks in the zone but the private test cases keep failing. I initially thought that the question might have some missing information but I see many successful submissions for this question in the competition dashboard. This is the reason I believe I am missing something and the question is not as simple as I am thinking. But, I've not been able to come up with a test case to validate this assumption.

Thanks!

Answer 1:

As I understand the question, the requirements are to sort the students according to score, but with the additional constraint that the relative ordering of students within a region be preserved.

Given the input data from one of the examples listed in the question,

Region1:
A, 11.5
B, 8.5
C, 10.0 

Region2:
D, 12.0
E, 9.0
F, 9.5

sorting only by score gives the following result: DACFEB.

However, the constraint about preserving relative ordering within a region requires the following partial orderings A < B < C and D < E < F.

The OP gives the solution to this particular example as DAEBCF. In comments on the question, I suggested two other possible solutions for this example: DABCEF and DAEFBC. I don't see any criteria which let us decide which of these possible solutions is the correct one. As such, the problem is underconstrained. One can argue about which of these solutions is preferable to the others, but doing so will introduce new constraints which aren't in evidence in the original question.
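To make the "multiple valid solutions" point concrete, here is a minimal sketch that checks whether a candidate ordering respects the per-region partial orderings. The class `OrderCheck` and the method `respectsRegions` are hypothetical names introduced only for this illustration, and students are represented as single characters purely for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the original posts.
class OrderCheck {
    // Returns true if every region's students appear in 'candidate'
    // in the same relative order as in that region's ranked string.
    static boolean respectsRegions(String candidate, List<String> regions) {
        for (String region : regions) {
            int last = -1;
            for (char student : region.toCharArray()) {
                int pos = candidate.indexOf(student);
                if (pos < 0 || pos < last) {
                    // student missing, or placed above a better-ranked peer
                    return false;
                }
                last = pos;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("ABC", "DEF");
        for (String c : Arrays.asList("DAEBCF", "DABCEF", "DAEFBC", "DACFEB")) {
            System.out.println(c + " -> " + respectsRegions(c, regions));
        }
    }
}
```

All three candidate orderings pass this check, while the pure score ordering DACFEB fails it because C lands above B.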

Given that there are multiple solutions that meet all the criteria in the problem, it means that there is no total ordering of values in this domain. Further, given that a correct Comparator must impose a total ordering on the values of its domain, it follows that it is not possible to write a proper Comparator for this domain.

Of course, it's possible to write a correct Comparator that has some behavior, and that will prefer one of these possible solutions over the others. Doing so will implicitly be implementing additional constraints that aren't part of the problem statement. In fact, it appears that Vincent van der Weele has done so. The statement "The next empty spot must be filled by the highest ranked remaining element of one of the regions. Which one? The one with the highest score" introduces the additional constraint. It results in the ordering DAEBCF, which was suggested by the OP. While this is sensible, it's not necessarily the "right" ordering.

An alternative algorithm might be as follows. 1) Start with an empty result list and maintain lists of students from each region in rank order. 2) Find the remaining student with the highest score. 3) Take that student, together with any remaining higher-ranked students from the same region, and append them to the result, preserving their relative order. 4) Repeat until no students remain.

Applying this algorithm to the example input results in DABCEF. This is sensible, but in a different way. Again, we don't know whether it's the "right" answer.
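The four steps above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `BlockMerge`, the parallel-array input layout, and the tie-breaking rule (the first student found wins on equal scores) are mine, not something the problem statement specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the alternative algorithm; names are assumed.
class BlockMerge {
    // names[r][i] and scores[r][i] describe the student at rank i+1 in region r.
    static List<String> merge(String[][] names, double[][] scores) {
        int[] next = new int[names.length]; // first not-yet-emitted index per region
        List<String> result = new ArrayList<>();
        while (true) {
            int bestRegion = -1;
            int bestIndex = -1;
            // Step 2: find the remaining student with the highest score.
            for (int r = 0; r < names.length; r++) {
                for (int i = next[r]; i < names[r].length; i++) {
                    if (bestRegion < 0 || scores[r][i] > scores[bestRegion][bestIndex]) {
                        bestRegion = r;
                        bestIndex = i;
                    }
                }
            }
            if (bestRegion < 0) {
                return result; // Step 4: nothing left
            }
            // Step 3: emit remaining better-ranked peers, then the student itself.
            for (int i = next[bestRegion]; i <= bestIndex; i++) {
                result.add(names[bestRegion][i]);
            }
            next[bestRegion] = bestIndex + 1;
        }
    }

    public static void main(String[] args) {
        String[][] names = {{"A", "B", "C"}, {"D", "E", "F"}};
        double[][] scores = {{11.5, 8.5, 10.0}, {12.0, 9.0, 9.5}};
        System.out.println(String.join("", merge(names, scores))); // DABCEF
    }
}
```

The nested scan is quadratic in the number of students, which seems acceptable for a sketch; a faster version would need some extra bookkeeping.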

Either the problem in the programming competition was ill-specified to begin with, or some information was lost between the competition's problem statement and the OP's question here on Stack Overflow.

Answer 2:

Your comparator is not correct. You basically say that students are sorted by rank if they are from the same region and sorted by score otherwise. But this is not true, as this example shows:

Region1:
A, 11.5
B, 8.5
C, 10.0 

Region2:
D, 12.0
E, 9.0
F, 9.5

results in

D, 12.0
A, 11.5
E, 9.0 
B, 8.5
C, 10.0
F, 9.5

i.e., E with score 9.0 comes before C with score 10.0, even though they are from different regions.
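Another way to see the problem is that the comparator from the question is not transitive on this data. The sketch below reuses its comparison logic on students B, C and E. The wrapper class `ComparatorCycle` and the constructor are assumptions added so the snippet compiles on its own.

```java
// Hypothetical demo class; the compareTo body mirrors the one in the question.
class ComparatorCycle {
    static class Student implements Comparable<Student> {
        final String name;
        final double score;
        final int zone;
        final int rank;

        Student(String name, double score, int zone, int rank) {
            this.name = name;
            this.score = score;
            this.zone = zone;
            this.rank = rank;
        }

        @Override
        public int compareTo(Student o) {
            if (this.zone == o.zone) {
                return Integer.compare(this.rank, o.rank); // lower rank first
            }
            return Double.compare(o.score, this.score); // higher score first
        }
    }

    public static void main(String[] args) {
        Student b = new Student("B", 8.5, 1, 2);
        Student c = new Student("C", 10.0, 1, 3);
        Student e = new Student("E", 9.0, 2, 2);
        System.out.println(b.compareTo(c) < 0); // true: B before C (same zone, better rank)
        System.out.println(c.compareTo(e) < 0); // true: C before E (higher score)
        System.out.println(e.compareTo(b) < 0); // true: E before B (higher score)
    }
}
```

All three comparisons say "comes first", so B < C < E < B forms a cycle. That violates the Comparable contract, and the result of Collections.sort is then not well defined.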

A simpler algorithm which does work:

Fill the result element by element. The next empty spot must be filled by the highest ranked remaining element of one of the regions. Which one? The one with the highest score. So remove that element from its region, add it to the result, and repeat until you are done.
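A minimal sketch of that idea, assuming each region is given as parallel arrays in rank order. The class name `HeadMerge`, the input layout, and the tie-breaking (the lower region index wins on equal scores) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "take the best remaining head" merge.
class HeadMerge {
    // names[r][i] and scores[r][i] describe the student at rank i+1 in region r.
    static List<String> merge(String[][] names, double[][] scores) {
        int[] head = new int[names.length]; // index of each region's best remaining student
        List<String> result = new ArrayList<>();
        while (true) {
            int best = -1;
            // Among the regions' current heads, pick the one with the highest score.
            for (int r = 0; r < names.length; r++) {
                if (head[r] < names[r].length
                        && (best < 0 || scores[r][head[r]] > scores[best][head[best]])) {
                    best = r;
                }
            }
            if (best < 0) {
                return result; // every region is exhausted
            }
            result.add(names[best][head[best]]);
            head[best]++;
        }
    }

    public static void main(String[] args) {
        String[][] names = {{"A", "B", "C"}, {"D", "E", "F"}};
        double[][] scores = {{60.0, 80.0, 100.0}, {70.0, 90.0, 110.0}};
        System.out.println(String.join("", merge(names, scores))); // DEFABC
    }
}
```

Traced by hand, this sketch returns A, B, C, D, E, F and D, E, F, A, B, C on the first and third examples from the question. On the 11.5/8.5/10.0 example it takes F (9.5) ahead of B (8.5) once E has been removed and returns D, A, E, F, B, C, so it is worth double-checking against the original problem statement before relying on it.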

Source: https://stackoverflow.com/questions/54855025/merge-n-lists-and-sort-data-maintaining-original-order-constraint

Tags

java

algorithm

sorting

merge

comparator