Write a program to find 100 largest numbers out of an array of 1 billion numbers

深忆病人 2020-11-29 14:04

I recently attended an interview where I was asked: "Write a program to find 100 largest numbers out of an array of 1 billion numbers."

I was only able to give a br

  • 2020-11-29 14:49

    If this is asked in an interview, I think the interviewer probably wants to see your problem-solving process, not just your knowledge of algorithms.

    The description is quite general, so you could ask the interviewer about the range or meaning of these numbers to make the problem clear. Doing this may impress the interviewer. If, for example, these numbers stand for the ages of the people within a country (e.g. China), then it's a much easier problem. With the reasonable assumption that nobody alive is older than 200, you can use an int array of size 201 (one slot per age from 0 to 200) to count the number of people of each age in a single pass; the index is the age. After that, it's a piece of cake to find the 100 largest numbers: walk the array from the highest age downward. By the way, this algorithm is called counting sort.
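    A minimal C++ sketch of the counting approach, assuming (as the answer does) that every value is an age between 0 and 200. `topKByCounting` and `kMaxAge` are hypothetical names chosen for illustration, and the code does no bounds checking on the input:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption taken from the answer: every value is an age in [0, kMaxAge].
constexpr int kMaxAge = 200;

// Returns the k largest ages in descending order after a single counting pass.
std::vector<int> topKByCounting(const std::vector<int>& ages, std::size_t k) {
    std::array<std::uint64_t, kMaxAge + 1> count{};  // count[a] = how many people are aged a
    for (int a : ages) {
        ++count[static_cast<std::size_t>(a)];  // no bounds check: assumes 0 <= a <= kMaxAge
    }
    std::vector<int> result;
    result.reserve(k);
    // Walk from the highest age downward, emitting each age as often as it occurred.
    for (int a = kMaxAge; a >= 0 && result.size() < k; --a) {
        for (std::uint64_t c = count[static_cast<std::size_t>(a)]; c > 0 && result.size() < k; --c) {
            result.push_back(a);
        }
    }
    return result;
}
```

    The count array has 201 entries no matter how many numbers there are, so the extra memory stays constant and the whole thing is one O(N) pass plus a tiny scan.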

    Anyway, making the question more specific and clearer is good for you in an interview.

  • 2020-11-29 14:49

    This question can be answered in O(N log 100) time (instead of O(N log N)) with just one line of C++ code.

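     #include <algorithm>   // std::partial_sort
     #include <functional>  // std::greater
     #include <vector>
     std::vector<int> myvector = ...; // Define your 1 billion numbers (int assumed just for concreteness)
     // std::greater puts the largest values first; the default comparator would select the 100 smallest
     std::partial_sort(myvector.begin(), myvector.begin() + 100, myvector.end(), std::greater<int>());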
    

    The final answer would be a vector whose first 100 elements are guaranteed to be the 100 biggest numbers of your array, in sorted order, while the remaining elements are in unspecified order.

    The C++ STL (standard library) is quite handy for this kind of problem.

    Note: I am not saying that this is the optimal solution, but it would have saved your interview.
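    A self-contained version of the partial_sort approach, written as a hypothetical helper `topKPartialSort` (the name and the `k` parameter are illustrative; k = 100 for this question). Note that std::partial_sort sorts the prefix in ascending order by default, so it needs a comparator such as std::greater<int>() to put the largest values first:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the k largest values of `data` in descending order.
// `data` is taken by value because std::partial_sort rearranges its input.
std::vector<int> topKPartialSort(std::vector<int> data, std::size_t k) {
    k = std::min(k, data.size());
    auto middle = data.begin() + static_cast<std::ptrdiff_t>(k);
    // std::greater<int>() sorts the prefix in descending order, so it holds the k largest
    // values; with the default std::less it would hold the k smallest instead.
    std::partial_sort(data.begin(), middle, data.end(), std::greater<int>());
    data.erase(middle, data.end());  // keep only the sorted prefix
    return data;
}
```

    The by-value parameter makes a full copy, which doubles memory use; for a real billion-element array you would more likely call std::partial_sort on the array in place, as in the one-liner.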

  • 2020-11-29 14:49
    Time ~ O(100 * N)
    Space ~ O(100 + N)
    
    1. Create a list of 100 empty slots, kept sorted in ascending order (so the first slot always holds the smallest number kept so far)

    2. For every number in input-list:

      • If the number is smaller than the first one, skip it

      • Otherwise, replace the first one with this number

      • Then push the number forward through adjacent swaps until it is no larger than the next one

    3. Return the list
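    The steps above can be sketched in C++ as follows. This assumes int values and represents "empty" slots with the lowest possible int, which is an assumption of this sketch (it also means the input should contain at least k numbers); `topKSortedSlots` is a hypothetical name:

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Keeps k slots sorted ascending, so slots[0] is always the smallest kept value.
// Empty slots hold the lowest int (an assumption of this sketch).
std::vector<int> topKSortedSlots(const std::vector<int>& input, std::size_t k) {
    if (k == 0) return {};
    std::vector<int> slots(k, std::numeric_limits<int>::lowest());
    for (int x : input) {
        if (x <= slots[0]) continue;  // not larger than the smallest kept value: skip
        slots[0] = x;                 // otherwise replace the smallest kept value
        // Push it forward with adjacent swaps until it is no larger than its successor.
        for (std::size_t i = 0; i + 1 < k && slots[i] > slots[i + 1]; ++i) {
            std::swap(slots[i], slots[i + 1]);
        }
    }
    return slots;  // ascending: the largest value is slots[k - 1]
}
```

    A skipped number costs one comparison, but a number that gets in can need up to k - 1 swaps, which is where the O(100 * N) worst case comes from.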


    Note: if log(input-list.size) + c < 100, then it is better to sort the whole input list and then take its 100 largest items.
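    A rough worked check of that condition for N = 10^9, assuming the log is base 2 and ignoring constant factors:

```latex
\log_2 N = \log_2 10^9 = 9 \log_2 10 \approx 29.9
N \log_2 N \approx 10^9 \times 29.9 \approx 3.0 \times 10^{10} \quad\text{vs.}\quad 100\,N = 10^{11}
```

    So for a billion numbers the condition holds unless the constant c is larger than about 70.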

  • 2020-11-29 14:49

    Another O(n) algorithm:

    The algorithm finds the largest 100 by elimination.

    Consider all the billion numbers in their binary representation (assume non-negative integers). Start from the most significant bit (MSB). Checking whether a number's MSB is 1 can be done with a bitwise AND against an appropriate mask. If at least 100 of these numbers have a 1 in that bit, eliminate the numbers that have a 0. (If fewer than 100 have a 1, they all belong in the answer: keep them, and look for the rest among the numbers with a 0.) Then proceed with the next most significant bit on the remaining numbers. Keep a count of the numbers remaining after each elimination, and continue as long as this count is greater than 100. Each pass is O(n) and there is one pass per bit (e.g. 32 for 32-bit integers), so the total is O(n).

    The main bitwise operation can be done in parallel on GPUs.
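    A C++ sketch of the bit-by-bit elimination, assuming unsigned 32-bit values; `topKByBits` is a hypothetical name. It handles the case where fewer than k candidates have a 1 in the current bit by keeping all of them and continuing among the 0s. Rather than physically deleting numbers, it uses std::partition to move the surviving candidates into a shrinking range:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the k largest values (in no particular order), assuming unsigned 32-bit inputs.
std::vector<std::uint32_t> topKByBits(std::vector<std::uint32_t> v, std::size_t k) {
    if (k >= v.size()) return v;            // every value qualifies
    std::vector<std::uint32_t> result;      // values already known to be in the top k
    auto lo = v.begin(), hi = v.end();      // [lo, hi) holds the surviving candidates
    std::size_t need = k;                   // how many more values must come from [lo, hi)
    for (int bit = 31; bit >= 0 && need > 0 && static_cast<std::size_t>(hi - lo) > need; --bit) {
        const std::uint32_t mask = std::uint32_t{1} << bit;
        // Move candidates whose current bit is 1 to the front of the range.
        auto mid = std::partition(lo, hi, [mask](std::uint32_t x) { return (x & mask) != 0; });
        const auto ones = static_cast<std::size_t>(mid - lo);
        if (ones >= need) {
            hi = mid;                               // enough 1s: eliminate every candidate with a 0
        } else {
            result.insert(result.end(), lo, mid);  // too few 1s: all of them are in the top k
            need -= ones;
            lo = mid;                               // find the rest among the candidates with a 0
        }
    }
    // [lo, hi) now holds at least `need` candidates and any `need` of them will do:
    // either exactly `need` remain, or all 32 bits were examined so they are equal.
    result.insert(result.end(), lo, lo + static_cast<std::ptrdiff_t>(need));
    return result;
}
```

    Each of the at most 32 passes is a linear partition over the surviving range, so the work is O(32 N) = O(N). Signed integers would need their sign bit handled separately.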

  • 2020-11-29 14:51

    I realize that this is tagged 'algorithm', but I will toss out some other options, since it probably should also be tagged 'interview'.

    What is the source of the 1 billion numbers? If it is a database, then 'select value from table order by value desc limit 100' would do the job quite nicely (there might be dialect differences).

    Is this a one-off, or something that will be repeated? If repeated, how frequently? If it is a one-off and the data are in a file, then 'sort -rn srcfile | head -n 100' (numeric, descending; adjust the options to your data) will have you quickly back to the productive work you are getting paid to do while the computer handles this trivial chore.

    If it is repeated, I would advise picking any decent approach to get the initial answer, then storing/caching the results so that you can continuously report the top 100.

    Finally, there is this consideration. Are you looking for an entry-level job and interviewing with a geeky manager or future co-worker? If so, then you can toss out all manner of approaches and describe their relative technical pros and cons. If you are looking for a more managerial job, then approach it like a manager would, concerned with the development and maintenance costs of the solution, and say "thank you very much" and leave if the interviewer wants to focus on CS trivia. He and you would be unlikely to have much advancement potential there.

    Better luck on the next interview.

  • 2020-11-29 14:51

    Two options:

    (1) Heap (priorityQueue)

    Maintain a min-heap of size 100: fill it with the first 100 elements, then traverse the rest of the array. Whenever an element is larger than the first (smallest) element in the heap, replace that element with it.

    Insert an element into the heap: O(log 100)
    Compare with the first element: O(1)
    There are n elements in the array, so the total is O(n log 100), which is O(n).
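    A sketch of option (1) using std::priority_queue with std::greater, which turns it into a min-heap; `topKMinHeap` is a hypothetical name and k would be 100 here:

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Returns the k largest values in descending order, using a size-k min-heap.
std::vector<int> topKMinHeap(const std::vector<int>& data, std::size_t k) {
    // std::greater makes top() the smallest kept value.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : data) {
        if (heap.size() < k) {
            heap.push(x);                 // still filling the first k slots
        } else if (k > 0 && x > heap.top()) {
            heap.pop();                   // evict the smallest of the current top k
            heap.push(x);                 // each pop/push is O(log k)
        }
    }
    std::vector<int> result(heap.size());
    // Popping yields ascending order, so fill the result from the back.
    for (std::size_t i = result.size(); i > 0; --i) {
        result[i - 1] = heap.top();
        heap.pop();
    }
    return result;
}
```

    Only k elements are ever stored, so the extra memory is O(k) even for a billion inputs, and the same loop works if the numbers arrive as a stream.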
    

    (2) Map-reduce model.

    This is very similar to the word-count example in Hadoop. Map: count each element's frequency (number of occurrences). Reduce: get the top K elements.
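    A single-process C++ simulation of option (2), not Hadoop code: each "mapper" counts value frequencies in one split of the input, and a single "reducer" merges the counts and walks them from the largest value downward. `mapSplit`, `reduceTopK`, `topKMapReduce`, and the `splits` parameter are hypothetical names for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Counts = std::map<int, std::uint64_t>;  // value -> number of occurrences

// "Map" for one input split: count how often each value appears (like word count).
Counts mapSplit(const std::vector<int>& data, std::size_t begin, std::size_t end) {
    Counts counts;
    for (std::size_t i = begin; i < end; ++i) ++counts[data[i]];
    return counts;
}

// "Reduce": merge the partial counts, then take the k largest values.
std::vector<int> reduceTopK(const std::vector<Counts>& partials, std::size_t k) {
    Counts total;
    for (const Counts& p : partials)
        for (const auto& [value, n] : p) total[value] += n;
    std::vector<int> result;
    // std::map is ordered by key, so iterate from the largest value downward.
    for (auto it = total.rbegin(); it != total.rend() && result.size() < k; ++it)
        for (std::uint64_t n = it->second; n > 0 && result.size() < k; --n)
            result.push_back(it->first);
    return result;
}

// Stand-in for a job with `splits` mappers and one reducer (assumes splits > 0).
std::vector<int> topKMapReduce(const std::vector<int>& data, std::size_t k, std::size_t splits) {
    std::vector<Counts> partials;
    const std::size_t chunk = (data.size() + splits - 1) / splits;
    for (std::size_t b = 0; b < data.size(); b += chunk)
        partials.push_back(mapSplit(data, b, std::min(b + chunk, data.size())));
    return reduceTopK(partials, k);
}
```

    With mostly distinct values the frequency tables grow about as large as the input; a common alternative is for each mapper to emit only its local top K, which the reducer then merges.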

    Usually, I would give the recruiter both answers and let them pick whichever they like. Of course, coding MapReduce would be laborious because you have to know all the exact parameters. There is no harm in practicing it, though. Good luck.
