Interview Question: Find Median From Mega Number Of Integers

后端 未结 9 1189
暖寄归人
暖寄归人 2020-12-12 14:02

There is a file that contains 10G(1000000000) number of integers, please find the Median of these integers. you are given 2G memory to do this. Can anyone come up with an re

9条回答
  •  我在风中等你
    2020-12-12 14:45

    Create an array of 8-byte longs that has 2^16 entries. Take your input numbers, shift off the bottom sixteen bits, and create a histogram.

    Now you count up in that histogram until you reach the bin that covers the midpoint of the values.

    Pass through again, ignoring all numbers that don't have that same set of top bits, and make a histogram of the bottom bits.

    Count up through that histogram until you reach the bin that covers the midpoint of the (entire list of) values.

    Now you know the median, in O(n) time and O(1) space (in practice, under 1 MB).

    Here's some sample Scala code that does this:

    def medianFinder(numbers: Iterable[Int]) = {
      def midArgMid(a: Array[Long], mid: Long) = {
        val cuml = a.scanLeft(0L)(_ + _).drop(1)
        cuml.zipWithIndex.dropWhile(_._1 < mid).head
      }
      val topHistogram = new Array[Long](65536)
      var count = 0L
      numbers.foreach(number => {
        count += 1
        topHistogram(number>>>16) += 1
      })
      val (topCount,topIndex) = midArgMid(topHistogram, (count+1)/2)
      val botHistogram = new Array[Long](65536)
      numbers.foreach(number => {
        if ((number>>>16) == topIndex) botHistogram(number & 0xFFFF) += 1
      })
      val (botCount,botIndex) =
        midArgMid(botHistogram, (count+1)/2 - (topCount-topHistogram(topIndex)))
      (topIndex<<16) + botIndex
    }
    

    and here it is working on a small set of input data:

    scala> medianFinder(List(1,123,12345,1234567,123456789))
    res18: Int = 12345
    

    If you have 64 bit integers stored, you can use the same strategy in 4 passes instead.

提交回复
热议问题