Get index of closest value with binary search

问题

I want to do a binary search in python:

def binarySearch(data, val):

Where data is a sorted array and value is the value being searched for. If the value is found, I want to return the index (such that data[index] = val). If the value is not found, I want to return the index of the item that is closest to that value.

Here is what I've got:

def binarySearch(data, val):
    high = len(data)-1
    low = 0
    while True:
        index = (high + low) / 2
        if data[index] == val:
            return index
        if data[index] < val:
            low = index
        if data[index] > val:
            high = index

回答1:

Something like this should work. It returns an array with two indexes. If val is found, both values in the return array are the same. Otherwise, it returns the indexes of the two items closest to val.

def binarySearch(data, val):
    highIndex = len(data)-1
    lowIndex = 0
    while highIndex > lowIndex:
            index = (highIndex + lowIndex) / 2
            sub = data[index]
            if data[lowIndex] == val:
                    return [lowIndex, lowIndex]
            elif sub == val:
                    return [index, index]
            elif data[highIndex] == val:
                    return [highIndex, highIndex]
            elif sub > val:
                    if highIndex == index:
                            return sorted([highIndex, lowIndex])
                    highIndex = index
            else:
                    if lowIndex == index:
                            return sorted([highIndex, lowIndex])
                    lowIndex = index
    return sorted([highIndex, lowIndex])

回答2:

Here is the code that will return the index if the value is found, otherwise the index of the item that is closest to that value, hope it helps.

def binarySearch(data, val):
    lo, hi = 0, len(data) - 1
    best_ind = lo
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if data[mid] < val:
            lo = mid + 1
        elif data[mid] > val:
            hi = mid - 1
        else:
            best_ind = mid
            break
        # check if data[mid] is closer to val than data[best_ind] 
        if abs(data[mid] - val) < abs(data[best_ind] - val):
            best_ind = mid
    return best_ind

def main():
    data = [1, 2, 3, 4, 5, 6, 7]
    val = 6.1
    ind = binarySearch(data, val)
    print 'data[%d]=%d' % (ind, data[ind])

if __name__ == '__main__':
    main()

回答3:

I know this is an old question, but it's high on Google's results and I had the same issue. There's a built-in to do this which uses binary search and allows you to feed in a reference array and a comparison array.

numpy.searchsorted(a, v, side='left', sorter=None)

a is the reference array (data in original question), v is the array to compare (val from the question). This returns an array of size v with int values for the index the nth element of v would need to be inserted into a to preserve the sort order in a' The side keyword determines whether you want the elements of v to be placed to the 'left' (before) or the 'right' (after) the appropriate value in a.

[documentation link as of July 2017] https://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html#numpy.searchsorted

回答4:

Here's a sample implementation of binary search. I won't do all the (home?)work for you, I am sure you can figure out how to store and return the index of the closest value yourself.

# BINARY SEARCH: O(log n), search space halfed each step
def biSearch(lst, find): # expects sorted lst 
    lowIndex = 0
    highIndex = len(lst) - 1
    midIndex = (lowIndex + highIndex)//2
    lastMid = None
    steps = 0
    while midIndex != lastMid:
        steps += 1
        if lst[midIndex] == find:
            return (midIndex, steps)
        if lst[midIndex] < find:
            lowIndex = midIndex + 1
        else:
            highIndex = midIndex - 1
        lastMid = midIndex    
        midIndex = (lowIndex + highIndex)//2
    return (-1, steps)

回答5:

Not the answer to this question. But I landed here trying to figure out how to get the two surrounding values for a given target item in a sorted list.

If anyone else is looking, this is what I came up with based on some of the other answers here.

import random


def get_nearest(items, target):
    print(f'looking for {target}')
    high_index = len(items) - 1
    low_index = 0

    if not items[low_index] <= target <= items[high_index]:
        raise ValueError(f'The target {target} is not in the range of'
                         f' provided items {items[low_index]}:{items[high_index]}')

    if target in items:
        return target, target

    while high_index > low_index:
        index = int((high_index + low_index) / 2)
        sub = items[index]

        if sub > target:
            if high_index == index:
                return tuple(sorted([items[high_index], items[low_index]]))
            high_index = index
        else:
            if low_index == index:
                return tuple(sorted([items[high_index], items[low_index]]))
            low_index = index
    return tuple(sorted([items[high_index], items[low_index]]))


if __name__ == '__main__':
    my_randoms = sorted(random.sample(range(10000000), 100000))
    x = 340000
    print(get_nearest(my_randoms, x))

    x = 0
    my_randoms = [x] + my_randoms
    print(get_nearest(my_randoms, x))

    x = 10000000
    my_randoms.append(x)
    print(get_nearest(my_randoms, x))

    idx = random.randint(0, 100000)
    x = my_randoms[idx]
    print(get_nearest(my_randoms, x))

来源：https://stackoverflow.com/questions/23681948/get-index-of-closest-value-with-binary-search

标签

python

binary-search