How to implement LSH by MapReduce?

此生再无相见时 submitted on 2020-01-03 03:31:09

Question


Suppose we wish to implement Locality-Sensitive Hashing (LSH) by MapReduce. Specifically, assume chunks of the signature matrix consist of columns, and elements are key-value pairs where the key is the column number and the value is the signature itself (i.e., a vector of values).

(a) Show how to produce the buckets for all the bands as output of a single MapReduce process. Hint: Remember that a Map function can produce several key-value pairs from a single element.

(b) Show how another MapReduce process can convert the output of (a) to a list of pairs that need to be compared. Specifically, for each column i, there should be a list of those columns j > i with which i needs to be compared.


Answer 1:


(a)

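  • Map: takes each (column number, signature) pair as input, splits the signature into bands, and emits one key-value pair (bucket_id, column number) per band. The bucket_id combines the band index with a hash of that band's values, so identical values in different bands do not share a bucket.
  • Reduce: collects the columns that share a bucket_id and outputs the buckets for all the bands, i.e. (bucket_id, list(column numbers)).

map(key, value):              // key = column number, value = signature
    split value into bands
    for band_index, band in enumerate(bands):
        bucket_id = (band_index, hash(band))   // hash the whole band, not each row
        collect(bucket_id, key)

reduce(bucket_id, columns):
    collect(bucket_id, columns)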
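The map and reduce steps for (a) can be tried out locally with a minimal Python sketch. It simulates the shuffle with a dictionary rather than running on a real MapReduce framework. The names `map_bands`, `reduce_bucket`, `run_job_a` and the parameter `num_bands` are hypothetical, and the sketch assumes the signature length is divisible by the number of bands.

```python
from collections import defaultdict


def map_bands(column_id, signature, num_bands):
    """Map: emit one ((band index, band hash), column id) pair per band."""
    rows = len(signature) // num_bands  # assumption: length divides evenly
    for b in range(num_bands):
        band = tuple(signature[b * rows:(b + 1) * rows])
        # Hash the whole band; the band index keeps bands in separate buckets.
        yield (b, hash(band)), column_id


def reduce_bucket(bucket_id, column_ids):
    """Reduce: one bucket per key, listing the columns that landed in it."""
    return bucket_id, sorted(column_ids)


def run_job_a(signatures, num_bands):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for column_id, signature in signatures.items():
        for key, value in map_bands(column_id, signature, num_bands):
            groups[key].append(value)
    return dict(reduce_bucket(k, v) for k, v in groups.items())


if __name__ == "__main__":
    demo = {0: [1, 2, 3, 4], 1: [1, 2, 9, 9], 2: [7, 7, 3, 4]}
    for bucket_id, columns in run_job_a(demo, num_bands=2).items():
        print(bucket_id, columns)
```

Only the column number is emitted as the value, since part (b) needs nothing else; the signatures can be looked up again when the candidate pairs are actually compared.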

(b)

  • Map: takes the output of (a) as input and, for each bucket, emits every pair of columns in that bucket as (i, j) with i < j, i.e. (bucket_id, list(columns)) -> (i, j) for each 2-combination of the list.
  • Reduce: groups by i and outputs, for each column i, the list of columns j > i with which i needs to be compared. Duplicates are dropped because two columns can share a bucket in more than one band.

map(key, value):              // key = bucket_id, value = list of column numbers
    for a, b in combinations(value, 2):
        i, j = min(a, b), max(a, b)
        collect(i, j)

reduce(i, js):
    collect(i, distinct(js))
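The second job can be sketched in Python the same way, again simulating the shuffle with a dictionary instead of a real MapReduce framework. The names `map_pairs`, `reduce_candidates` and `run_job_b` are hypothetical, and the input is assumed to be a mapping from bucket id to a list of column numbers, as produced by part (a).

```python
from collections import defaultdict
from itertools import combinations


def map_pairs(bucket_id, column_ids):
    """Map: emit (i, j) with i < j for every pair of columns in one bucket."""
    # Sorting first guarantees that combinations() yields i before j.
    for i, j in combinations(sorted(set(column_ids)), 2):
        yield i, j


def reduce_candidates(i, js):
    """Reduce: for column i, the distinct columns j > i to compare against."""
    # The same pair can come from several buckets, so deduplicate here.
    return i, sorted(set(js))


def run_job_b(buckets):
    """Simulate map, shuffle (group by i), and reduce in one process."""
    groups = defaultdict(list)
    for bucket_id, column_ids in buckets.items():
        for i, j in map_pairs(bucket_id, column_ids):
            groups[i].append(j)
    return dict(reduce_candidates(i, js) for i, js in groups.items())


if __name__ == "__main__":
    demo = {("band0", "x"): [0, 1, 2], ("band1", "y"): [2, 0], ("band1", "z"): [1]}
    print(run_job_b(demo))
```

Columns that never share a bucket with a higher-numbered column do not appear in the output, which matches the requirement that each list contains only columns j > i.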


Source: https://stackoverflow.com/questions/29320943/how-to-implement-lsh-by-mapreduce
