Fastest way to calculate Hamming Distance in C#

Posted by 风格不统一 on 2019-12-11 04:13:05

Question


I have a large collection (n = 20,000,000) of BigInteger values, each representing a bit array of length 225. Given a single BigInteger, I want to find all BigInteger values in my collection whose Hamming distance to it is below a given threshold.

Currently, I convert every BigInteger to a byte array:

bHashes = new byte[hashes.Length][];
for (int i = 0; i < hashes.Length; i++)
{
    bHashes[i] = hashes[i].ToByteArray();
}
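One detail worth keeping in mind here: BigInteger.ToByteArray() returns a little-endian array with the fewest bytes needed for the value, so small values produce shorter arrays than large ones, and a 225-bit value can need up to 29 bytes. Indexing every array at fixed positions only works if all arrays have the same length. The following is a minimal sketch of zero-padding to a fixed length; HashBytes and ToFixedLength are hypothetical names that do not appear in the original code, and the sketch assumes all values are non-negative.

```csharp
using System;
using System.Numerics;

static class HashBytes
{
    // Hypothetical helper (not part of the original code).
    // Copies the little-endian bytes of a non-negative BigInteger into a
    // zero-filled buffer of the requested length. Bytes beyond 'length'
    // are dropped; missing high bytes stay zero.
    public static byte[] ToFixedLength(BigInteger value, int length)
    {
        byte[] raw = value.ToByteArray();   // minimal length, little-endian
        byte[] padded = new byte[length];   // initialized to all zeros
        Array.Copy(raw, padded, Math.Min(raw.Length, length));
        return padded;
    }
}
```

With such a helper, the conversion loop could call HashBytes.ToFixedLength(hashes[i], 29) instead of hashes[i].ToByteArray(), assuming 29 bytes is the intended width.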

I then create a Hamming distance lookup array:

// lookup[i][j] holds the Hamming distance between byte values i and j.
int[][] lookup = new int[256][];

for (int i = 0; i < 256; i++)
{
    lookup[i] = new int[256];
    for (int j = 0; j < 256; j++)
    {
        lookup[i][j] = HammingDistance(i, j);
    }
}

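// Counts the bit positions in which a and b differ.
static int HammingDistance(BigInteger a, BigInteger b)
{
    BigInteger n = a ^ b; // set bits mark the positions where a and b differ

    int x = 0;
    while (n != 0)
    {
        n &= (n - 1); // clears the lowest set bit
        x++;
    }
    return x;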
}
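Since the Hamming distance between two bytes is the number of set bits in their XOR, the 256 x 256 table contains the same values as a 256-entry bit-count table indexed by a[k] ^ b[k]. The sketch below illustrates that equivalence; ByteHamming, PopCount and Distance are hypothetical names, and whether the smaller table is actually faster on a given machine would have to be measured.

```csharp
static class ByteHamming
{
    // PopCount[v] is the number of set bits in the byte value v.
    public static readonly byte[] PopCount = BuildPopCount();

    static byte[] BuildPopCount()
    {
        var table = new byte[256];
        // The bit count of i equals the bit count of i >> 1 plus the lowest bit of i.
        for (int i = 1; i < 256; i++)
        {
            table[i] = (byte)(table[i >> 1] + (i & 1));
        }
        return table;
    }

    // Hamming distance between two bytes: count the set bits of x XOR y.
    public static int Distance(byte x, byte y)
    {
        return PopCount[x ^ y];
    }
}
```

For example, 0x0B (binary 1011) and 0x02 (binary 0010) differ in two bit positions, so ByteHamming.Distance(0x0B, 0x02) yields 2, the same value as lookup[0x0B][0x02].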

Finally, I calculate the total Hamming distance by summing the Hamming distances between corresponding bytes. My timing measurements showed that writing out the additions by hand was faster than using a loop:

static List<int> GetMatches(byte[] a)
{
    List<int> result = new List<int>();
    for (int i = 0; i < bHashes.Length; i++)
    {
        byte[] b = bHashes[i];
        int dist = lookup[a[0]][b[0]] +
                   lookup[a[1]][b[1]] +
                   lookup[a[2]][b[2]] +
                   lookup[a[3]][b[3]] +
                   lookup[a[4]][b[4]] +
                   lookup[a[5]][b[5]] +
                   lookup[a[6]][b[6]] +
                   lookup[a[7]][b[7]] +
                   lookup[a[8]][b[8]] +
                   lookup[a[9]][b[9]] +
                   lookup[a[10]][b[10]] +
                   lookup[a[11]][b[11]] +
                   lookup[a[12]][b[12]] +
                   lookup[a[13]][b[13]] +
                   lookup[a[14]][b[14]];
        if (dist < THRESHOLD) result.Add(i);
    }
    return result;
}
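For comparison, the loop-based form that the unrolled sum replaces might look like the following sketch. LoopVariant and its parameters are hypothetical names, and the lookup table is passed in explicitly so that the snippet is self-contained; the code above uses static fields instead. It assumes both arrays have the same length.

```csharp
static class LoopVariant
{
    // Sums the per-byte Hamming distances of a and b using a
    // 256 x 256 lookup table, one table access per byte position.
    public static int Distance(byte[] a, byte[] b, int[][] lookup)
    {
        int dist = 0;
        for (int k = 0; k < a.Length; k++)
        {
            dist += lookup[a[k]][b[k]];
        }
        return dist;
    }
}
```

Both forms compute the same sum; the unrolled version only removes the loop counter and the per-iteration bounds handling.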

Preprocessing time is irrelevant; only the execution time of the GetMatches() function matters. Using the method above, my system needs ~1.2 s, which, unfortunately, is way too long for my needs. Is there a faster way?

Source: https://stackoverflow.com/questions/40676129/fastest-way-to-calculate-hamming-distance-in-c-sharp
