hamming-distance

The hunt for the fastest Hamming Distance C implementation [duplicate]

为君一笑 submitted on 2019-12-04 11:05:05
This question already has an answer here: Bit Operation For Finding String Difference (6 answers). I want to find how many characters differ between two strings of equal length. I have found that XOR-based algorithms are considered the fastest, but they return the distance expressed in bits. I want the result expressed in characters. For example, "pet" and "pit" have distance 1 expressed in characters, but 'e' and 'i' might differ in two bits, so XORing returns 2. The function I wrote is: // na = length of both strings unsigned int HammingDistance(const char* a, unsigned int na, const char* b
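To make the distinction between the two units concrete, here is a minimal Python sketch (not the asker's C function, which is cut off above). The names `hamming_chars` and `hamming_bits` are illustrative only. The first counts differing positions; the second XORs the character codes and counts set bits.

```python
def hamming_chars(a: str, b: str) -> int:
    """Number of positions at which the two strings hold different characters."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_bits(a: str, b: str) -> int:
    """Number of differing bits, obtained by XORing the character codes."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(bin(ord(x) ^ ord(y)).count("1") for x, y in zip(a, b))


print(hamming_chars("pet", "pit"))  # character-level distance
print(hamming_bits("pet", "pit"))   # bit-level distance
```

For "pet" and "pit" the first function returns 1 and the second returns 2, since 'e' (0x65) and 'i' (0x69) differ in two bits.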

Efficiently build a graph of words with given Hamming distance

谁说我不能喝 submitted on 2019-12-04 10:22:20
I want to build a graph from a list of words with Hamming distance of (say) 1, or to put it differently, two words are connected if they differ in only one letter (lol -> lot), so that given words = [ lol, lot, bot ] the graph would be { 'lol' : [ 'lot' ], 'lot' : [ 'lol', 'bot' ], 'bot' : [ 'lot' ] } The easy way is to compare every word in the list with every other and count the differing characters; sadly, this is an O(N^2) algorithm. Which algo/ds/strategy can I use to achieve better

Algorithm to test minimum hamming distance against a set?

点点圈 submitted on 2019-12-03 17:36:42
I have a relatively straightforward thing I want to do: given a query number Q, a query distance d, and a set of numbers S, determine whether S contains any number whose Hamming distance to Q is less than or equal to d. The simplest solution is to make S a list and iterate over it, computing distances. If a distance less than or equal to d is found, bail out and return TRUE. But since all I want to do is check for existence, something faster than a linear-time solution should be possible. One thing I tried is an M-tree. Referencing some other questions on Stack Overflow, the
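The linear-scan baseline described above can be sketched as follows. This is only the simple approach the question starts from, not a sublinear solution; `contains_within` is a hypothetical name, and the numbers are assumed to be non-negative integers.

```python
def contains_within(S, Q: int, d: int) -> bool:
    """Return True as soon as some element of S is within Hamming distance d of Q."""
    for x in S:
        # XOR leaves a 1 in every bit position where x and Q differ.
        if bin(x ^ Q).count("1") <= d:
            return True  # early exit: existence is all we need
    return False


print(contains_within([0b1010, 0b1111], 0b1000, 1))
print(contains_within([0b0111], 0b1000, 2))
```

The early return means the worst case (no match) still touches every element, which is the cost the question hopes to avoid.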

How to find the closest pairs (Hamming Distance) of a string of binary bins in Ruby without O^2 issues?

天大地大妈咪最大 submitted on 2019-12-03 06:36:32
I've got a MongoDB database with about 1 million documents in it. Each document has a string representing 256 bits as 1s and 0s, like: 0110101010101010110101010101 Ideally, I'd like to query for near binary matches. This means, if the two documents have the following numbers. Yes, this is Hamming distance. This is NOT currently supported in Mongo, so I'm forced to do it in the application layer. Given this, I am trying to find a way to avoid doing individual Hamming distance comparisons between the documents, because that makes the time to do this basically impossible. I have a LOT
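One commonly used way to avoid comparing a query against every stored code is a pigeonhole (chunk) index: if two bit strings differ in at most r positions and each is cut into r+1 disjoint chunks, at least one chunk must match exactly. The Python sketch below illustrates that idea under stated assumptions; it is not taken from the question or its answers, it is not Ruby, and every name in it (`chunk_bounds`, `build_index`, `query_within`) is hypothetical. It assumes all codes have the same length and that the length is at least r+1.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings of '0'/'1'."""
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def chunk_bounds(length: int, chunks: int):
    """Split range(length) into `chunks` contiguous slices; the last takes the remainder."""
    size = length // chunks
    bounds = [(c * size, (c + 1) * size) for c in range(chunks)]
    bounds[-1] = (bounds[-1][0], length)
    return bounds


def build_index(codes, bounds):
    """Map (chunk number, chunk contents) -> indices of codes with that chunk."""
    index = defaultdict(list)
    for i, code in enumerate(codes):
        for c, (lo, hi) in enumerate(bounds):
            index[(c, code[lo:hi])].append(i)
    return index


def query_within(codes, index, bounds, q: str, r: int):
    """Indices of codes within distance r of q; requires len(bounds) == r + 1."""
    candidates = set()
    for c, (lo, hi) in enumerate(bounds):
        candidates.update(index.get((c, q[lo:hi]), ()))
    # Only the candidates sharing an exact chunk are verified in full.
    return sorted(i for i in candidates if hamming(codes[i], q) <= r)


codes = ["00000000", "00000011", "11111111", "00010000"]
r = 2
bounds = chunk_bounds(len(codes[0]), r + 1)
index = build_index(codes, bounds)
print(query_within(codes, index, bounds, "00000001", r))
```

How much this helps depends on how evenly the chunks are distributed; with highly repetitive codes the candidate sets can still be large.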

Hamming Distance vs. Levenshtein Distance

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-03 05:25:55
For the problem I'm working on, finding distances between two sequences to determine their similarity, sequence order is very important. However, the sequences that I have are not all the same length, so I pad the shorter sequence with empty points so that both sequences are the same length, in order to satisfy the Hamming distance requirement. Is there any major problem with doing this, since all I care about is the number of transpositions (not insertions or deletions like Levenshtein

Finding a number of maximally different binary vectors from a set

折月煮酒 submitted on 2019-12-03 05:17:47
Consider the set S of all binary vectors of length n where each contains exactly m ones, so there are n-m zeros in each vector. My goal is to construct a number k of vectors from S such that these vectors are as different as possible from each other. As a simple example, take n=4, m=2 and k=2; then a possible solution is [1,1,0,0] and [0,0,1,1]. It seems that this is an open problem in the coding theory literature (?). Is there any way (i.e. an algorithm) to find a suboptimal yet good solution? Is Hamming distance the right performance measure to use in this case? Some thoughts: In
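As one possible "suboptimal yet good" heuristic, a greedy farthest-point selection can be sketched in Python: start from an arbitrary vector and repeatedly add the candidate whose minimum Hamming distance to the already chosen vectors is largest. This is an assumption-laden illustration rather than an answer from the thread. It enumerates all of S, so it is only practical for small n, it assumes 0 <= m <= n and k >= 1, and it gives no optimality guarantee. The function names are hypothetical.

```python
from itertools import combinations


def hamming(u, v) -> int:
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def constant_weight_vectors(n: int, m: int):
    """Yield every binary vector of length n with exactly m ones."""
    for ones in combinations(range(n), m):
        yield tuple(1 if i in ones else 0 for i in range(n))


def greedy_spread(n: int, m: int, k: int):
    """Greedily pick up to k vectors, maximizing the minimum distance at each step."""
    candidates = list(constant_weight_vectors(n, m))
    chosen = [candidates[0]]  # arbitrary starting point
    while len(chosen) < k and len(chosen) < len(candidates):
        best = max(
            (v for v in candidates if v not in chosen),
            key=lambda v: min(hamming(v, c) for c in chosen),
        )
        chosen.append(best)
    return chosen


print(greedy_spread(4, 2, 2))
```

For n=4, m=2, k=2 this reproduces the example pair from the question, [1,1,0,0] and [0,0,1,1], which are at distance 4.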

Efficiently build a graph of words with given Hamming distance

家住魔仙堡 submitted on 2019-12-03 04:55:55
I want to build a graph from a list of words with Hamming distance of (say) 1, or to put it differently, two words are connected if they differ in only one letter (lol -> lot), so that given words = [ lol, lot, bot ] the graph would be { 'lol' : [ 'lot' ], 'lot' : [ 'lol', 'bot' ], 'bot' : [ 'lot' ] } The easy way is to compare every word in the list with every other and count the differing characters; sadly, this is an O(N^2) algorithm. Which algo/ds/strategy can I use to achieve better performance? Also, let's assume only Latin characters, and all the words have the same length. Assuming you
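One common way to avoid the all-pairs comparison for distance 1 is to bucket words by "word with one position masked out": two equal-length words differ in exactly one letter precisely when they are distinct and share such a masked key. The Python sketch below illustrates this under the question's assumptions (distinct words of equal length); it is not the answer that is cut off above, and `build_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_graph(words):
    """Adjacency lists linking words that differ in exactly one position."""
    buckets = defaultdict(set)
    for w in words:
        for i in range(len(w)):
            # Key = everything except position i; len(prefix) encodes i.
            buckets[(w[:i], w[i + 1:])].add(w)

    graph = {w: set() for w in words}
    for group in buckets.values():
        for w in group:
            graph[w] |= group - {w}
    return {w: sorted(neighbours) for w, neighbours in graph.items()}


print(build_graph(["lol", "lot", "bot"]))
```

Building the buckets touches each word once per position. The total cost also depends on the number of edges, so a word list in which most words are neighbours of each other still produces quadratic output.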

Hamming Distance vs. Levenshtein Distance

风格不统一 submitted on 2019-12-02 18:43:53
For the problem I'm working on, finding distances between two sequences to determine their similarity, sequence order is very important. However, the sequences that I have are not all the same length, so I pad the shorter sequence with empty points so that both sequences are the same length, in order to satisfy the Hamming distance requirement. Is there any major problem with doing this, since all I care about is the number of transpositions (not insertions or deletions like Levenshtein does)? I've found that Hamming distance is much, much faster than Levenshtein as a distance metric for
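A small Python sketch can show what padding does to the comparison. It assumes the shorter sequence is padded on the right with a filler symbol (the question does not say where the padding goes), and the function names are illustrative. `levenshtein` is the standard dynamic-programming edit distance.

```python
def padded_hamming(a: str, b: str, pad: str = "\0") -> int:
    """Hamming distance after right-padding the shorter string with `pad`."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, pad), b.ljust(n, pad)
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


print(padded_hamming("abcdef", "bcdef"))
print(levenshtein("abcdef", "bcdef"))
```

With "abcdef" and "bcdef", a single missing leading element shifts every later position, so the padded Hamming distance is 6 while the Levenshtein distance is 1. Whether that behaviour is a problem depends on whether such shifts can occur in the data.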

Query points on the vertices of a Hamming cube

孤街浪徒 submitted on 2019-12-01 14:02:16
I have N points that lie only on the vertices of a cube of dimension D, where D is something like 3. A vertex may contain no points. So every point has coordinates in {0, 1}^D. I am only interested in query time, as long as the memory cost is reasonable (not exponential in N, for example :) ). Given a query that lies on one of the cube's vertices and an input parameter r, find all the vertices (and thus points) that have Hamming distance <= r from the query. What's the way to go in a C++ environment? I am thinking of a kd-tree, but I am not sure and want help, any input, even approximate
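For a small dimension D, one simple option is a hash table keyed by vertex, queried by enumerating every vertex within distance r of the query (flip 0, 1, ..., r bits). The sketch below is in Python rather than C++ and is only an illustration under those assumptions, not the accepted approach from the thread. Vertices are encoded as D-bit integers, and `build_table` and `cube_query` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def build_table(points):
    """Map each occupied vertex (a D-bit integer) to the indices of its points."""
    table = defaultdict(list)
    for idx, vertex in enumerate(points):
        table[vertex].append(idx)
    return table


def cube_query(table, q: int, r: int, D: int):
    """Indices of all points whose vertex is within Hamming distance r of q."""
    hits = []
    for k in range(min(r, D) + 1):
        # Every vertex at distance exactly k is q with k distinct bits flipped.
        for bits in combinations(range(D), k):
            v = q
            for b in bits:
                v ^= 1 << b
            hits.extend(table.get(v, ()))
    return sorted(hits)


points = [0b000, 0b001, 0b111, 0b001]
table = build_table(points)
print(cube_query(table, 0b000, 1, 3))
```

The number of probed vertices is the sum of C(D, k) for k up to r, which is tiny for D around 3 but grows quickly with D and r, so this does not scale to high-dimensional cubes.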

XOR bitset when 2D bitset is stored as 1D

老子叫甜甜 submitted on 2019-12-01 13:13:55
To answer How to store binary data when you only care about speed?, I am trying to write some code to do comparisons, so I want to use std::bitset. However, for a fair comparison, I would like a 1D std::bitset to emulate a 2D one. So instead of having: bitset<3> b1(string("010")); bitset<3> b2(string("111")); I would like to use: bitset<2 * 3> b1(string("010111")); to optimize data locality. However, now I am having a problem with How should I store and compute Hamming distance between binary codes?, as seen in my minimal example: #include <vector> #include <iostream> #include <random> #include <cmath>
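The indexing idea behind a flattened 2D bitset can be illustrated with a Python integer standing in for `std::bitset<2 * 3>`. This is a language-neutral sketch of the arithmetic, not the asker's C++ example (which is cut off above). It assumes, as with a bitset built from a string, that the leftmost character is the most significant bit and that row 0 is the leftmost group of bits. `row` and `row_hamming` are hypothetical helpers.

```python
def row(flat: int, i: int, rows: int, cols: int) -> int:
    """Extract row i (row 0 = most significant group) from a flattened bit matrix."""
    shift = (rows - 1 - i) * cols
    return (flat >> shift) & ((1 << cols) - 1)


def row_hamming(flat_a: int, i: int, flat_b: int, j: int, rows: int, cols: int) -> int:
    """Hamming distance between row i of flat_a and row j of flat_b."""
    diff = row(flat_a, i, rows, cols) ^ row(flat_b, j, rows, cols)
    return bin(diff).count("1")


flat = int("010111", 2)  # rows "010" and "111" stored back to back
print(bin(row(flat, 0, 2, 3)), bin(row(flat, 1, 2, 3)))
print(row_hamming(flat, 0, flat, 1, 2, 3))
```

Here rows "010" and "111" differ in two positions, so the distance between them is 2. The same shift-and-mask arithmetic carries over to a C++ bitset, where XOR followed by `count()` plays the role of the popcount.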