hamming-distance

Find the Hamming distance between string sequences

我只是一个虾纸丫, submitted on 2020-02-05 05:52:46
Question: I have a dataset of 3156 DNA sequences, each of which has 98290 characters (SNPs), comprising the usual 5 symbols: A, C, G, T, N (gap). What is the optimal way to find the pairwise Hamming distance between these sequences? Note that for each sequence, I actually want to find the reciprocal of the number of sequences (including itself) whose per-site Hamming distance to it is less than some threshold (0.1 in this example). So far, I have attempted the following: library(doParallel)
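A minimal sketch of the quantity described in the question, written in Python with NumPy rather than the R the asker started from. The function name `reciprocal_neighbour_counts` is hypothetical, all sequences are assumed to have equal length, and `N` is treated as an ordinary symbol (an assumption; the excerpt does not say how gaps should count).

```python
import numpy as np

def reciprocal_neighbour_counts(seqs, threshold=0.1):
    """For each sequence, return 1 / (number of sequences, itself included,
    whose per-site Hamming distance to it is below `threshold`)."""
    n, length = len(seqs), len(seqs[0])
    # One byte per site; "N" is compared like any other symbol (assumption).
    X = np.frombuffer("".join(seqs).encode("ascii"), dtype=np.uint8).reshape(n, length)
    weights = np.empty(n)
    for i in range(n):
        # Fraction of mismatching sites between sequence i and every sequence.
        per_site = (X != X[i]).mean(axis=1)
        weights[i] = 1.0 / np.count_nonzero(per_site < threshold)
    return weights

seqs = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",  # one mismatch with the first: 1/20 = 0.05
    "TTTTTTTTTTTTTTTTTTTT",
]
print(reciprocal_neighbour_counts(seqs))
```

Looping over rows keeps memory at one boolean matrix of shape (n, length) per iteration instead of materialising all n × n × length comparisons at once.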

Appending a List and Recursion (Hamming Distance) - Python 3

五迷三道, submitted on 2020-01-06 08:14:24
Question: I'm supposed to write a program that takes a string of binary code and a number, and outputs all the strings within that Hamming distance of the original string. I have a function that does everything, but the output contains lists within lists. I understand why this is: the function is recursive, and sometimes it will return a list of possible values. The problem is, I don't know how to change it so it outputs complete strings. For example, for a string of "0000" and hamming distance "2
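One way to sidestep the nested-list problem is to avoid recursion and pick the positions to flip directly. This is a sketch under that assumption, not the asker's function; `within_hamming` is a hypothetical name.

```python
from itertools import combinations

def within_hamming(bits, max_dist):
    """Return a flat list of every binary string whose Hamming distance
    from `bits` is at most `max_dist`."""
    results = []
    for k in range(max_dist + 1):
        # Choose exactly k positions to flip.
        for positions in combinations(range(len(bits)), k):
            chars = list(bits)
            for p in positions:
                chars[p] = "1" if chars[p] == "0" else "0"
            results.append("".join(chars))
    return results

print(within_hamming("0000", 2))
```

For "0000" and distance 2 the list has 1 + 4 + 6 = 11 entries: the original string, the four single flips, and the six double flips.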

Python - How to generate the Pairwise Hamming Distance Matrix

旧城冷巷雨未停, submitted on 2020-01-05 03:49:06
Question: Beginner with Python here. I'm having trouble calculating the binary pairwise Hamming distance matrix between the rows of an input matrix using only the numpy library. I'm supposed to avoid loops and use vectorization. If, for instance, I have something like:
  [ 1, 0, 0, 1, 1, 0]
  [ 1, 0, 0, 0, 0, 0]
  [ 1, 1, 1, 1, 0, 0]
The matrix should be something like:
  [ 0, 2, 3]
  [ 2, 0, 3]
  [ 3, 3, 0]
i.e. if the original matrix was A and the Hamming distance matrix is B. B[0,1] =
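A loop-free version can be sketched with NumPy broadcasting: compare every row against every other row along a new axis, then count mismatches. The function name `pairwise_hamming` is hypothetical.

```python
import numpy as np

def pairwise_hamming(A):
    """Pairwise Hamming distances between the rows of a 2-D array."""
    # A[:, None, :] has shape (n, 1, m); A[None, :, :] has shape (1, n, m).
    # Broadcasting yields an (n, n, m) boolean array of per-position mismatches.
    return (A[:, None, :] != A[None, :, :]).sum(axis=2)

A = np.array([
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
])
B = pairwise_hamming(A)
print(B)
```

The intermediate array needs n × n × m booleans, so this suits small and medium inputs; for large ones, processing rows in chunks is the usual trade-off.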

Algorithm to test minimum hamming distance against a set?

爱⌒轻易说出口, submitted on 2020-01-01 05:49:50
Question: I have a relatively straightforward thing I want to do: given a query number Q, a query distance d, and a set of numbers S, determine whether or not S contains any numbers with Hamming distance less than or equal to d from Q. The simplest solution is to just make S a list and iterate over it, computing distances. If a distance less than or equal to d is computed, bail out and return TRUE. But considering that all I want to do is check for existence, something faster than a linear time solution should
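One known way to beat a full scan for small d is a pigeonhole index: split each number into d + 1 disjoint bit chunks; any number within distance d of Q must agree with Q exactly on at least one chunk, so only those buckets need checking. The sketch below assumes fixed-width non-negative integers (32 bits by default) and a distance bound fixed at build time; the class and method names are hypothetical.

```python
from collections import defaultdict

def popcount(x):
    return bin(x).count("1")

class HammingSetIndex:
    def __init__(self, numbers, bits=32, d=3):
        self.d = d
        # Split the bit width into d + 1 nearly equal chunks: (shift, mask) pairs.
        base, extra = divmod(bits, d + 1)
        self.spans = []
        shift = 0
        for i in range(d + 1):
            width = base + (1 if i < extra else 0)
            self.spans.append((shift, (1 << width) - 1))
            shift += width
        # One hash table per chunk: chunk value -> numbers having that value.
        self.tables = [defaultdict(list) for _ in self.spans]
        for n in numbers:
            for table, (s, mask) in zip(self.tables, self.spans):
                table[(n >> s) & mask].append(n)

    def contains_within(self, q):
        """True if some stored number is within Hamming distance d of q."""
        for table, (s, mask) in zip(self.tables, self.spans):
            for cand in table.get((q >> s) & mask, ()):
                # Verify the candidate with the full distance.
                if popcount(cand ^ q) <= self.d:
                    return True
        return False

index = HammingSetIndex([0b1111, 0xFF00FF00], bits=32, d=2)
print(index.contains_within(0b1100), index.contains_within(0))
```

How much this saves depends on how evenly the chunk values spread across buckets; in the worst case every number shares a chunk with Q and the lookup degrades to a linear scan.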

Finding hamming distance of code

偶尔善良, submitted on 2019-12-31 23:00:09
Question: A question asks: find the Hamming distance of the following code: 11111 10101 01010 11100 00011 11001 The answer is 2. How does this work? I thought Hamming distance is only defined between two strings?
Answer 1: The Hamming distance of a code is defined as the minimum distance between any 2 codewords. So, in your case, if you compute the Hamming distance between every pair of the listed codewords, none is less than 2.
Answer 2: Here is some Python code to find it automatically: code = [ (0,0,0,0,0,0), (0,0,1,0,0,1), (0
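As an illustration of the definition in Answer 1 (this is not the truncated code from Answer 2), the minimum over all codeword pairs can be computed for the six codewords in the question. The helper names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))

def code_distance(codewords):
    """Minimum Hamming distance over all distinct pairs of codewords."""
    return min(hamming(a, b) for a, b in combinations(codewords, 2))

code = ["11111", "10101", "01010", "11100", "00011", "11001"]
print(code_distance(code))  # prints 2
```

The pair 11111 and 10101, for example, differs in exactly two positions, and no pair differs in only one.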

Query points on the vertices of a Hamming cube

橙三吉。, submitted on 2019-12-30 14:43:51
Question: I have N points that lie only on the vertices of a cube of dimension D, where D is something like 3. A vertex may not contain any point. So every point has coordinates in {0, 1}^D. I am only interested in query time, as long as the memory cost is reasonable (not exponential in N, for example :) ). Given a query that lies on one of the cube's vertices and an input parameter r, find all the vertices (thus points) that have Hamming distance <= r from the query. What's the way to go in a C++
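One possible approach, sketched in Python rather than the C++ the asker mentions: store the occupied vertices as integers in a hash set and, at query time, enumerate every vertex within radius r of the query by flipping up to r coordinates. The function names are hypothetical, and the sketch assumes D is small enough that a vertex fits in a machine-sized integer.

```python
from itertools import combinations

def to_vertex(coords):
    """Pack a tuple of 0/1 coordinates into an integer."""
    v = 0
    for bit in coords:
        v = (v << 1) | bit
    return v

def query(occupied, q, r, dim):
    """Return occupied vertices within Hamming distance r of vertex q."""
    hits = []
    for k in range(r + 1):
        # Flip every choice of exactly k coordinates.
        for positions in combinations(range(dim), k):
            mask = 0
            for p in positions:
                mask |= 1 << p
            v = q ^ mask
            if v in occupied:
                hits.append(v)
    return hits

occupied = {to_vertex(c) for c in [(0, 0, 0), (0, 1, 1), (1, 1, 1)]}
print(query(occupied, to_vertex((0, 0, 1)), 1, 3))
```

Query cost is the number of vertices in the Hamming ball, the sum of C(D, k) for k up to r, independent of N; memory is linear in the number of occupied vertices.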

What is the hamming distance, and how do I determine it for a CRC scheme?

北城以北, submitted on 2019-12-22 05:04:06
Question: While studying for a class in computer networks, the prof talked about the Hamming distance between 2 valid code words in a sample code. I have read about Hamming distance, and it makes sense from the perspective of telling the difference between 2 strings. For example: Code Word 1 = 10110. The sender sends code word 1, an error is introduced, and the receiver receives 10100. So you see that the 4th bit was corrupted. This would result in a Hamming distance of 1 because:
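The example in the question can be checked with an XOR: the bits set in `sent ^ received` mark the corrupted positions, and counting them gives the Hamming distance. A minimal sketch using the two words from the excerpt:

```python
sent = int("10110", 2)
received = int("10100", 2)

# XOR leaves a 1 exactly where the two words disagree.
error_pattern = sent ^ received
distance = bin(error_pattern).count("1")

print(format(error_pattern, "05b"), distance)  # prints 00010 1
```

The single 1 sits in the fourth position from the left, matching the corrupted bit described in the question.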

Calculate distance between two descriptors

主宰稳场, submitted on 2019-12-22 01:20:47
Question: I'm trying to calculate the distance (Euclidean or Hamming) between two descriptors that have already been calculated. The problem is I don't want to use a matcher; I just want to calculate the distance between two descriptors. I'm using OpenCV 2.4.9 and I have my descriptors stored in a Mat type: Mat descriptors1; Mat descriptors2; and now I just want to calculate the distance (preferably the Hamming distance, since I'm using binary descriptors) between row1 of descriptors1 and row1 of descriptors2 (for
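For binary descriptors, each row is a sequence of bytes, and the Hamming distance is the number of differing bits across those bytes. The sketch below shows the computation in plain Python, with `bytes` objects standing in for descriptor rows (an assumption for illustration; the function name is hypothetical). In OpenCV itself, `norm` called with the `NORM_HAMMING` norm type on the two rows is probably the more direct route.

```python
def hamming_bytes(row_a, row_b):
    """Bit-level Hamming distance between two equal-length byte sequences."""
    if len(row_a) != len(row_b):
        raise ValueError("descriptors must have the same length")
    # XOR each byte pair and count the set bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(row_a, row_b))

# Two made-up 2-byte descriptors (real binary descriptors are typically longer).
d1 = bytes([0b11110000, 0b00000000])
d2 = bytes([0b00001111, 0b00000001])
print(hamming_bytes(d1, d2))  # prints 9
```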

Bit string nearest neighbour searching

孤人, submitted on 2019-12-19 08:53:35
Question: I have hundreds of thousands of sparse bit strings, each 32 bits long. I'd like to do a nearest neighbour search on them, and look-up performance is critical. I've been reading up on various algorithms, but they seem to target text strings rather than binary strings. I think either locality-sensitive hashing or spectral hashing seem good candidates, or I could look into compression. Will any of these work well for my bit string problem? Any direction or guidance would be greatly appreciated.
Answer 1:
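Independently of the truncated answer above, a brute-force baseline is useful for judging whether a hashing scheme pays off: with 32-bit values, each comparison is one XOR plus a bit count. A minimal sketch, with a hypothetical function name and the bit strings assumed to be stored as integers:

```python
def nearest_neighbour(query, items):
    """Return (item, distance) for the stored value closest to `query`
    in Hamming distance, found by a linear scan."""
    best = min(items, key=lambda x: bin(x ^ query).count("1"))
    return best, bin(best ^ query).count("1")

items = [0b1000, 0b0110, 0xF0000000]
print(nearest_neighbour(0b0111, items))
```

Any index structure would need to beat this scan on the real data to be worth its extra memory and complexity.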