similarity | 易学教程

Is there a way to filter a django queryset based on string similarity (a la python difflib)?

Question: I need to match cold leads against a database of our clients. The leads come in bulk (thousands of records) from a third-party provider, and sales is asking us to (in their words) "filter out our clients" so they don't try to sell our service to an established client. Obviously, the leads contain misspellings: Charles becomes Charlie, Joseph becomes Joe, etc. So I can't just filter by comparing lead_first_name to client_first_name, etc. I need to use some sort of string…
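difflib can't run inside a database query, so one option is to pull the client names into Python and filter the leads there. Below is a minimal sketch that assumes first names only and uses a cutoff of 0.6 (difflib's default for `get_close_matches`). The function name is hypothetical. In Django, the client list could come from something like `Client.objects.values_list("first_name", flat=True)`, where `Client` and `first_name` are also hypothetical names.

```python
import difflib


def filter_out_clients(leads, client_names, cutoff=0.6):
    """Return the leads whose name has no close match among client_names."""
    # Lower-case once so that "Charles" and "charles" compare as equal.
    clients = [name.lower() for name in client_names]
    remaining = []
    for lead in leads:
        # get_close_matches keeps candidates whose SequenceMatcher ratio is >= cutoff.
        if not difflib.get_close_matches(lead.lower(), clients, n=1, cutoff=cutoff):
            remaining.append(lead)
    return remaining


clients = ["Charles", "Joseph"]
leads = ["Charlie", "Joe", "Margaret"]
print(filter_out_clients(leads, clients))  # ['Margaret']
```

With these inputs, "Charlie" scores about 0.86 against "Charles" and "Joe" scores about 0.67 against "Joseph", so only "Margaret" survives. A cutoff this low also matches unrelated short names ("Joe" and "Jon" also score about 0.67). In practice, you would likely combine several fields before deciding that a lead is a client.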

Compute mean squared, absolute deviation and custom similarity measure - Python/NumPy

I have a large image as a 2D array (let's assume it is a 500 by 1000 pixel grayscale image). I also have one small image (let's say it is 15 by 15 pixels). I would like to slide the small image over the large one and, for each position of the small image, calculate a measure of similarity between the small image and the underlying part of the big image. I would like to be flexible in choosing the similarity measure. For example, I might want to calculate the mean squared deviation, the mean absolute deviation, or something else (just some operation that takes two matrices of…
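Here is a minimal NumPy sketch of the sliding comparison. It assumes NumPy 1.20 or later, which provides `sliding_window_view`. The measure is passed in as a function that receives every window at once together with the small image, so mean squared deviation, mean absolute deviation, or a custom measure can be swapped in. The function names are illustrative.

Note that `windows - small` materialises an array of shape (H-h+1, W-w+1, h, w). For a 500 x 1000 image with a 15 x 15 template, that is roughly 108 million values (about 860 MB as float64), so the demo uses a smaller image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def similarity_map(big, small, measure):
    # windows has shape (H-h+1, W-w+1, h, w): one read-only view per placement of `small`.
    windows = sliding_window_view(big, small.shape)
    return measure(windows, small)


def mean_squared_deviation(windows, small):
    # `small` broadcasts over the leading window axes; average over the last two axes.
    return ((windows - small) ** 2).mean(axis=(-2, -1))


def mean_absolute_deviation(windows, small):
    return np.abs(windows - small).mean(axis=(-2, -1))


rng = np.random.default_rng(0)
big = rng.random((60, 80))
small = big[10:25, 20:35].copy()  # plant the template at row 10, column 20

msd = similarity_map(big, small, mean_squared_deviation)
print(msd.shape)                                  # (46, 66)
print(np.unravel_index(msd.argmin(), msd.shape))  # best match at row 10, column 20
```

For the full-size case, a lower-memory alternative is to loop over the 15 x 15 template offsets and accumulate the per-pixel terms. This keeps memory proportional to the output map instead of the output map times the template size.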

'Similarity' in Data Mining

Question: In the field of data mining, is there a specific sub-discipline called 'Similarity'? If so, what does it deal with? Any examples, links, or references would be helpful. Also, since I am new to the field, I would like the community's opinion on how closely related data mining and artificial intelligence are. Are they synonyms, or is one a subset of the other? Thanks in advance for sharing your knowledge.
Answer 1: In the field of Data Mining, is there a specific sub-discipline called 'Similarity'? Yes. There…

To use iSPARQL to compare values using similarity measures

I have a question for you. I would like to write a query that retrieves values similar to a given string "Londn" (according to some similarity function, such as Lev) by comparing against the rdfs:label predicate of DBpedia. In the output, for example, I would like to get the value "London". I have read that one usable approach might be iSPARQL ("Imprecise SPARQL"), although it is not widely used in the literature. Can I use iSPARQL, or is there some SPARQL approach that performs the same operations?
Short version: you can do some of this in pure SPARQL. You can use a query…
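Setting SPARQL aside, it may help to see what the similarity function itself computes. Below is a minimal Python sketch of Levenshtein edit distance, which is assumed to be what "Lev" refers to. It ranks a hypothetical list of labels of the kind you might fetch from DBpedia's rdfs:label. This runs client-side and is not iSPARQL or part of any SPARQL endpoint.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


labels = ["London", "Lyon", "Londonderry"]  # hypothetical candidate labels
best = min(labels, key=lambda label: levenshtein("Londn", label))
print(best)  # London
```

Computing this over every label client-side does not scale to all of DBpedia, which is why narrowing the candidates in the query first (as the answer suggests) matters.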

Image comparison with php + gd

Question: What's the best approach to comparing two images with PHP and the Graphic Draw (GD) library? This is the scenario: I have an image, and I want to find which image in a given set is most similar to it. The most similar image is in fact the same image: not a pixel-perfect match, but the same image. In the example, I've exaggerated the difference between the two images with the number one, just to make it easier to understand what I mean. Even though it brought no consistent results, my approach was…
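One simple approach is to shrink both images to a small grid of average brightness values and pick the candidate whose grid differs least. This tolerates small pixel-level changes. The sketch below is in Python/NumPy on grayscale arrays rather than PHP. It is an assumption, not the asker's method; in PHP/GD, the same idea would mean downscaling with `imagecopyresampled` and reading pixels with `imagecolorat`. The 8 x 8 grid size is arbitrary.

```python
import numpy as np


def thumbnail(gray, size=8):
    # Average blocks of pixels down to a size x size grid (works for uneven dimensions too).
    rows = np.array_split(gray, size, axis=0)
    return np.array([[block.mean() for block in np.array_split(r, size, axis=1)] for r in rows])


def most_similar(target, candidates, size=8):
    """Index of the candidate whose thumbnail has the smallest mean absolute difference."""
    t = thumbnail(target, size)
    scores = [np.abs(thumbnail(c, size) - t).mean() for c in candidates]
    return int(np.argmin(scores))


rng = np.random.default_rng(1)
target = rng.random((64, 64))
near = np.clip(target + 0.02 * rng.standard_normal(target.shape), 0.0, 1.0)  # same image, slightly altered
other = rng.random((64, 64))                                                  # unrelated image
print(most_similar(target, [other, near]))  # 1
```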

How to detect that two sentences are similar?

Question: I want to compute how similar two arbitrary sentences are to each other. For example:
A mathematician found a solution to the problem.
The problem was solved by a young mathematician.
I can use a tagger, a stemmer, and a parser, but I don't know how to detect that these sentences are similar.
Answer 1: These two sentences are not just similar; they are almost paraphrases, i.e., two alternative ways of expressing the same meaning. It is also a very simple case of paraphrase, in which both utterances…
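A crude lexical baseline, useful mainly to show its limits on the example sentences, is the Jaccard overlap of content words. The sketch below uses a tiny hypothetical stopword list and no stemming, so "solution" and "solved" do not match. This is a bag-of-words measure, not paraphrase detection.

```python
import re

STOPWORDS = {"a", "an", "the", "to", "was", "by", "of"}  # tiny hypothetical list


def content_words(sentence):
    # Lower-case, keep alphabetic tokens, and drop stopwords.
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def jaccard(s1, s2):
    """|shared content words| / |all content words|, or 0.0 if both sets are empty."""
    w1, w2 = content_words(s1), content_words(s2)
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0


s1 = "A mathematician found a solution to the problem."
s2 = "The problem was solved by a young mathematician."
print(jaccard(s1, s2))
```

Only "mathematician" and "problem" are shared out of six content words, so the score is 1/3, even though the sentences mean nearly the same thing. Closing that gap is where the tagger, stemmer, and parser (or paraphrase resources) come in.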

Collaborative Filtering: Non-Personalized item-to-item similarity

Question: I'm trying to compute item-to-item similarity along the lines of Amazon's "Customers who viewed/purchased X have also viewed/purchased Y and Z". All of the examples and references I've seen are for computing item similarity for ranked items, finding user-user similarity, or finding recommended items based on the current user's history. I'd like to start with a non-targeted approach before factoring in the current user's preferences. Looking at the Amazon.com recommendations…
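A non-personalized starting point is to count how often two items appear in the same basket (order or viewing session). Raw counts favour popular items, so the sketch below normalises them with cosine similarity over binary "who bought it" vectors: co(x, y) / sqrt(n(x) * n(y)). That normalisation is one common choice, assumed here. It is not necessarily what Amazon does, and the baskets are toy data.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def item_similarities(baskets):
    item_counts = Counter()              # n(x): baskets containing x
    pair_counts = defaultdict(Counter)   # co(x, y): baskets containing both x and y
    for basket in baskets:
        items = sorted(set(basket))      # ignore duplicates within one basket
        item_counts.update(items)
        for x, y in combinations(items, 2):
            pair_counts[x][y] += 1
            pair_counts[y][x] += 1
    # Cosine similarity between the binary basket-membership vectors of x and y.
    return {
        x: {y: c / math.sqrt(item_counts[x] * item_counts[y]) for y, c in others.items()}
        for x, others in pair_counts.items()
    }


baskets = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "B", "D"]]
sims = item_similarities(baskets)
also_bought = sorted(sims["A"], key=sims["A"].get, reverse=True)
print(also_bought)  # ['B', 'D', 'C']
```

"D" ranks above "C" for "A" because "D" co-occurs with "A" in its only basket, whereas "C" appears mostly without "A".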

Cosine Similarity

I calculated the tf/idf values of two documents. These are the tf/idf values:
1.txt 0.0 0.5
2.txt 0.0 0.5
The documents are:
1.txt => dog cat
2.txt => cat elephant
How can I use these values to calculate cosine similarity? I know that I should calculate the dot product, then find the lengths (norms) of the two vectors and divide the dot product by their product. How can I calculate this using my values? One more question: does it matter whether both documents have the same number of words?
$\mathrm{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$
where $a \cdot b$ is the dot product. Some details:
def dot(a, b):
    n = len(a)
    sum = 0
    for i in xrange(n):
        sum += a[i] * b
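Here is a runnable Python 3 version of the formula. The key requirement is that both vectors are indexed by the same vocabulary in the same order. The documents themselves can have any number of words. For illustration, the vectors below use plain term counts over the vocabulary (dog, cat, elephant) rather than the tf/idf values from the question; tf/idf weights plug in the same way.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """(a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must be indexed by the same vocabulary")
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


# Vocabulary order: dog, cat, elephant (term counts, not tf/idf).
doc1 = [1, 1, 0]  # "dog cat"
doc2 = [0, 1, 1]  # "cat elephant"
print(cosine_similarity(doc1, doc2))  # about 0.5
```

The two documents share one of their two terms each, which gives 1 / (sqrt(2) * sqrt(2)) = 0.5, up to floating-point rounding.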

Color similarity/distance in RGBA color space

Question: How can I compute the similarity between two colors in the RGBA color space (where the background color is, of course, unknown)? I need to remap an RGBA image to a palette of RGBA colors by finding the best palette entry for each pixel in the image*. In the RGB color space, the most similar color can be assumed to be the one with the smallest Euclidean distance. However, this approach doesn't work in RGBA; e.g., the Euclidean distance from rgba(0,0,0,0) to rgba(0,0,0,50%) is smaller than to rgba(100%,100%,100%…
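One heuristic for an unknown background (an assumption here, not something stated in the question) is to composite both colors over black and over white and add up the squared RGB differences. Compositing is linear in the background, so two colors that match over both black and white match over every background. This means colors that look identical anywhere, such as two fully transparent colors, get distance 0. Channels are assumed to be floats in 0..1 with non-premultiplied alpha.

```python
def over(color, background):
    """Composite a non-premultiplied RGBA color (0..1) over a gray background level."""
    r, g, b, a = color
    return tuple(c * a + background * (1.0 - a) for c in (r, g, b))


def rgba_distance(c1, c2):
    # Sum of squared RGB differences after compositing over black and over white.
    total = 0.0
    for background in (0.0, 1.0):
        total += sum((x - y) ** 2 for x, y in zip(over(c1, background), over(c2, background)))
    return total


def nearest_palette_entry(pixel, palette):
    return min(palette, key=lambda entry: rgba_distance(pixel, entry))


transparent_black = (0.0, 0.0, 0.0, 0.0)
transparent_white = (1.0, 1.0, 1.0, 0.0)
half_black = (0.0, 0.0, 0.0, 0.5)
print(rgba_distance(transparent_black, transparent_white))  # 0.0
print(rgba_distance(transparent_black, half_black))         # 0.75
```

For palette remapping, `nearest_palette_entry` is called once per pixel. A fully transparent pixel then maps to any fully transparent palette entry, regardless of that entry's RGB values.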

Libpuzzle Indexing millions of pictures?

It's about the libpuzzle library for PHP (http://libpuzzle.pureftpd.org/project/libpuzzle) by Frank Denis. I'm trying to understand how to index and store the data in my MySQL database. Generating the vector is no problem at all. Example:
# Compute signatures for two images
$cvec1 = puzzle_fill_cvec_from_file('img1.jpg');
$cvec2 = puzzle_fill_cvec_from_file('img2.jpg');
# Compute the distance between both signatures
$d = puzzle_vector_normalized_distance($cvec1, $cvec2);
# Are pictures similar?
if ($d < PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD) {
    echo "Pictures are looking
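Comparing a new signature against millions of stored ones is too slow, so a common pattern is an inverted index. You cut each signature into fixed-position "words", store (word, image id) rows with an index on the word, and run the exact distance check only on images that share at least one word. The sketch below is in Python with toy integer lists standing in for signatures. The word length, the slicing scheme, and the class name are all hypothetical; how to slice the real value returned by `puzzle_fill_cvec_from_file` is an assumption to verify against the libpuzzle documentation. In MySQL, `postings` would correspond to a table of (word, image_id) rows.

```python
from collections import defaultdict

WORD_LEN = 4  # hypothetical; tune for real signatures


def words(signature, word_len=WORD_LEN):
    # Non-overlapping (offset, slice) pairs; the offset makes matches position-specific.
    return {(i, tuple(signature[i:i + word_len]))
            for i in range(0, len(signature) - word_len + 1, word_len)}


class SignatureIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of images containing that word

    def add(self, image_id, signature):
        for w in words(signature):
            self.postings[w].add(image_id)

    def candidates(self, signature):
        """Images sharing at least one word; run the real distance check only on these."""
        found = set()
        for w in words(signature):
            found |= self.postings.get(w, set())
        return found


index = SignatureIndex()
index.add("img1", [0, 1, -1, 2, 0, 0, 1, -2])
index.add("img2", [2, 2, 2, 2, -1, -1, -1, -1])
print(index.candidates([0, 1, -1, 2, 1, 1, 1, 1]))  # {'img1'}
```

Each candidate would then still go through `puzzle_vector_normalized_distance` and the threshold comparison, as in the question.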