similarity

A Bottom-up Clustering Approach to Unsupervised Person Re-identification (AAAI2019)

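Anonymous (unverified), submitted 2019-12-02 23:39:01
1. Introduction

This paper addresses fully unsupervised person re-identification, where no labels are available at all. The authors propose a bottom-up clustering (BUC) approach that jointly optimizes a CNN and the relationships among the unlabeled samples. The method builds on two basic facts about the re-identification task: diversity across different identities and similarity within the same identity. The algorithm starts by treating each individual as its own cluster, which maximizes the diversity across clusters, and then gradually merges similar clusters to increase the similarity within each cluster. During bottom-up clustering, a diversity regularization term balances the number of samples in each cluster, so the final model strikes a good balance between diversity and similarity. The authors run experiments on image-based and video-based re-identification datasets (Market-1501, DukeMTMC-reID, MARS, and DukeMTMC-VideoReID). The results show that the algorithm not only outperforms the state of the art in unsupervised person re-identification but is also competitive with transfer learning and semi-supervised methods.

2. The authors' method

In the backward pass, the lookup table V is updated (the update formula is omitted in this excerpt). The authors point out that, during optimization, V_j accumulates all the information about cluster j and can therefore be regarded as a kind of cluster centroid. They do not compute the centroids directly from all the features, in order to reduce computational complexity; the lookup table V avoids a large amount of computation. The proposed loss pushes the cosine similarity between a sample and its own cluster toward 1 and its similarity to the other clusters toward 0.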
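The lookup-table idea can be illustrated with a short sketch. Everything specific here is an assumption, not taken from the excerpt: the update rule (average the stored row with the new feature, then re-normalize), the softmax temperature `tau`, and all function names are illustrative choices for a loss over cosine similarities, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cluster_probabilities(v, table, tau=0.1):
    """Softmax over the similarities between feature v and every table row."""
    logits = [sum(a * b for a, b in zip(row, v)) / tau for row in table]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def repelled_loss(v, table, label, tau=0.1):
    """Negative log-probability of the sample's own cluster."""
    return -math.log(cluster_probabilities(v, table, tau)[label])


def update_lookup_table(table, v, label):
    """ASSUMED update rule: average the stored row with v, then re-normalize."""
    mixed = [0.5 * (a + b) for a, b in zip(table[label], v)]
    table[label] = l2_normalize(mixed)
```

Because each row is only nudged toward the features assigned to it, the table behaves like a running cluster centroid without ever averaging all features of a cluster explicitly.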

Detecting image equality at different resolutions

拟墨画扇, submitted 2019-12-02 21:02:40
I'm trying to build a script to go through my original, high-res photos and replace the old, low-res ones I uploaded to Flickr before I had a pro account. For many of them I can just use Exif info such as date taken to determine a match. But some are really old, and either the original file didn't have Exif info, or it got clobbered by whatever stupid resizing software I used at the time. So, unable to rely on metadata, I'm forced to resort to the content itself. The problem is that the originals are in different resolutions than the ones on Flickr (which is the whole point of this endeavour).
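One common way to compare image content across resolutions is to reduce both images to the same tiny grid before comparing them. The sketch below is a minimal average-hash example under stated assumptions: images are already decoded into 2D lists of grayscale values (decoding is out of scope here), and the function names and the 8x8 grid size are illustrative choices.

```python
def downsample(pixels, size=8):
    """Box-average a 2D grayscale grid (list of rows) down to size x size.

    Assumes the grid has at least `size` rows and `size` columns.
    """
    height, width = len(pixels), len(pixels[0])
    result = []
    for i in range(size):
        r0 = i * height // size
        r1 = max((i + 1) * height // size, r0 + 1)
        row = []
        for j in range(size):
            c0 = j * width // size
            c1 = max((j + 1) * width // size, c0 + 1)
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def average_hash(pixels, size=8):
    """One bit per cell: is the cell brighter than the mean of all cells?"""
    cells = [value for row in downsample(pixels, size) for value in row]
    mean = sum(cells) / len(cells)
    return [value > mean for value in cells]


def hamming(hash_a, hash_b):
    """Number of differing bits; small values suggest the same picture."""
    return sum(a != b for a, b in zip(hash_a, hash_b))
```

A high-resolution original and its downscaled copy should produce hashes that differ in only a few bits, while unrelated photos usually differ in many. The threshold separating the two cases has to be tuned on real data.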

Python equivalent of daisy() in the cluster package of R

放肆的年华, submitted 2019-12-02 19:30:53
I have a dataset that contains both categorical (nominal and ordinal) and numerical attributes. I want to calculate the (dis)similarity matrix across my observations using these mixed attributes. Using the daisy() function of the cluster package in R, I can easily get a dissimilarity matrix as follows:

    if(!require("cluster")) { install.packages("cluster"); require("cluster") }
    data(flower)
    as.matrix(daisy(flower, metric = "gower"))

This uses the Gower metric to deal with the nominal variables. Is there a Python equivalent of the daisy() function in R? Or maybe any other module function that
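For orientation, the core of the Gower dissimilarity is simple enough to write by hand. The sketch below is a simplified illustration, not a drop-in replacement for daisy(): it assumes no missing values, gives every column equal weight, and treats every non-numeric column as nominal (ordinal columns would need to be converted to ranks first). The function name and the `numeric_cols` parameter are hypothetical.

```python
def gower_matrix(rows, numeric_cols):
    """Pairwise Gower dissimilarity for rows of mixed-type values.

    rows: list of equal-length tuples.
    numeric_cols: set of column indexes holding numeric values;
                  all other columns are compared as nominal categories.
    """
    n_cols = len(rows[0])
    # The range of each numeric column scales its differences into [0, 1].
    ranges = {}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        ranges[col] = max(values) - min(values)

    def distance(a, b):
        total = 0.0
        for col in range(n_cols):
            if col in ranges:
                if ranges[col]:
                    total += abs(a[col] - b[col]) / ranges[col]
            elif a[col] != b[col]:
                total += 1.0  # nominal mismatch counts as full distance
        return total / n_cols

    return [[distance(a, b) for b in rows] for a in rows]
```

For example, with rows `(1.0, "red")`, `(3.0, "red")`, and `(5.0, "blue")` and column 0 marked numeric, the first two rows differ only by half the numeric range, so their dissimilarity is 0.25.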

How to detect that two sentences are similar?

拈花ヽ惹草, submitted 2019-12-02 18:09:55
I want to compute how similar two arbitrary sentences are to each other. For example: A mathematician found a solution to the problem. The problem was solved by a young mathematician. I can use a tagger, a stemmer, and a parser, but I don't know how to detect that these sentences are similar. These two sentences are not just similar, they are almost paraphrases, i.e., two alternative ways of expressing the same meaning. It is also a very simple case of paraphrase, in which both utterances use the same words, the only difference being that one is in the active form while the other is passive. (The two
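As a baseline for the example above, word overlap already captures part of the similarity. The sketch below computes Jaccard similarity over content words; it is only a bag-of-words illustration that ignores syntax and voice, and the tiny stop-word list is a hypothetical placeholder rather than a real linguistic resource.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "by", "was", "is", "of"}


def content_words(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {token for token in tokens if token not in STOP_WORDS}


def jaccard_similarity(sentence_a, sentence_b):
    """Shared content words divided by all distinct content words."""
    words_a, words_b = content_words(sentence_a), content_words(sentence_b)
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)
```

On the two example sentences only "mathematician" and "problem" are shared out of six distinct content words, so the score is about 0.33. Stemming would not help "found" and "solved" match, which is why paraphrase detection usually needs more than surface overlap.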

What FFT descriptors should be used as feature to implement classification or clustering algorithm?

谁都会走, submitted 2019-12-02 17:46:23
I have some sampled geographical trajectories to analyze, and I calculated the histogram of the data in the spatial and temporal dimensions, which yielded a time-domain feature for each spatial element. I want to perform a discrete FFT to transform the time-domain feature into a frequency-domain feature (which I think may be more robust), and then run some classification or clustering algorithms. But I'm not sure which descriptor to use as the frequency-domain feature, since a signal has an amplitude spectrum, a power spectrum, and a phase spectrum, and I've read some references but still
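To make the three candidate descriptors concrete, the sketch below computes them with a plain discrete Fourier transform (written out directly instead of using an FFT library, which is fine for short histograms). The function names are illustrative. One property worth knowing when choosing: a circular shift of the signal in time changes only the phase spectrum, not the amplitude or power spectrum.

```python
import cmath
import math


def dft(signal):
    """Direct O(n^2) discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def spectra(signal):
    """Return (amplitude, power, phase) spectra of the signal."""
    coefficients = dft(signal)
    amplitude = [abs(c) for c in coefficients]
    power = [a * a for a in amplitude]
    phase = [cmath.phase(c) for c in coefficients]
    return amplitude, power, phase
```

If two spatial elements show the same daily pattern offset by a few hours, their amplitude spectra match while their phase spectra differ, so whether phase belongs in the feature vector depends on whether timing offsets should count as a difference.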

Compare 5000 strings with PHP Levenshtein

一世执手, submitted 2019-12-02 15:19:41
I have 5000, sometimes more, street address strings in an array. I'd like to compare them all with Levenshtein to find similar matches. How can I do this without looping through all 5000 and comparing each one directly with the other 4999? Edit: I am also interested in alternate methods if anyone has suggestions. The overall goal is to find similar entries (and eliminate duplicates) based on user-submitted street addresses. I think a better way to group similar addresses would be to: create a database with two tables - one for the address (and an id), one for the soundexes of words or literal
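The grouping idea can be sketched as "blocking": assign each address a cheap key and run Levenshtein only within groups that share a key. The sketch is in Python rather than PHP, and the key used here (the first run of digits, i.e. the house number) is a hypothetical choice for illustration; a soundex of the street name would slot into `blocking_key` the same way.

```python
import re
from collections import defaultdict
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(address):
    """Lowercase and drop punctuation so trivial differences do not count."""
    return re.sub(r"[^a-z0-9 ]", "", address.lower()).strip()


def blocking_key(normalized):
    """Hypothetical key: the first run of digits (house number), or ''."""
    match = re.search(r"\d+", normalized)
    return match.group() if match else ""


def similar_pairs(addresses, max_distance=3):
    """Compare addresses only within the same block."""
    blocks = defaultdict(list)
    for address in addresses:
        norm = normalize(address)
        blocks[blocking_key(norm)].append((address, norm))
    pairs = []
    for group in blocks.values():
        for (a, norm_a), (b, norm_b) in combinations(group, 2):
            if levenshtein(norm_a, norm_b) <= max_distance:
                pairs.append((a, b))
    return pairs
```

The trade-off is that two addresses landing in different blocks are never compared, so a typo in the house number would be missed. Running a second pass with a different key is a common way to reduce that risk.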

Find similar images in (pure) PHP / MySQL

守給你的承諾、, submitted 2019-12-02 14:07:13
My users are uploading images to my website, and I would like to first offer them images that have already been uploaded. My idea is to 1. create some kind of image "hash" of every existing image, and 2. create a hash of the newly uploaded image and compare it with the others in the database. I have found some interesting solutions like http://www.pureftpd.org/project/libpuzzle or http://phash.org/ etc., but they have one or more problems: they need some nonstandard extension to PHP (or are not in PHP at all) - it would be OK for me, but I would like to create it as a plugin to my popular CMS, which is used on

Calculating Binary Data Similarity

廉价感情., submitted 2019-12-02 14:03:30
I've seen a few questions here related to determining the similarity of files, but they are all linked to a particular domain (images, sounds, text, etc.). The techniques offered as solutions require knowledge of the underlying file format of the files being compared. What I am looking for is a method without this requirement, where arbitrary binary files could be compared without needing to understand what type of data they contain. That is, I am looking to determine the similarity percentage of two files' binary data. To give a little more detail for you to work with, even though this is
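One format-agnostic technique is the normalized compression distance (NCD): if two byte strings share content, compressing them together costs little more than compressing one alone. The sketch below uses `zlib` from the Python standard library; mapping the distance to a percentage is an illustrative convention, not a calibrated measure, and zlib's 32 KB window means the approach as written only suits small inputs.

```python
import zlib


def compressed_size(data):
    """Length of the zlib-compressed representation at maximum compression."""
    return len(zlib.compress(data, 9))


def ncd(a, b):
    """Normalized compression distance: near 0 for similar, near 1 for unrelated."""
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


def similarity_percent(a, b):
    """Illustrative mapping of NCD onto a 0-100 scale."""
    return max(0.0, min(1.0, 1.0 - ncd(a, b))) * 100.0
```

For larger files, a compressor with a bigger window or a chunk-based comparison would be needed, since content farther apart than the window cannot be matched.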

How to perform trigram operations in Google BigQuery?

旧城冷巷雨未停, submitted 2019-12-02 01:47:22
I use the pg_trgm module in PostgreSQL to calculate the similarity between two strings using trigrams. In particular, I use: similarity(text, text) which returns a number that indicates how similar the two arguments are (between 0 and 1). How can I perform a similarity function (or equivalent) on Google BigQuery? Try below. At least as a blueprint for enhancing SELECT text1, text2, similarity FROM JS( // input table ( SELECT * FROM (SELECT 'mikhail' AS text1, 'mikhail' AS text2), (SELECT 'mikhail' AS text1, 'mike' AS text2), (SELECT 'mikhail' AS text1, 'michael' AS text2), (SELECT 'mikhail'
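The computation behind a trigram similarity is small enough to sketch independently of any database. The Python sketch below approximates the behavior pg_trgm documents (lowercase, split into alphanumeric words, pad each word with two leading spaces and one trailing space, compare the sets of trigrams); it is an illustration to port into a user-defined function, not a guaranteed bit-for-bit match for PostgreSQL's results.

```python
import re


def trigrams(text):
    """Set of trigrams over space-padded, lowercased alphanumeric words."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(text_a, text_b):
    """Shared trigrams divided by all distinct trigrams, in [0, 1]."""
    grams_a, grams_b = trigrams(text_a), trigrams(text_b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

With the sample pair 'mikhail' and 'mike', the two words share 3 of 10 distinct trigrams, giving a similarity of 0.3.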

How can you compare two cluster groupings in terms of similarity or overlap in Python?

佐手、, submitted 2019-12-01 19:08:37
Simplified example of what I'm trying to do: Let's say I have 3 data points A, B, and C. I run KMeans clustering on this data and get 2 clusters [(A,B),(C)] . Then I run MeanShift clustering on this data and get 2 clusters [(A),(B,C)] . So clearly the two clustering methods have clustered the data in different ways. I want to be able to quantify this difference. In other words, what metric can I use to determine percent similarity/overlap between the two cluster groupings obtained from the two algorithms? Here is a range of scores that might be given: 100% score for [(A,B),(C)] vs. [(A,B),(C)]
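One standard metric for this is the Rand index: the fraction of point pairs on which the two clusterings agree (both put the pair together, or both keep it apart). The sketch below is a plain-Python illustration with hypothetical function names; scikit-learn's `adjusted_rand_score` provides a chance-corrected variant of the same idea.

```python
from itertools import combinations


def labels_from_clusters(clusters):
    """Map every item to the index of the cluster that contains it."""
    return {item: index for index, cluster in enumerate(clusters) for item in cluster}


def rand_index(clusters_a, clusters_b):
    """Fraction of item pairs treated the same way by both clusterings."""
    labels_a = labels_from_clusters(clusters_a)
    labels_b = labels_from_clusters(clusters_b)
    if set(labels_a) != set(labels_b):
        raise ValueError("both clusterings must cover the same items")
    pairs = list(combinations(sorted(labels_a), 2))
    agreements = sum(
        (labels_a[x] == labels_a[y]) == (labels_b[x] == labels_b[y])
        for x, y in pairs
    )
    return agreements / len(pairs)
```

For the example above, [(A,B),(C)] versus [(A),(B,C)] agree only on the pair (A, C), which both keep apart, so the Rand index is 1/3, while identical groupings score 1.0.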