similarity

Sort results by search term similarity

爱⌒轻易说出口 submitted on 2019-12-10 15:32:58
Question: I have this users collection: { "_id" : ObjectId("501faa18a34feb05890004f2"), "username" : "joanarocha", } { "_id" : ObjectId("501faa19a34feb05890005d3"), "username" : "cristianarodrigues", } { "_id" : ObjectId("501faa19a34feb05890006d8"), "username" : "anarocha", } When I query this: db.users.find({'username': /anaro/i}) the results are sorted in natural order (insertion order). I would like to sort them by similarity to the search term. In this case, the results should be returned in this order: { "_id"
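A plain regex `find` does not compute a relevance score, so one option is to rank the matches client-side after the query. The sketch below is a Python illustration, not MongoDB code. It uses the standard library's `difflib.SequenceMatcher.ratio()` as the similarity measure. That metric and the helper name `rank_by_similarity` are assumptions for illustration; the question does not specify either.

```python
import re
from difflib import SequenceMatcher


def rank_by_similarity(usernames, term):
    """Keep names containing `term` (case-insensitive), most similar first."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    matches = [name for name in usernames if pattern.search(name)]

    def score(name):
        # ratio() is 2*M / (len(a) + len(b)), where M is the number of matched
        # characters, so shorter names containing the whole term score higher.
        return SequenceMatcher(None, term.lower(), name.lower()).ratio()

    # sorted() is stable even with reverse=True, so ties keep natural order.
    return sorted(matches, key=score, reverse=True)


users = ["joanarocha", "cristianarodrigues", "anarocha"]
print(rank_by_similarity(users, "anaro"))  # ['anarocha', 'joanarocha', 'cristianarodrigues']
```

With these three usernames, `anarocha` comes first. Once the whole term has matched, `ratio()` rewards the candidate whose length is closest to the term's length.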

R merged loop performance

北慕城南 submitted on 2019-12-10 11:57:46
Question: I have 2000 rows of data with 4000 columns. What I'm trying to do is compare each row with the rest of the rows and see how similar they are in terms of differing columns / total columns. What I did so far is as follows: for (i in 1:nrow(data)) { for (j in (i+1):nrow(data)) { mycount[[i,j]] = length(which(data[i,] != data[j,])) } } There are two problems with it. First, j doesn't start from i+1 (which is probably a basic mistake). The main problem, however, is the time it consumes; it takes ages... Could
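A common way to speed this up is to drop the double loop and vectorize. The sketch below uses Python with NumPy rather than R, and it assumes the data contain no missing values (NaN never compares equal to itself). For every pair of rows, it counts how many columns agree. It does this by building one 0/1 indicator matrix per distinct value and multiplying that matrix by its transpose. `mismatch_counts` is a hypothetical helper name. On the loop bug: in R, `(i+1):nrow(data)` counts downward when `i == nrow(data)`, so the inner loop still runs on the last iteration.

```python
import numpy as np


def mismatch_counts(data):
    """Return an n x n matrix whose [i, j] entry is the number of columns
    in which rows i and j differ (assumes no NaN values)."""
    data = np.asarray(data)
    n_rows, n_cols = data.shape
    equal = np.zeros((n_rows, n_rows))
    # For each distinct value v, let X be the 0/1 matrix (data == v).
    # Then (X @ X.T)[i, j] counts the columns where rows i and j both equal v.
    for value in np.unique(data):
        indicator = (data == value).astype(np.float64)
        equal += indicator @ indicator.T
    return (n_cols - equal).astype(np.int64)


data = np.array([[1, 2, 3],
                 [1, 2, 4],
                 [0, 2, 4]])
counts = mismatch_counts(data)
print(counts)
print(counts / data.shape[1])  # fraction of differing columns
```

Each distinct value costs one n × n matrix product, so this is fast when columns hold few distinct values (for example, categorical or binary data). Dividing by the column count gives the differing-columns / total-columns ratio from the question.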

Generating bigram combinations from grouped data in Pig

冷暖自知 submitted on 2019-12-10 10:23:03
Question: Given my input data in userid,itemid format: raw: {userid: bytearray,itemid: bytearray} dump raw; (A,1) (A,2) (A,4) (A,5) (B,2) (B,3) (B,5) (C,1) (C,5) grpd = GROUP raw BY userid; dump grpd; (A,{(A,1),(A,2),(A,4),(A,5)}) (B,{(B,2),(B,3),(B,5)}) (C,{(C,1),(C,5)}) I'd like to generate all of the combinations (order not important) of items within each group. I eventually intend to compute Jaccard similarity on the items in my group. Ideally, the bigrams would be generated and then I'd
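Outside Pig, the same idea can be sketched in a few lines of Python. First group the (userid, itemid) pairs, then take every unordered pair of distinct items per user with `itertools.combinations`. This is not Pig Latin. The Jaccard step at the end assumes that item-to-item similarity is measured over the sets of users who have each item, which is one reading of the question's goal.

```python
from collections import defaultdict
from itertools import combinations

raw = [("A", 1), ("A", 2), ("A", 4), ("A", 5),
       ("B", 2), ("B", 3), ("B", 5),
       ("C", 1), ("C", 5)]

# Equivalent of GROUP raw BY userid: userid -> list of itemids.
grouped = defaultdict(list)
for user_id, item_id in raw:
    grouped[user_id].append(item_id)

# Every unordered pair of distinct items per user; sorting first makes each
# pair canonical, e.g. (1, 5) rather than (5, 1).
bigrams = {user: list(combinations(sorted(items), 2))
           for user, items in grouped.items()}
print(bigrams["C"])  # [(1, 5)]

# Assumed next step: item-to-item Jaccard over the sets of users per item.
users_by_item = defaultdict(set)
for user_id, item_id in raw:
    users_by_item[item_id].add(user_id)


def jaccard(a, b):
    return len(a & b) / len(a | b)


print(jaccard(users_by_item[1], users_by_item[5]))
```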

ORDER BY Color with Hex Code as a criterion in MySQL

こ雲淡風輕ζ submitted on 2019-12-10 08:04:25
Question: I have a table that contains color options for a product. The color options include a hex color code, which is used to generate the UI (HTML). I would like to sort the rows so that the colors in the UI look like a rainbow, instead of the current order, which sorts by the name of the color (not very useful). Here is what my query looks like. I get the R, G, B decimal values from the hex code; I just don't know how to order by them. I've looked into color difference algorithms. They seem more
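A common way to get a rainbow-like order is to convert each RGB color to HSV and sort by hue. The Python sketch below uses the standard-library `colorsys` module, not MySQL. It assumes codes of the form `#RRGGBB`, and the brightness tie-break is an arbitrary choice. Grays and white have a hue of 0, so they will sort next to red and may need separate handling.

```python
import colorsys


def hex_to_hsv(hex_code):
    """'#RRGGBB' -> (hue, saturation, value), each in [0, 1]."""
    hex_code = hex_code.lstrip("#")
    r, g, b = (int(hex_code[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    return colorsys.rgb_to_hsv(r, g, b)


def rainbow_order(hex_codes):
    # Sort by hue (red -> yellow -> green -> cyan -> blue -> magenta),
    # then by brightness so shades of one hue stay together.
    def key(code):
        h, _, v = hex_to_hsv(code)
        return (h, v)

    return sorted(hex_codes, key=key)


colors = ["#0000FF", "#FFFF00", "#FF0000", "#00FF00"]
print(rainbow_order(colors))  # ['#FF0000', '#FFFF00', '#00FF00', '#0000FF']
```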

Lucene Scoring Rules and the Similarity Module Explained

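北战南征 submitted on 2019-12-09 19:50:27
Controlling the ranking of search results
Lucene is the most widely used and most successful open-source search-engine framework, and it has a very complete mechanism for controlling how search results are ranked. However, our goal in controlling the ranking is always the same: information filtering, so that users find the results they want quickly and accurately and have a better experience.
I once read a blog post by an expert that listed 4 places where the ranking of Lucene search results can be controlled, but I no longer remember them. I have put together the following list myself; if I have missed anything, additions are welcome:
1. Lucene's own query syntax: Lucene's query parser supports a very rich syntax, and covering it in detail would take a long time. Here I only want to point out that what I use most in my projects is boosting specific fields to influence the results (for example: field1:(XXX)^10 OR field2:(XXX)^5, where XXX is the query the user submitted).
2. Boost (weight) control: boosts are written into the index at indexing time. At query time they are only read back and multiplied into the scores of the affected results. From my own reading of the Lucene code, Similarity can also control how these weights are written at indexing time; more on this later.
3. The Controller module: Lucene's module for controlling the ranking flow. Its interfaces let you filter and adjust search results after they have been scored.
4. The Similarity module: Lucene's module for controlling how search results are scored, and the module analyzed in detail here. It lets you fine-tune a result's score, or change it beyond recognition, ha.
Lucene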
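To make items 1 and 2 concrete, here is a toy Python calculation of the per-term formula documented for Lucene's classic TF-IDF similarity (`DefaultSimilarity`, later renamed `ClassicSimilarity`). It uses tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), and lengthNorm = 1 / √numTerms, and a matching term contributes tf · idf² · boost · norm. The sketch deliberately omits coord, queryNorm, and the lossy one-byte norm encoding. Also, Lucene 6.0 and later default to BM25 instead. For both reasons, the numbers will not match a real index. The field statistics below are hypothetical. The sketch only shows that a query-time boost such as `^10` scales a field's contribution linearly.

```python
import math


def tf(freq):
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_terms):
    # Shorter fields get a larger norm (real Lucene stores this lossily in one byte).
    return 1.0 / math.sqrt(num_terms)


def term_contribution(freq, doc_freq, num_docs, field_len, boost=1.0):
    """tf * idf^2 * boost * norm, ignoring coord and queryNorm (toy model)."""
    return tf(freq) * idf(doc_freq, num_docs) ** 2 * boost * length_norm(field_len)


# Hypothetical statistics for the user's term XXX in two fields of one document.
stats = dict(freq=2, doc_freq=9, num_docs=1000, field_len=16)
score = (term_contribution(**stats, boost=10.0)    # field1:(XXX)^10
         + term_contribution(**stats, boost=5.0))  # field2:(XXX)^5
print(round(score, 3))
```

With equal field statistics, field1's match contributes twice as much as field2's, because coord and queryNorm would multiply both terms by the same factor.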

How to compare image similarity using PHP regardless of scale and rotation?

倾然丶 夕夏残阳落幕 submitted on 2019-12-09 08:27:41
Question: I want to compare the similarity of the images below. According to my requirements, all of these images should be identified as similar, since they use the same color and the same clip art. The only differences between these images are the rotation, scale, and placement of the clip art. Since all 3 t-shirts use the same color and clip art, I want all 3 images to be identified as similar. I tried out the method described on hackerfactor.com, but it doesn't give me the correct result for my requirements. How
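One way to get a comparison that ignores rotation, scale, and placement is to compare color distributions rather than pixel positions. The Python sketch below swaps in a different technique from the hackerfactor.com method: a quantized RGB color histogram, compared by histogram intersection. It assumes the images have already been decoded into lists of (r, g, b) tuples; decoding (in PHP or otherwise) is not shown. The function names and bin count are illustrative. Because this ignores shape entirely, two different designs in the same colors would also score as similar.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Normalized histogram of (r, g, b) pixels, `levels` bins per channel."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """1.0 for identical color distributions, 0.0 for disjoint ones."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))


# Hypothetical pixel data: the same colors, rearranged and at twice the size.
shirt = [(200, 30, 30)] * 60 + [(250, 250, 250)] * 40
rotated_and_scaled = [(250, 250, 250)] * 80 + [(200, 30, 30)] * 120
print(histogram_intersection(color_histogram(shirt), color_histogram(rotated_and_scaled)))
```

Normalizing by the pixel count is what makes the score insensitive to scale. Reordering pixels, as rotation or moving the clip art does, leaves the histogram unchanged.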

Effective clustering of a similarity matrix

人盡茶涼 submitted on 2019-12-09 06:24:07
Question: My topic is the similarity and clustering of (a bunch of) texts. In a nutshell: I want to cluster the collected texts so that they end up in meaningful clusters. My approach so far is as follows; my problem is in the clustering step. The current software is written in PHP. 1) Similarity: I treat every document as a "bag of words" and convert words into vectors. I use filtering (only "real" words), tokenization (splitting sentences into words), stemming (reducing words to
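For the clustering step, one simple option, given a precomputed similarity matrix, is to link every pair of documents whose similarity is at or above a threshold and take the connected components. This is equivalent to cutting a single-linkage hierarchy at that level. The Python sketch below implements it with a union-find structure; the threshold value and the helper name are assumptions. Single linkage can chain loosely related documents into one large cluster, so average-linkage hierarchical clustering is a common alternative worth comparing.

```python
def threshold_clusters(sim, threshold):
    """Single-linkage clusters: connected components of pairs with sim >= threshold.

    `sim` is a symmetric n x n list of lists; returns a sorted list of index lists.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


sim = [[1.0, 0.9, 0.1, 0.0],
       [0.9, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.7],
       [0.0, 0.1, 0.7, 1.0]]
print(threshold_clusters(sim, 0.5))  # [[0, 1], [2, 3]]
```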

How to measure the similarity of two documents, given the similarity of each pair of words?

 ̄綄美尐妖づ submitted on 2019-12-08 11:31:35
Question: I have two documents, for example: Doc1 = {'python','numpy','machine learning'} Doc2 = {'python','pandas','tensorflow','svm','regression','R'} I also know the similarity (correlation) of each pair of words, e.g. Sim('python','python') = 1 Sim('python','pandas') = 0.8 Sim('numpy', 'R') = 0.1 What is the best way to measure the similarity of the two documents? It seems that the traditional Jaccard distance and cosine distance are not good metrics in this situation. Answer 1: I like a book by
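The answer above is truncated, so the following is not that answer. One widely used heuristic when only word-to-word similarities are known is best-match averaging. For each word in one document, take its highest similarity to any word in the other document and average those values. Then average the two directions to make the measure symmetric. The Python sketch below uses the example documents. Apart from the two values given in the question, the `table` of pairwise similarities is hypothetical, and unlisted pairs are assumed to be 0.

```python
def word_sim(a, b, table):
    """Symmetric word-similarity lookup: identical words score 1, unknown pairs 0."""
    if a == b:
        return 1.0
    return table.get((a, b), table.get((b, a), 0.0))


def directed_score(doc_from, doc_to, table):
    # Average, over words in doc_from, of the best match found in doc_to.
    return sum(max(word_sim(w, v, table) for v in doc_to) for w in doc_from) / len(doc_from)


def doc_similarity(doc1, doc2, table):
    # Symmetric version: mean of the two directed scores.
    return (directed_score(doc1, doc2, table) + directed_score(doc2, doc1, table)) / 2


doc1 = {"python", "numpy", "machine learning"}
doc2 = {"python", "pandas", "tensorflow", "svm", "regression", "R"}
table = {("python", "pandas"): 0.8, ("numpy", "R"): 0.1}
print(round(doc_similarity(doc1, doc2, table), 4))  # 0.3417
```

Here the document-level score is (1.1/3 + 1.9/6) / 2 ≈ 0.3417.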

Similarity index in a list of character vectors

廉价感情. submitted on 2019-12-08 07:44:32
Question: I have a list that looks like this one: $`264` [1] "CHAMP1" "MAP1S" "PRRC1" "TUT1" "CDK12" $`265` [1] "TUT1" "PRRC1" "CHAMP1" "MAP1S" $`266` [1] "REPS1" "CHAMP1" "PRRC1" "TUT1" "MAP1S" $`267` [1] "G3BP1" "TUT1" "PRRC1" "CHAMP1" "MAP1S" $`268` [1] "TUT1" "CHAMP1" "PRRC1" "MAP1S" $`269` [1] "DDB1" "CHAMP1" "TUT1" "PRRC1" "MAP1S" Is there any package or function to calculate the similarity among the different list components? Many thanks. Answer 1: I'm not aware of any packages, but this implements
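The answer's code is truncated. As a sketch of one common choice, the Jaccard index |A ∩ B| / |A ∪ B| can be computed for every pair of list components. The Python dictionary below mirrors the R list from the question. Treating each component as a set, so that order and duplicates are ignored, is an assumption.

```python
from itertools import combinations

gene_lists = {
    "264": ["CHAMP1", "MAP1S", "PRRC1", "TUT1", "CDK12"],
    "265": ["TUT1", "PRRC1", "CHAMP1", "MAP1S"],
    "266": ["REPS1", "CHAMP1", "PRRC1", "TUT1", "MAP1S"],
    "267": ["G3BP1", "TUT1", "PRRC1", "CHAMP1", "MAP1S"],
    "268": ["TUT1", "CHAMP1", "PRRC1", "MAP1S"],
    "269": ["DDB1", "CHAMP1", "TUT1", "PRRC1", "MAP1S"],
}


def jaccard(a, b):
    """|A & B| / |A | B| on the sets of genes (two empty lists count as identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# One score per unordered pair of list components.
pairwise = {(x, y): jaccard(gene_lists[x], gene_lists[y])
            for x, y in combinations(gene_lists, 2)}
print(pairwise[("264", "265")])  # 0.8
```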

Speed up text comparisons (feature vectors) with spatial MySQL features

前提是你 submitted on 2019-12-08 06:57:03
Question: I have a function that takes two arrays containing the tokens/words of two texts and returns a cosine similarity value describing the relationship between the two texts. The function takes an array $tokensA (0=>house, 1=>bike, 2=>man) and an array $tokensB (0=>bike, 1=>house, 2=>car) and calculates the similarity, which is returned as a floating-point value. function cosineSimilarity($tokensA, $tokensB) { $a = $b = $c = 0; $uniqueTokensA = $uniqueTokensB = array(); $uniqueMergedTokens =
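The PHP function is cut off above, so here is a Python sketch of the same measure, not a port of that code. Each token list becomes a term-frequency vector, and the cosine is their dot product divided by the product of their lengths. It does not address the MySQL spatial-index part of the question.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the term-frequency vectors of two token lists (0.0 if either is empty)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Counter returns 0 for missing tokens, so only shared tokens add to the dot product.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity(["house", "bike", "man"], ["bike", "house", "car"]))
```

For the example arrays, the shared tokens `house` and `bike` give 2 / (√3 · √3) = 2/3.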