similarity

I wish to create a system where I give a sentence and the system spits out sentences similar in meaning to the input sentence I gave

三世轮回 posted on 2019-11-30 09:25:27
This is an NLP problem and I was wondering how I should proceed. How difficult is the problem? Could I replace the word with synonyms and check that the grammar is correct?

Replacing words with synonyms is probably the first thing to try, but be careful not to miss multi-word expressions and idioms. Also, make sure you choose a synonym with the same part of speech.

they look for a good solution < ! > they view/stare/... for a good solution
they work hard < ! > they job/task/… hard

More complicated rephrasing is only possible if you use some level of grammatical analysis. You should at

How to compute jaccard similarity from a pandas dataframe

核能气质少年 posted on 2019-11-30 07:30:12
Question: I have a dataframe as follows: the shape of the frame is (1510, 1399). The columns represent products, and the rows represent the values (0 or 1) assigned by a user for a given product. How can I compute a jaccard_similarity_score? I created a placeholder dataframe listing product vs. product:

data_ibs = pd.DataFrame(index=data_g.columns,columns=data_g.columns)

I am not sure how to iterate through data_ibs to compute similarities.

for i in range(0,len(data_ibs.columns)) : # Loop through the
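For a 0/1 product matrix like the one described, the pairwise Jaccard similarities can be computed without an explicit loop. The following is a minimal sketch, not the answer from the original thread: the function name pairwise_jaccard and the toy data are hypothetical, and scoring two all-zero columns as 0.0 is an assumed convention.

```python
import numpy as np
import pandas as pd


def pairwise_jaccard(data_g: pd.DataFrame) -> pd.DataFrame:
    """Column-by-column Jaccard similarity for a 0/1 DataFrame."""
    x = data_g.to_numpy(dtype=np.int64)
    # intersection[i, j] = number of rows where columns i and j are both 1
    intersection = x.T @ x
    col_sums = x.sum(axis=0)
    # |A or B| = |A| + |B| - |A and B|
    union = col_sums[:, None] + col_sums[None, :] - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        # Assumption: an empty union (two all-zero columns) scores 0.0
        sim = np.where(union > 0, intersection / union, 0.0)
    return pd.DataFrame(sim, index=data_g.columns, columns=data_g.columns)


# Hypothetical toy data: 4 users (rows) x 3 products (columns)
toy = pd.DataFrame({"a": [1, 1, 0, 0], "b": [1, 0, 1, 0], "c": [0, 0, 0, 0]})
toy_sim = pairwise_jaccard(toy)
```

The result has the same product-by-product layout as the data_ibs placeholder in the question, so it could stand in for the filled-in frame.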

'Similarity' in Data Mining

安稳与你 posted on 2019-11-30 07:09:15
In the field of Data Mining, is there a specific sub-discipline called 'Similarity'? If yes, what does it deal with? Any examples, links, or references will be helpful. Also, being new to the field, I would like the community's opinion on how closely related Data Mining and Artificial Intelligence are. Are they synonyms, or is one a subset of the other? Thanks in advance for sharing your knowledge.

Yin Zhu

"In the field of Data Mining, is there a specific sub-discipline called 'Similarity'?"

Yes. There is a specific subfield in data mining and machine learning called metric learning, which aims to

Finding the closest match

本秂侑毒 posted on 2019-11-30 05:49:06
I have an object with a set of parameters like:

var obj = new {Param1 = 100, Param2 = 212, Param3 = 311, param4 = 11, Param5 = 290};

On the other side I have a list of objects:

var obj1 = new {Param1 = 1221, Param2 = 212, Param3 = 311, param4 = 11, Param5 = 290};
var obj3 = new {Param1 = 35, Param2 = 11, Param3 = 319, param4 = 211, Param5 = 790};
var obj4 = new {Param1 = 126, Param2 = 218, Param3 = 2, param4 = 6, Param5 = 190};
var obj5 = new {Param1 = 213, Param2 = 121, Param3 = 61, param4 = 11, Param5 = 29};
var obj7 = new {Param1 = 161, Param2 = 21, Param3 = 71,
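The excerpt does not say what "closest" means, so any answer depends on the chosen definition. Below is a rough Python sketch (the question itself is in C#) comparing two assumed definitions on the four candidates that are fully visible above: Euclidean distance over the five parameters, and the number of parameters that match exactly. The helper names euclidean and matching_fields are hypothetical.

```python
import math

# Query object and the complete candidates visible in the question
query = {"Param1": 100, "Param2": 212, "Param3": 311, "param4": 11, "Param5": 290}
candidates = {
    "obj1": {"Param1": 1221, "Param2": 212, "Param3": 311, "param4": 11, "Param5": 290},
    "obj3": {"Param1": 35, "Param2": 11, "Param3": 319, "param4": 211, "Param5": 790},
    "obj4": {"Param1": 126, "Param2": 218, "Param3": 2, "param4": 6, "Param5": 190},
    "obj5": {"Param1": 213, "Param2": 121, "Param3": 61, "param4": 11, "Param5": 29},
}


def euclidean(a, b):
    """Straight-line distance, treating every parameter as one dimension."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def matching_fields(a, b):
    """Number of parameters with exactly equal values."""
    return sum(1 for k in a if a[k] == b[k])


# Assumption 1: closest means smallest Euclidean distance
closest_by_distance = min(candidates, key=lambda n: euclidean(query, candidates[n]))
# Assumption 2: closest means most exactly matching parameters
closest_by_matches = max(candidates, key=lambda n: matching_fields(query, candidates[n]))
```

On this data the two definitions disagree: obj1 matches four of five parameters but is far away in Param1, so the distance-based choice is obj4. If the parameters are on different scales, normalizing each one before measuring distance is probably worth considering.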

String similarity matching

前提是你 posted on 2019-11-30 05:39:34
/**
 * Similarity
 * @param str1
 * @param str2
 */
public static float levenshtein(String str1, String str2) {
    int len1 = str1.length();
    int len2 = str2.length();
    // dif[i][j] holds the edit distance between the first i characters of str1
    // and the first j characters of str2
    int[][] dif = new int[len1 + 1][len2 + 1];
    for (int a = 0; a <= len1; a++) {
        dif[a][0] = a;
    }
    for (int a = 0; a <= len2; a++) {
        dif[0][a] = a;
    }
    int temp;
    for (int i = 1; i <= len1; i++) {
        for (int j = 1; j <= len2; j++) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                temp = 0;
            } else {
                temp = 1;
            }
            dif[i][j] = min(dif[i - 1][j - 1] + temp, dif[i][j - 1] + 1, dif[i - 1][j] + 1);
        }
    }
    logger.debug("Comparing string [{}] with [{}]", str1, str2);
    logger.debug("Difference steps:
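The Java excerpt above is cut off before it returns a value. As a self-contained illustration of the same dynamic-programming idea, here is a Python sketch. The similarity formula (one minus the distance divided by the longer length) is an assumption about how a distance might be turned into a score, not something the truncated source confirms.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Edit distance using two rolling rows instead of a full matrix."""
    prev = list(range(len(s2) + 1))  # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j - 1] + cost,  # substitute (or keep)
                            curr[j - 1] + 1,     # insert
                            prev[j] + 1))        # delete
        prev = curr
    return prev[-1]


def similarity(s1: str, s2: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(s1, s2) / longest
```

For example, "kitten" and "sitting" are three edits apart, so under this normalization their similarity is 1 - 3/7.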

How to match and sort by similarity in MySQL?

自古美人都是妖i posted on 2019-11-30 05:32:43
Question: Currently, I am writing a search function. Let's say my database has this data:

Keyword1
Keyword2
Keyword3
Keysomething
Key

and the user entered "Key" as the keyword to search. This is my current query:

SELECT * FROM data WHERE ( data_string LIKE '$key%' OR data_string LIKE '%$key%' OR data_string LIKE '%$key' )

Basically, I have 2 questions: How do I sort by (order by) similarity? From the above example, I want "Key" as my first result. My current result is: Keyword1, Keyword2, Keyword3,
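One way to think about ordering by similarity here is to rank matches in tiers. The Python sketch below only illustrates that ranking logic on the five sample values; it is not MySQL, and the tier order (exact match, then prefix match, then any other containing match, with shorter strings first inside a tier) is an assumption about what "most similar" should mean. The function name rank_matches is hypothetical.

```python
def rank_matches(rows, key):
    """Keep rows containing key, ordered by an assumed similarity ranking."""
    key_lower = key.lower()

    def sort_key(value):
        v = value.lower()
        if v == key_lower:
            tier = 0  # exact match
        elif v.startswith(key_lower):
            tier = 1  # key is a prefix
        else:
            tier = 2  # key appears somewhere else in the string
        # Within a tier, shorter strings first, then alphabetical
        return (tier, len(value), value)

    matches = [r for r in rows if key_lower in r.lower()]
    return sorted(matches, key=sort_key)


rows = ["Keyword1", "Keyword2", "Keyword3", "Keysomething", "Key"]
ranked = rank_matches(rows, "Key")
```

With the sample data this puts "Key" first. The same tiers could likely be expressed directly in SQL with a CASE expression in the ORDER BY clause, which would keep the sorting in the database.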

Image comparison with php + gd

一曲冷凌霜 posted on 2019-11-30 04:13:27
What's the best approach to comparing two images with PHP and the Graphic Draw (GD) library? This is the scenario: I have an image, and I want to find which image of a given set is the most similar to it. The most similar image is in fact the same image: not a pixel-perfect match, but the same image. I've exaggerated the difference between the two images with the number one in the example, just to make clear what I mean. Even though it brought no consistent results, my approach was to reduce the images to 1px using the imagecopyresampled function and see how close the RGB values

Ways to calculate similarity

非 Y 不嫁゛ posted on 2019-11-29 21:49:30
I am building a community website that requires me to calculate the similarity between any two users. Each user is described with the following attributes: age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junky) and others. Can anyone tell me how to go about this problem or point me to some resources?

George Dontas

Another way is to compute (in R) all the pairwise dissimilarities (distances) between observations in the data set. The original variables may be of mixed types. The handling of nominal, ordinal, and (a)symmetric binary data is achieved
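To make the idea of a similarity over mixed attribute types concrete, here is a small Python sketch of a Gower-style score: numeric attributes contribute one minus the absolute difference divided by the attribute's range, categorical attributes contribute 1 on an exact match and 0 otherwise, and the contributions are averaged. This is an assumed illustration and is not necessarily the method the truncated answer goes on to describe; the user records and the age range of 50 are hypothetical.

```python
def gower_similarity(a, b, numeric_ranges):
    """Average per-attribute similarity for records with mixed types.

    numeric_ranges maps each numeric field to its (max - min) over the data set;
    every other field is treated as categorical.
    """
    scores = []
    for field, value in a.items():
        if field in numeric_ranges:
            span = numeric_ranges[field]
            if span:
                scores.append(1.0 - abs(value - b[field]) / span)
            else:
                scores.append(1.0)  # no spread in the data: treat as identical
        else:
            scores.append(1.0 if value == b[field] else 0.0)
    return sum(scores) / len(scores)


# Hypothetical users with the attributes listed in the question
alice = {"age": 25, "skin": "oily", "hair": "long", "lifestyle": "outdoor"}
bob = {"age": 35, "skin": "oily", "hair": "short", "lifestyle": "outdoor"}
ranges = {"age": 50}  # assumed spread of ages across all users

score = gower_similarity(alice, bob, ranges)
```

Here the two users agree on skin type and lifestyle, differ on hair type, and are 10 years apart out of an assumed 50-year range, which gives (0.8 + 1 + 0 + 1) / 4. Weighting some attributes more heavily than others would be a natural extension.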

What is the paper “Oliver [1993]” describing a PHP algorithm to calculate text similarity?

[亡魂溺海] posted on 2019-11-29 16:24:17
Question: There is a function similar_text() in the PHP library. The documentation (http://php.net/manual/en/function.similar-text.php) tells me that "This calculates the similarity between two strings as described in Oliver [1993]." Despite extensive searching, I can't find the paper that "Oliver [1993]" refers to, nor any candidate for who "Oliver" might be. The PHP source is undocumented. The only other reference to Oliver 1993 is in a forum at http://www.codeguru.com/forum/showthread.php?t

Compute the similarity between two lists

China☆狼群 posted on 2019-11-29 14:41:06
Question: I'd like to compute the similarity between two lists of various lengths, e.g.:

listA = ['apple', 'orange', 'apple', 'apple', 'banana', 'orange'] # (length = 6)
listB = ['apple', 'orange', 'grapefruit', 'apple'] # (length = 4)

As you can see, a single item can appear multiple times in a list, and the lists have different lengths. I've already thought of comparing the frequencies of each item, but that does not account for the size of each list (a list that is simply twice another list should be
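One candidate measure for lists with repeated items is cosine similarity over item counts. The sketch below is an assumption about what might suit the question, not the accepted answer; the function name cosine_similarity is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(list_a, list_b):
    """Cosine of the angle between the two item-count vectors."""
    counts_a, counts_b = Counter(list_a), Counter(list_b)
    # Counter returns 0 for items missing from the other list
    dot = sum(counts_a[item] * counts_b[item] for item in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an empty list
    return dot / (norm_a * norm_b)


listA = ['apple', 'orange', 'apple', 'apple', 'banana', 'orange']
listB = ['apple', 'orange', 'grapefruit', 'apple']
result = cosine_similarity(listA, listB)
```

Cosine similarity depends only on the proportions of items, so a list repeated twice scores essentially 1.0 against the original. Whether that is desirable depends on how the question wants list size to count; a measure such as a multiset Jaccard index would penalize the length difference instead.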