similarity

Problems using Jama in Java for LSA

大憨熊 posted on 2019-12-13 00:54:33
Question: I am making use of the Jama package for finding the LSA. I was told to reduce the dimensionality, so I have reduced it to 3 in this case and I reconstruct the matrix. But the resultant matrix is very different from the one I had given the system. Here's the code:

    a = new Matrix(termdoc);   // get the matrix here
    a = a.transpose();         // since the matrix is in the form of doc * terms, I transpose it
    SingularValueDecomposition sv = new SingularValueDecomposition(a);
    u = sv.getU();
    v = sv
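The reduce-and-reconstruct step can be sketched in NumPy (a toy matrix stands in for the Jama term-document data). It also shows why the result is expected to differ from the input: the rank-k product is only the closest rank-k approximation of the original matrix, not the matrix itself.

```python
import numpy as np

# Hypothetical small term-document matrix (terms x docs); values are toy counts.
a = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0],
              [1.0, 2.0, 1.0, 0.0]])

u, s, vt = np.linalg.svd(a, full_matrices=False)

k = 3  # keep the top-3 singular values, as in the question
a_k = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# The rank-k reconstruction is the *closest* rank-k matrix to a,
# so it will generally differ from a unless a already has rank <= k.
print(np.round(a_k, 2))
print("max abs difference:", np.abs(a - a_k).max())
```

Keeping all singular values (k equal to the full rank) reproduces the input exactly; any truncation trades fidelity for a lower-dimensional latent space.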

How to find strings which are similar to given string in SQL server?

本秂侑毒 posted on 2019-12-12 17:16:21
Question: I have a SQL Server table which contains several string columns. I need to write an application that takes a string and searches for similar strings in the SQL Server table. For example, if I give "مختار" or "مختر" as the input string, I should get these from the SQL table: 1 - مختاری 2 - شهاب مختاری 3 - شهاب الدین مختاری. I've searched the net for a solution but have found nothing useful. I've read this question, but it will not help me because: I am using MS SQL Server, not MySQL; my table contents
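One application-side alternative (a sketch, not a SQL Server feature) is to pull the candidate rows and score them with Python's standard-library difflib, comparing the query against each word of a candidate so partial names still match. The row values and the 0.7 cutoff below are illustrative assumptions.

```python
from difflib import SequenceMatcher

def best_ratio(query, candidate):
    # Compare the query against each word of the candidate and keep the best
    # score, so "مختر" still matches inside "شهاب مختاری".
    return max(SequenceMatcher(None, query, w).ratio() for w in candidate.split())

rows = ["مختاری", "شهاب مختاری", "شهاب الدین مختاری", "احمدی"]
query = "مختر"

# Keep rows whose best per-word similarity clears an (assumed) threshold.
matches = [r for r in rows if best_ratio(query, r) >= 0.7]
print(matches)
```

Inside SQL Server itself, the rough equivalents are LIKE/full-text search for containment and SOUNDEX/DIFFERENCE for phonetic similarity, though the latter are designed for English text.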

How to perform efficient queries with Gensim doc2vec?

て烟熏妆下的殇ゞ posted on 2019-12-12 16:33:15
Question: I'm working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gensim v3.7.1, and I have trained both word2vec and doc2vec models. The results of the latter outperform word2vec's, but I'm having trouble performing efficient queries with my Doc2Vec model. This model uses the distributed bag-of-words implementation (dm = 0). I used to infer similarity using the built-in method model
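Absent an index structure, an efficient top-n query over trained document vectors reduces to a single normalized matrix-vector product, which is essentially what Gensim's most_similar does internally. This NumPy sketch uses random vectors as stand-ins for the model's stored document vectors and for the output of inferring a vector for the new sentence:

```python
import numpy as np

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 100))   # stand-in for the trained doc vectors
query = rng.normal(size=100)              # stand-in for an inferred sentence vector

# Normalize once up front; every subsequent query is then one matrix-vector
# product followed by a partial sort.
unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
q = query / np.linalg.norm(query)

sims = unit @ q                 # cosine similarity to every stored document
n = 5
top = np.argsort(-sims)[:n]     # indices of the n most similar documents
print(list(zip(top.tolist(), np.round(sims[top], 3).tolist())))
```

For very large sets, approximate nearest-neighbor libraries trade a little recall for much faster queries; the brute-force product above is the exact baseline.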

adist: different Levenshtein alignments depending on how the strings are entered

梦想与她 posted on 2019-12-12 13:32:45
Question: When using the adist function in R to compute the Levenshtein alignments between pairs of character strings, I get different results depending on whether I run the function once for each pair or use vectors to enter several pairs at once. Why is that? Example: transformations for the string pairs 'knijpen'-'kneifen', 'grijpen'-'greifen' and 'lopen'-'laufen':

    attr(adist("knijpen", "kneifen", counts = TRUE), "trafos")
    #      [,1]
    # [1,] "MMIMSDMM"
    attr(adist("grijpen", "greifen", counts = TRUE),
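The discrepancy usually comes from ties: several different alignments can realize the same minimal edit distance, and a dynamic-programming implementation is free to report any one of them. A plain-Python Levenshtein sketch (not adist itself) shows both pairs share the same distance, which can be reached either by three substitutions or by an insert + substitute + delete, as in adist's "MMIMSDMM":

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance
    # (insertion, deletion, substitution, each with cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution / match
        prev = cur
    return prev[-1]

# The distance is unique; the *alignment* achieving it need not be.
print(levenshtein("knijpen", "kneifen"), levenshtein("grijpen", "greifen"))
```

So both calls are correct: vectorized and single-pair invocations may simply break ties between equally optimal alignments differently.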

tf idf similarity

淺唱寂寞╮ posted on 2019-12-12 11:48:00
Question: I am using TF/IDF to calculate similarity. For example, suppose I have the following two docs:

    Doc A => cat dog
    Doc B => dog sparrow

It is normal that their similarity would be 50%, but when I calculate the TF/IDF it is as follows:

    Tf values for Doc A:  dog tf = 0.5, cat tf = 0.5
    Tf values for Doc B:  dog tf = 0.5, sparrow tf = 0.5
    IDF values for Doc A: dog idf = -0.4055, cat idf = 0
    IDF values for Doc B: dog idf = -0.4055 (without the +1 formula, 0.6931), sparrow idf = 0
    TF/IDF value for Doc A: 0.5 x -0.4055 + 0.5 x 0 = -0
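The negative weight comes from the idf formula log(N/(1+df)), which dips below zero as soon as a term occurs in every document. A small sketch contrasting that formula with a smoothed variant (the form scikit-learn uses) reproduces the question's numbers:

```python
import math

docs = [["cat", "dog"], ["dog", "sparrow"]]
N = len(docs)

def df(term):
    # Number of documents containing the term.
    return sum(term in d for d in docs)

def idf_naive(term):
    # log(N / (1 + df)): the formula implied by the question's -0.4055;
    # goes negative whenever df >= N.
    return math.log(N / (1 + df(term)))

def idf_smooth(term):
    # Smoothed variant: log((1 + N) / (1 + df)) + 1 is always positive, so
    # tf-idf weights (and cosine similarities built on them) stay non-negative.
    return math.log((1 + N) / (1 + df(term))) + 1

for term in ["dog", "cat", "sparrow"]:
    print(term, round(idf_naive(term), 4), round(idf_smooth(term), 4))
```

With the smoothed idf, "dog" gets a small positive weight instead of a negative one, and the cosine similarity between the two documents comes out as an intuitive positive fraction.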

How to find the similarity between two curves and the score of similarity?

余生长醉 posted on 2019-12-12 08:05:51
Question: I have two data sets (t, y1) and (t, y2). These data sets look the same visually, but there is some time delay or magnitude shift. I want to find the similarity between the two curves (giving a similarity score of 1 for approximately similar curves and 0 for dissimilar curves). Some curves seem to be different because of oscillation in the data, so I am searching for a method to find the similarity between the curves. I already tried the gradient command in Matlab to find the slope of the curve at
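One standard way to score curves that differ only by a delay and a magnitude/offset change is the peak of the normalized cross-correlation: z-score both curves (removing offset and scale), correlate them, and read off the best score and the lag at which it occurs. A NumPy sketch on synthetic data (the signals and delay are assumptions, not the asker's data); Matlab's xcorr with the 'coeff' option expresses the same idea:

```python
import numpy as np

t = np.linspace(0, 10, 500)
y1 = np.sin(t)
y2 = 1.5 * np.sin(t - 0.8)   # same shape, delayed and rescaled

def shift_invariant_similarity(a, b):
    # Z-score both curves so magnitude/offset shifts drop out, then take the
    # peak of the cross-correlation divided by the length: a score near 1
    # means "identical up to delay and scale", near 0 means unrelated.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    xcorr = np.correlate(a, b, mode="full") / len(a)
    lag = int(xcorr.argmax()) - (len(b) - 1)   # samples b must be shifted by
    return xcorr.max(), lag

score, lag = shift_invariant_similarity(y1, y2)
print(round(score, 3), lag)
```

The score is slightly below 1 even for a pure delay because the overlapping window shrinks as the lag grows; for heavily oscillating data, smoothing before correlating makes the score more stable.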

Percentage Overlap of Two Lists

二次信任 posted on 2019-12-12 07:34:15
Question: This is more of a math problem than anything else. Let's assume I have two lists of different sizes in Python:

    listA = ["Alice", "Bob", "Joe"]
    listB = ["Joe", "Bob", "Alice", "Ken"]

I want to find out what percentage overlap these two lists have. Order is not important within the lists. Finding the overlap is easy; I've seen other posts on how to do that, but I can't quite extend it in my mind to finding out what percentage they overlap. If I compared the lists in different orders, would the result
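An order-independent way to get a percentage is the Jaccard ratio: shared distinct items over all distinct items. Because set intersection and union are symmetric, comparing the lists in either order gives the same result:

```python
def percent_overlap(a, b):
    # Order-independent overlap: intersection size over union size (Jaccard),
    # expressed as a percentage. Symmetric in its arguments.
    sa, sb = set(a), set(b)
    return 100 * len(sa & sb) / len(sa | sb)

listA = ["Alice", "Bob", "Joe"]
listB = ["Joe", "Bob", "Alice", "Ken"]
print(percent_overlap(listA, listB))  # 3 shared names out of 4 distinct -> 75.0
```

Other denominators give different but equally defensible percentages (e.g. intersection over the shorter list yields 100% here); the choice depends on what "overlap" should mean for the application.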

Creating a Similarity Matrix from Raw Card-Sort Data

前提是你 posted on 2019-12-12 04:47:46
Question: I have a data set from an online card-sorting activity. Participants were presented with a random subset of Cards (from a larger set) and asked to create Groups of Cards they felt were similar to one another. Participants were able to create as many Groups as they liked and name the Groups whatever they wanted. An example data set is something like this:

    Data <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
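A common way to turn such records into a similarity matrix is co-occurrence counting: for each pair of cards, divide the number of subjects who put them in the same group by the number of subjects who saw both (the denominator matters because each participant only saw a subset). A plain-Python sketch on made-up records, with (subject, group, card) tuples standing in for the data frame rows:

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical card-sort records: (subject, group, card).
data = [
    (1, "animals", "cat"), (1, "animals", "dog"), (1, "tools", "hammer"),
    (2, "pets", "cat"), (2, "pets", "dog"), (2, "pets", "bird"),
    (3, "small", "cat"), (3, "big", "dog"), (3, "small", "bird"),
]

seen_by = defaultdict(set)    # subject -> cards that subject saw
groups = defaultdict(set)     # (subject, group) -> cards placed in that group
grouped = defaultdict(int)    # card pair -> times sorted into the same group
together = defaultdict(int)   # card pair -> times both shown to one subject

for subject, group, card in data:
    seen_by[subject].add(card)
    groups[(subject, group)].add(card)

for cards in groups.values():
    for pair in combinations(sorted(cards), 2):
        grouped[pair] += 1
for cards in seen_by.values():
    for pair in combinations(sorted(cards), 2):
        together[pair] += 1

# Similarity = co-grouping rate among subjects who saw both cards.
sim = {p: grouped[p] / together[p] for p in together}
print(sim[("cat", "dog")])
```

The resulting dictionary maps each card pair to a 0-to-1 similarity, ready to be reshaped into a square matrix for clustering or multidimensional scaling.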

Transform categorical attribute vector into similarity matrix

此生再无相见时 posted on 2019-12-11 17:37:25
Question: I need to transform a categorical attribute vector into a "same attribute" matrix using R. For example, I have a vector which reports the gender of N people (male = 1, female = 0). I need to convert this vector into an NxN matrix named A (with people's names on rows and columns), where each cell Aij has the value 1 if the two persons (i and j) have the same gender and 0 otherwise. Here is an example with 3 persons, first male, second female, third male, which produces this vector: c(1, 0, 1) I want to
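In R this is typically a one-liner with outer(v, v, "=="); the same idea in NumPy (used here only to keep this page's examples in one language) is a broadcast equality between a column view and a row view of the vector:

```python
import numpy as np

gender = np.array([1, 0, 1])  # male = 1, female = 0, as in the question

# Broadcasting compares every pair (i, j) at once: cell (i, j) is 1 exactly
# when person i and person j carry the same attribute value.
A = (gender[:, None] == gender[None, :]).astype(int)
print(A)
```

The same expression works unchanged for any categorical coding (e.g. strings), since only equality is tested; the result is symmetric with ones on the diagonal.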

Co-occurrence matrix for tfidf vectorizer for top 2000 words

余生长醉 posted on 2019-12-11 15:49:47
Question: I computed a tf-idf vectorizer for text data and got vectors of shape (100000, 2000), with max_feature = 2000. I am computing the co-occurrence matrix with the code below:

    length = 2000
    m = np.zeros([length, length])  # n is the count of all words

    def cal_occ(sentence, m):
        for i, word in enumerate(sentence):
            print(i)
            print(word)
            for j in range(max(i - window, 0), min(i + window, length)):
                print(j)
                print(sentence[j])
                m[word, sentence[j]] += 1

    for sentence in tf_vec:
        cal_occ(sentence, m)

I am getting the following error. 0
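Two fixes make this kind of code run: index the matrix with integer word ids (a tf-idf row is a vector of weights, not a sequence of word indices), and bound the sliding window by the sentence length rather than the vocabulary size. A runnable small-scale sketch under those assumptions, with a toy vocabulary and hand-made index sentences:

```python
import numpy as np

length = 5    # vocabulary size (2000 in the question; kept tiny for the demo)
window = 2    # symmetric context window (assumed value)
m = np.zeros((length, length))

# Key fix: each sentence must be a list of *integer word ids* in the
# vocabulary, not a row of the tf-idf matrix.
sentences = [[0, 1, 2, 1], [3, 4, 0]]

def cal_occ(sentence, m):
    for i, word in enumerate(sentence):
        # Second fix: the window slides over positions *within the sentence*
        # (hence len(sentence)), and skips the center word itself.
        for j in range(max(i - window, 0), min(i + window + 1, len(sentence))):
            if j != i:
                m[word, sentence[j]] += 1

for sentence in sentences:
    cal_occ(sentence, m)
print(m)
```

Because the window is symmetric, the resulting matrix is symmetric; restricting it to the top-2000 vocabulary amounts to dropping any token whose id falls outside that range before counting.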