'Similarity' in Data Mining

温柔的废话 2020-12-30 11:06

In the field of Data Mining, is there a specific sub-discipline called 'Similarity'? If so, what does it deal with? Any examples, links, or references would be helpful.

5 answers
  • 2020-12-30 11:16

    Many similarity measures are used in data mining. In text mining, cosine similarity and Jaccard similarity are widely used to measure how similar two texts are.

    For reference, see the book Introduction to Information Retrieval by Manning, Raghavan, and Schütze.
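
    As a minimal sketch of the two measures named above, the following computes cosine similarity on term-frequency vectors and Jaccard similarity on word sets. The whitespace tokenizer, the function names, and the two sample sentences are assumptions made for illustration; they are not taken from the book.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the two term-frequency vectors.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty texts
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    # Shared vocabulary size divided by combined vocabulary size.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(sa & sb) / len(sa | sb)


doc1 = "data mining finds patterns in data"
doc2 = "text mining finds patterns in text"
print(cosine_similarity(doc1, doc2))   # about 0.5
print(jaccard_similarity(doc1, doc2))  # about 0.667 (4 shared words out of 6)
```

    Cosine similarity takes word counts into account, while Jaccard similarity only looks at which words are present, so the two can disagree on the same pair of texts.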

  • 2020-12-30 11:26

    In the field of Data Mining, is there a specific sub-discipline called 'Similarity'?

    Yes. There is a specific subfield in data mining and machine learning called metric learning, which aims to learn a better distance metric among data instances.

    Do you know any of the following concepts?

    Euclidean distance

    Mahalanobis distance

    Pearson correlation

    Cosine similarity

    Kernel functions

    Once you know these, you will know what 'similarity' is.
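
    For a quick feel of the concepts listed above, here is a small sketch on plain numeric vectors. The function names, the `gamma` default, and the sample vectors are assumptions for illustration. The Mahalanobis function takes the inverse covariance matrix as an argument rather than estimating it from data, and the kernel shown is the Gaussian (RBF) kernel, which is only one of many kernel functions.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mahalanobis(x, y, inv_cov):
    # inv_cov: inverse covariance matrix as a list of rows.
    # With the identity matrix this reduces to Euclidean distance.
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j]
                         for i in range(n) for j in range(n)))


def pearson(x, y):
    # Assumes neither vector is constant (otherwise the denominator is zero).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(x, y):
    # Assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)


def rbf_kernel(x, y, gamma=0.5):
    # Gaussian kernel: 1.0 for identical points, decaying towards 0 with distance.
    return math.exp(-gamma * euclidean(x, y) ** 2)


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.5]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(euclidean(u, v))
print(mahalanobis(u, v, identity))  # same as the Euclidean distance
print(pearson(u, v))
print(cosine(u, v))
print(rbf_kernel(u, v))
```

    Euclidean and Mahalanobis are distances (smaller means more similar), whereas Pearson correlation, cosine similarity, and kernel values are similarities (larger means more similar).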

    I would like the community opinion on how closely related Data Mining and Artificial Intelligence are.

    It is very hard to draw the line between data mining and AI. Don't worry about this question while you are new to the field. Once you have learned ten or so data mining algorithms and read a few AI books, you will see both the difference and the relation.

  • 2020-12-30 11:26

    Similarity is a concept used in several data mining tasks, such as clustering and classification. Depending on the kind of data you have, you may use different similarity measures, such as cosine similarity for text documents or Euclidean distance for numeric vectors.
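
    To illustrate how the measure plugs into a task, here is a rough sketch of nearest-neighbour classification where the distance function is a parameter. The vectors, labels, and function names are hypothetical and chosen so that the two measures disagree.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    # 1 minus cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)


def nearest_neighbor_label(query, examples, distance):
    # examples is a list of (vector, label) pairs; return the label of the
    # example that is closest to the query under the given distance.
    _, label = min(examples, key=lambda ex: distance(query, ex[0]))
    return label


# Hypothetical labelled examples, e.g. term counts for two documents.
examples = [
    ([1.0, 1.0], "same_mix"),        # same direction as the query, much shorter
    ([6.0, 3.0], "similar_length"),  # close in magnitude, different direction
]
query = [5.0, 5.0]

print(nearest_neighbor_label(query, examples, euclidean_distance))  # similar_length
print(nearest_neighbor_label(query, examples, cosine_distance))     # same_mix
```

    Euclidean distance is sensitive to vector length, while cosine distance only compares direction, which is one reason cosine is popular for documents of different lengths.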

  • 2020-12-30 11:35

    Just to stress the importance of the "similarity" concept.

    Data mining (AI, machine learning, modelling, etc.) is about driving some function to either its maximum or its minimum value. Take the best optimization/learning/mining algorithm and the wrong function, and you get complete garbage. Note that we say "value" and not "values". That's because there is no algorithm (to the best of my knowledge), computational or otherwise, that is capable of optimizing more than one value at once. However, in our universe, complex optimizations are more frequent than one-dimensional ones (we want to be rich AND young AND healthy). That is why a plethora of similarity and other scoring functions exists, and that is why none of them is "the right one".

  • 2020-12-30 11:36

    Appropriate definitions of 'similarity' (which features you extract, what you do with them afterwards) are almost the definition of clustering, and clustering is a fairly wide sub-field of data mining.

    If you make the standard cynical definition of AI as the set of problems we can't solve well (indeed, that we can't specify well enough to start solving), data mining shades into it once the space in which you're looking for correlations starts to be larger than your algorithms can handle.
