Difference between classification and clustering in data mining? [closed]

前端未结

关注

 21  1176

-上瘾入骨i

相关标签:

21条回答

南旧

2020-12-04 05:53

If you have asked this question to any data mining or machine learning persons they will use the term supervised learning and unsupervised learning to explain you the difference between clustering and classification. So let me first explain you about the key word supervised and unsupervised.

Supervised learning: suppose you have a basket and it is filled with some fresh fruits and your task is to arrange the same type fruits at one place. suppose the fruits are apple,banana,cherry, and grape. so you already know from your previous work that, the shape of each and every fruit so it is easy to arrange the same type of fruits at one place. here your previous work is called as trained data in data mining. so you already learn the things from your trained data, This is because of you have a response variable which says you that if some fruit have so and so features it is grape, like that for each and every fruit.

This type of data you will get from the trained data. This type of learning is called as supervised learning. This type solving problem comes under Classification. So you already learn the things so you can do you job confidently.

unsupervised : suppose you have a basket and it is filled with some fresh fruits and your task is to arrange the same type fruits at one place.

This time you don't know any thing about that fruits, you are first time seeing these fruits so how will you arrange the same type of fruits.

What you will do first is you take on the fruit and you will select any physical character of that particular fruit. suppose you taken color.

Then you will arrange them based on the color, then the groups will be some thing like this. RED COLOR GROUP: apples & cherry fruits. GREEN COLOR GROUP: bananas & grapes. so now you will take another physical character as size, so now the groups will be some thing like this. RED COLOR AND BIG SIZE: apple. RED COLOR AND SMALL SIZE: cherry fruits. GREEN COLOR AND BIG SIZE: bananas. GREEN COLOR AND SMALL SIZE: grapes. job done happy ending.

here you didn't learn any thing before ,means no train data and no response variable. This type of learning is known unsupervised learning. clustering comes under unsupervised learning.

0 讨论(0)
发布评论:

提交评论
- 加载中...
感情败类

2020-12-04 05:55

In general, in classification you have a set of predefined classes and want to know which class a new object belongs to.

Clustering tries to group a set of objects and find whether there is some relationship between the objects.

In the context of machine learning, classification is supervised learning and clustering is unsupervised learning.

Also have a look at Classification and Clustering at Wikipedia.

0 讨论(0)
发布评论:

提交评论
- 加载中...
时光说笑

2020-12-04 05:55

Classification

Is the assignment of predefined classes to new observations, based on learning from examples.

It is one of the key tasks in machine learning.

Clustering (or Cluster Analysis)

While popularly dismissed as "unsupervised classification" it is quite different.

In contrast to what many machine learners will teach you, it is not about assigning "classes" to objects, but without having them predefined. This is the very limited view of people who did too much classification; a typical example of if you have a hammer (classifier), everything looks like a nail (classification problem) to you. But it is also why classification people do not get a hang of clustering.

Instead, consider it as structure discovery. The task of clustering is to find structure (e.g. groups) in your data that you did not know before. Clustering has been successful if you learned something new. It failed, if you only got the structure you already knew.

Cluster analysis is a key task of data mining (and the ugly duckling in machine-learning, so don't listen to machine learners dismissing clustering).

"Unsupervised learning" is somewhat an Oxymoron

This has been iterated up and down the literature, but unsupervised learning is bllsht. It does not exist, but it is an oxymoron like "military intelligence".

Either the algorithm learns from examples (then it is "supervised learning"), or it does not learn. If all the clustering methods are "learning", then computing the minimum, maximum and average of a data set is "unsupervised learning", too. Then any computation "learned" its output. Thus the term 'unsupervised learning' is totally meaningless, it means everything and nothing.

Some "unsupervised learning" algorithms do, however, fall into the optimization category. For example k-means is a least-squares optimization. Such methods are all over statistics, so I don't think we need to label them "unsupervised learning", but instead should continue to call them "optimization problems". It's more precise, and more meaningful. There are plenty of clustering algorithms who do not involve optimization, and who do not fit into machine-learning paradigms well. So stop squeezing them in there under the umbrella "unsupervised learning".

There is some "learning" associated with clustering, but it is not the program that learns. It is the user that is supposed to learn new things about his data set.

0 讨论(0)
发布评论:

提交评论
- 加载中...
半阙折子戏

2020-12-04 05:55

I believe classification is classifying records in a data set into predefined classes or even defining classes on the go. I look at it as pre-requisite for any valuable data mining, I like to think of it at unsupervised learning i.e. one does not know what he/she is looking for while mining the data and classification serves as a good starting point

Clustering on the other end falls under supervised learning i.e. one know what parameters to look for, the correlation between them along with critical levels. I believe it requires some understanding of statistics and maths

0 讨论(0)
发布评论:

提交评论
- 加载中...
日久生厌

2020-12-04 05:56
One liner for Classification:

Classifying data into pre-defined categories

One liner for Clustering:

Grouping data into a set of categories

Key difference:

Classification is taking data and putting it into pre-defined categories and in Clustering the set of categories, that you want to group the data into, is not known beforehand.

Conclusion:
- Classification assigns the category to 1 new item, based on already labeled items while Clustering takes a bunch of unlabeled items and divide them into the categories
- In Classification, the categories\groups to be divided are known beforehand while in Clustering, the categories\groups to be divided are unknown beforehand
- In Classification, there are 2 phases – Training phase and then the test phase while in Clustering, there is only 1 phase – dividing of training data in clusters
- Classification is Supervised Learning while Clustering is Unsupervised Learning
I have written a long post on the same topic which you can find here:

https://neelbhatt40.wordpress.com/2017/11/21/classification-and-clustering-machine-learning-interview-questions-answers-part-i/
0 讨论(0)
发布评论:

提交评论
- 加载中...
太阳男子

2020-12-04 05:57

+Classification: you are given some new data, you have to set new label for them.

For example, a company wants to classify their prospect customers. When a new customer comes, they have to determine if this is a customer who is going to buy their products or not.

+Clustering: you're given a set of history transactions which recorded who bought what.

By using clustering techniques, you can tell the segmentation of your customers.

0 讨论(0)
发布评论:

提交评论
- 加载中...

热议问题