How to interpret cluster results after using Doc2vec?

荒凉一梦, posted on 2019-12-02 07:35:44

Don't interpret the individual variables (the vector dimensions) on their own. Because of the way these embeddings are trained, they should only be analyzed together.

As a starting point:

  1. Find the document vectors most similar to your centroid to see typical cluster members.
  2. Find the term vectors in the embedding most similar to your centroid to get typical words that describe the cluster.
  3. Note the distances to see how good your fit is.
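The three steps above can be sketched in plain Python. This is only an illustration under assumptions: the 2-d vectors, the names `doc_vectors`, `term_vectors` and `centroid`, and the helper `most_similar` are all made up here, and cosine similarity is assumed as the measure. With a real Doc2vec model you would take the vectors from the trained model and would probably use the library's own similarity lookup instead.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(centroid, named_vectors, topn=2):
    """Rank named vectors by cosine similarity to the centroid, best first."""
    scored = [(name, cosine(centroid, vec)) for name, vec in named_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Toy data (hypothetical): 2-d vectors stand in for real Doc2vec output.
doc_vectors = {
    "tweet_1": [0.9, 0.1],
    "tweet_5": [0.6, 0.5],
    "tweet_7": [-0.2, 0.9],
}
term_vectors = {
    "football": [1.0, 0.2],
    "goal": [0.7, 0.1],
    "election": [-0.1, 1.0],
}
centroid = [0.85, 0.2]  # e.g. the mean of the vectors assigned to one cluster

typical_docs = most_similar(centroid, doc_vectors)    # step 1: typical members
typical_terms = most_similar(centroid, term_vectors)  # step 2: descriptive words

# Step 3: the similarity scores show how tightly each item fits the centroid.
for name, score in typical_docs + typical_terms:
    print(f"{name}: {score:.3f}")
```

Scores close to 1 indicate items that sit near the centroid; a cluster whose best members score low is probably not a coherent group.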

The clusters themselves do not mean anything specific. You can ask for as many clusters as you want, and all the clustering algorithm will do is try to distribute your vectors among them. If you know the tweets and know how many different topics you want them separated into, try to clean them or give them features that the clustering algorithm can use to segregate them into the clusters of your choice.
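To illustrate that point, here is a minimal k-means sketch in plain Python. It assumes k-means is the algorithm in use (the question may have used another one), seeds the centroids with the first k points for simplicity, and runs on made-up 2-d points. Whatever value of k is requested, every vector still receives a label; nothing in the output says whether that k corresponds to real topics.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical data: six 2-d vectors standing in for document vectors.
points = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0], [9.2, 0.1]]

for k in (2, 3, 4):
    # Each run partitions all six points, whether or not k is a sensible choice.
    print(k, kmeans(points, k))
```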

Also, if you meant topic modeling, that is different from clustering, and you should look that up as well.

These values represent the coordinates of the individual tweets (or documents) that you want to cluster. I am assuming that v1 to v100 are the vectors for tweets 1 to 100; otherwise this won't make sense. So if, say, cluster 0 contains v1, v5 and v6, this means that tweets 1, 5 and 6, whose vector representations are v1, v5 and v6 respectively, belong to cluster 0.
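A small sketch of that mapping, assuming the clustering step returned one label per vector in the same order as the tweets. The names `tweets`, `labels` and `members` and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs: tweets[i] is the text behind vector v(i + 1),
# and labels[i] is the cluster assigned to that vector.
tweets = ["text 1", "text 2", "text 3", "text 4", "text 5", "text 6"]
labels = [0, 1, 2, 1, 0, 0]

# Group tweet numbers (1-based, matching v1, v2, ...) by cluster label.
members = defaultdict(list)
for tweet_number, label in enumerate(labels, start=1):
    members[label].append(tweet_number)

print(members[0])  # [1, 5, 6]: tweets 1, 5 and 6 belong to cluster 0

# Read the actual tweets of a cluster to judge what it is about.
for tweet_number in members[0]:
    print(tweet_number, tweets[tweet_number - 1])
```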
