Is there a good and easy way to visualize high dimensional data?

Submitted by 南笙酒味 on 2019-12-20 08:39:40

Question


Can someone please tell me if there is a good (easy) way to visualize high dimensional data? My data currently has 21 dimensions, and I would like to see whether it is dense or sparse. Are there techniques to achieve this?


Answer 1:


Principal component analysis could be helpful if the dimensions are correlated.
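As a rough illustration of the idea, here is a minimal PCA sketch using NumPy's SVD. The 21-dimensional data set is made up for the example (two hidden factors plus a little noise), and the function name `pca_project` is hypothetical. If the first two components explain most of the variance, a 2-D scatter plot of the projection is probably a fair picture of the data.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Returns the projected points and the explained-variance ratio
    of each kept component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each dimension
    # SVD of the centred data: the rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2  # proportional to the variance along each component
    ratio = var / var.sum()
    return Xc @ Vt[:n_components].T, ratio[:n_components]

# Made-up data: 200 points in 21 dimensions driven by 2 hidden factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 21))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 21))

Y, ratio = pca_project(X)
print(Y.shape)      # (200, 2): ready for a 2-D scatter plot
print(ratio.sum())  # share of total variance kept by the two components
```

If the dimensions are not correlated, the explained-variance ratios will be spread thinly across many components and the 2-D projection will hide most of the structure.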




Answer 2:


Parallel coordinates are a popular method for visualizing high-dimensional data.

Which visualization is best for your data will depend on its characteristics. For example, how correlated are the different dimensions?
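To make the idea of parallel coordinates concrete, here is a small sketch in plain Python of the data preparation behind such a plot: every dimension gets its own vertical axis scaled to [0, 1], and every record becomes a polyline crossing all axes. The function name `parallel_coordinates` and the sample rows are made up for illustration, and the drawing itself is left to whatever plotting library you prefer.

```python
def parallel_coordinates(rows):
    """Min-max scale each dimension to [0, 1] and return one polyline per row.

    Each polyline is a list of (axis_index, scaled_value) vertices.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    polylines = []
    for row in rows:
        line = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            # A constant dimension has no spread, so place it mid-axis
            scaled = (value - lows[axis]) / span if span else 0.5
            line.append((axis, scaled))
        polylines.append(line)
    return polylines

# Made-up records with three dimensions on very different scales
rows = [(1.0, 200.0, 0.5), (2.0, 100.0, 0.5), (3.0, 150.0, 0.5)]
for line in parallel_coordinates(rows):
    print(line)
```

Lines that stay bundled together across many axes suggest clumps of similar records, while lines that cross heavily between two neighbouring axes suggest those two dimensions are negatively related.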




Answer 3:


The buzzword I would search for is multidimensional scaling. It is a technique to develop a projection from the high dimensional space to a lower space (2 or 3 dimensional) in such a way that points which are close in the full space will be close in the projection.

It is often used for visualising the output of clustering algorithms (i.e. if your clusters are compact in the MDS projection there is a good chance they are also in the full space).

Edit: This wouldn't necessarily help with determining whether the data is dense or sparse, because you lose the scale in the projection, but it would show whether it is uniform or clumpy (perhaps that's what you mean).
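For anyone who wants to try this, here is a minimal sketch of one variant, classical (Torgerson) MDS on Euclidean distances, written with NumPy. Other MDS variants minimise a stress function instead, so treat this as one possible instance and not the only way to do it. The function name `classical_mds` and the two-clump test data are made up for the example.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Embed the rows of X in n_components dimensions with classical MDS."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of points
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Double-centre the squared distances to obtain a Gram matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # eigh returns eigenvalues in ascending order, so pick the largest ones
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:n_components]
    top = np.clip(vals[idx], 0.0, None)  # guard against tiny negative values
    return vecs[:, idx] * np.sqrt(top)

# Made-up data: two clumps of 25 points each in 21 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(25, 21)),
               rng.normal(size=(25, 21)) + 5.0])

Y = classical_mds(X)
print(Y.shape)  # (50, 2)
```

Plotting `Y[:, 0]` against `Y[:, 1]` should show the two clumps as separate groups, which is the "uniform or clumpy" check described above.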




Answer 4:


I'm not sure what kind of patterns you would like to see in the data. t-SNE and its faster variant Barnes-Hut-SNE do a very good job of visualizing groups of related concepts in high-dimensional data. It is available through R.

There is a short tutorial on using it on high-dimensional data with about 300 dimensions: http://www.codeproject.com/Tips/788739/Visualizing-High-Dimensional-Vector-using-T-SNE-wi




Answer 5:


I was looking for ways to visualize high dimensional data and found the t-SNE technique, which has been used effectively. It might help others as well.




Answer 6:


Take a look at GGobi (http://www.ggobi.org). Its tours, parallel coordinates, and scatterplot matrices can be used for real-valued variables. For something more recent, see http://cranvas.org, and also the tourr package in R.




Answer 7:


Try using http://hypertools.readthedocs.io/en/latest/.

HyperTools is a library for visualizing and manipulating high-dimensional data in Python.




Answer 8:


Star Schema.

http://en.wikipedia.org/wiki/Star_schema

Works well for high-dimensional data.

If the cardinality of your fact table is close to the product of your dimension sizes, you have dense data.

If the cardinality of your fact table is much smaller than the product of your dimension sizes, you have sparse data.

In between, it is a judgement call.
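The rule of thumb above boils down to a single ratio. Here is a small sketch of it; the function name `density` and all of the numbers are made up for illustration.

```python
from math import prod

def density(fact_rows, dimension_sizes):
    """Fraction of all possible dimension combinations that actually occur."""
    return fact_rows / prod(dimension_sizes)

# Made-up example: three dimensions with 10, 12 and 5 members each,
# so 10 * 12 * 5 = 600 possible combinations
dims = [10, 12, 5]
print(density(570, dims))  # 0.95, close to 1, so the data is dense
print(density(6, dims))    # 0.01, close to 0, so the data is sparse
```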




Answer 9:


The curios.IT data exploration software is designed for the visualization of high dimensional data: data is shown as a collection of 3D objects (one for each data group) which can show up to 13 variables at the same time. The relationships between data variables and visual features are much easier to remember than with other techniques (like parallel coordinates).



Source: https://stackoverflow.com/questions/5779011/is-there-a-good-and-easy-way-to-visualize-high-dimensional-data
