Problem with big data (?) during computation of sequence distances using TraMineR


不要未来只要你来 2020-12-09 22:21

I am trying to run an optimal matching analysis using TraMineR but it seems that I am encountering an issue with the size of the dataset. I have a big dataset of European co

2 answers

温柔的废话 (original poster)

2020-12-09 23:18
I have never seen this error code before, but it may well be caused by the large number of sequences: a full distance matrix for 57160 sequences has 57160 x 57160 entries. There are at least two things you can try:
- Use the argument `full.matrix=FALSE` in `seqdist` (see the help page). It computes only the lower triangle of the distance matrix and returns a "dist" object that can be passed directly to the `hclust` function.
- Aggregate identical sequences (you have only 12626 distinct sequences out of 57160), compute the distances between the distinct sequences, cluster them using weights (the number of times each distinct sequence appears in the dataset), and then add the clustering back to your original dataset. This can be done quite easily with the WeightedCluster library. The first appendix of the WeightedCluster manual provides a step-by-step guide (the procedure is also described on the webpage http://mephisto.unige.ch/weightedcluster).
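Here is a minimal sketch of both suggestions combined. It uses the `mvad` example data shipped with TraMineR as a stand-in for your dataset, so the column range `17:86`, the constant substitution costs, the Ward linkage, and the choice of 5 clusters are all assumptions to adapt to your own analysis.

```r
library(TraMineR)
library(WeightedCluster)

# Stand-in data: replace mvad and the column range with your own sequences.
data(mvad)
seq_cols <- 17:86

# Aggregate identical sequences. aggIndex gives one row per distinct sequence,
# aggWeights the number of times it occurs, disaggIndex the mapping back.
agg <- wcAggregateCases(mvad[, seq_cols])

# Build the sequence object from the distinct sequences only.
uniq_seq <- seqdef(mvad[agg$aggIndex, seq_cols], weights = agg$aggWeights)

# Assumed cost setting: constant substitution costs of 2, indel cost of 1.
costs <- seqsubm(uniq_seq, method = "CONSTANT", cval = 2)

# full.matrix = FALSE returns a "dist" object (lower triangle only).
d <- seqdist(uniq_seq, method = "OM", indel = 1, sm = costs,
             full.matrix = FALSE)

# Weighted hierarchical clustering of the distinct sequences.
hc <- hclust(d, method = "ward.D", members = agg$aggWeights)
uniq_cluster <- cutree(hc, k = 5)

# Map the cluster membership back to every row of the original data.
mvad$cluster <- uniq_cluster[agg$disaggIndex]
```

With your data, the distance computation would then involve 12626 sequences instead of 57160, and only the lower triangle would be stored.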
Hope this helps.