I am trying to run an optimal matching analysis using TraMineR but it seems that I am encountering an issue with the size of the dataset. I have a big dataset of European co
An easy solution that often works well is to analyze only a sample of your data. For instance,
employdat.sts <- employdat.sts[sample(nrow(employdat.sts),5000),]
would extract a random sample of 5000 sequences. Exploring a sample of that size should be largely sufficient to find out the characteristics of your sequences, including their diversity.
To improve representativeness, you can even resort to some stratified sampling (e.g., by first or last state, or by some covariates available in your data set). Since you have the original data set at hand, you can fully control the random sampling design.
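As a rough illustration of stratifying by first state, here is a minimal sketch in base R. The toy data frame `employdat`, the state labels, and the 10% sampling fraction are made up for the example; only the idea of sampling within strata comes from the text above.

```r
# Toy stand-in for the real data: one row per sequence, one column per time point.
# (Hypothetical data; replace with your own table of states.)
set.seed(42)  # make the random draw reproducible
states <- c("E", "U", "I")
employdat <- as.data.frame(
  matrix(sample(states, 1000 * 12, replace = TRUE), nrow = 1000),
  stringsAsFactors = FALSE
)

# Stratify by the first state (column 1) and draw the same fraction per stratum.
frac <- 0.10
strata <- split(seq_len(nrow(employdat)), employdat[[1]])
idx <- unlist(lapply(strata, function(rows) {
  n <- max(1, round(frac * length(rows)))  # keep at least one case per stratum
  rows[sample.int(length(rows), n)]        # safe even when a stratum has one row
}), use.names = FALSE)

employdat.sample <- employdat[idx, ]

# Compare the first-state shares in the full data and in the sample.
round(prop.table(table(employdat[[1]])), 2)
round(prop.table(table(employdat.sample[[1]])), 2)
```

The same row indices `idx` can then be used to subset the state sequence object, in the same way as the simple random sample above (`employdat.sts[idx, ]`).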
Update
If clustering is the objective and you need a cluster membership for each individual sequence, see https://stackoverflow.com/a/63037549/1586731
I have never seen this error code before, but it might well be due to your high number of sequences. There are at least two things you can try:
1. Use the argument "full.matrix=FALSE" in seqdist (see help page). It will compute only the lower triangular matrix and return a "dist" object that can be used directly in the hclust function.
2. Use the WeightedCluster library, which can aggregate identical sequences so that distances are computed only between distinct ones. The first appendix of the WeightedCluster Manual provides a step-by-step guide to do that (the procedure is also described on the webpage http://mephisto.unige.ch/weightedcluster).
Hope this helps.
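The two suggestions can be combined. The following is only a sketch under assumptions, not the procedure from the manual verbatim: it uses the `mvad` example data shipped with TraMineR as a stand-in for your data (sequences in columns 17 to 86), and the cost settings, the Ward linkage, and the choice of 4 clusters are arbitrary illustrative values.

```r
library(TraMineR)
library(WeightedCluster)

data(mvad)  # example data from TraMineR, used here as a stand-in

# Collapse identical sequences; their frequencies are kept as weights.
ac <- wcAggregateCases(mvad[, 17:86])
uniq.seq <- seqdef(mvad[ac$aggIndex, 17:86], weights = ac$aggWeights)

# full.matrix = FALSE returns a "dist" object (lower triangle only),
# which needs roughly half the memory of the full square matrix.
# The costs below are illustrative, not a recommendation.
diss <- seqdist(uniq.seq, method = "OM", indel = 1, sm = "CONSTANT",
                full.matrix = FALSE)

# hclust() accepts the "dist" object directly; 'members' passes the
# number of original cases behind each distinct sequence.
hc <- hclust(diss, method = "ward.D", members = ac$aggWeights)
cl.uniq <- cutree(hc, k = 4)

# Map the cluster membership back to every original sequence.
mvad$cluster <- cl.uniq[ac$disaggIndex]
table(mvad$cluster)
```

How much this helps depends on how many sequences are duplicated: the distance computation scales with the square of the number of distinct sequences, so the gain is largest when many individuals share the same trajectory.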