clustering very large dataset in R
I have a dataset consisting of 70,000 numeric values representing distances ranging from 0 till 50, and I want to cluster these numbers; however, if I'm trying the classical clustering approach, then I would have to establish a 70,000X70,000 distance matrix representing the distances between each two numbers in my dataset, which won't fit in memory, so I was wondering if there is any smart way to solve this problem without the need to do stratified sampling? I also tried bigmemory and big analytics libraries in R but still can't fit the data into memory You can use kmeans , which normally