One way to do it is to run k-means with a large k (much larger than what you think the correct number of clusters is), say 1000. Then run the mean-shift algorithm starting from these 1000 centers (mean shift still uses the whole data set to compute each update, but you only "move" these 1000 points). Mean shift will then find the number of clusters: centers that converge to the same mode belong to the same cluster.
Running mean shift without the k-means step first is possible, but it is usually too slow: O(N^2 * #steps), since each of the N points is moved using all N points. Running k-means first speeds things up to O(N * K * #steps), because only the K centers are moved.
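
Here is a minimal NumPy sketch of the idea, just to make the two stages concrete. Everything in it is my own assumption rather than a reference implementation: the helper names (`kmeans`, `mean_shift_seeds`, `merge_modes`), the Gaussian kernel, the bandwidth of 1.5, k = 30, and the toy data of three blobs. On real data you would have to tune the bandwidth, since it controls how many modes survive.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns the k centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distances between every point and every center: shape (N, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

def mean_shift_seeds(X, seeds, bandwidth, n_iter=100, tol=1e-5):
    """Move only the seeds, but weight ALL data points in each update.

    Cost per step is O(N * K) instead of O(N^2).
    """
    points = seeds.copy()
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (K, N)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weights
        new_points = (w @ X) / w.sum(axis=1, keepdims=True)
        shift = np.linalg.norm(new_points - points, axis=1).max()
        points = new_points
        if shift < tol:
            break
    return points

def merge_modes(points, radius):
    """Greedily collapse converged points that lie within `radius` of a kept mode."""
    modes = []
    for p in points:
        if all(np.linalg.norm(p - m) > radius for m in modes):
            modes.append(p)
    return np.array(modes)

# Toy data (assumed for illustration): three well-separated 2-D blobs.
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((200, 2)) for c in true_centers])

bandwidth = 1.5
seeds = kmeans(X, k=30)                       # deliberately far too many clusters
moved = mean_shift_seeds(X, seeds, bandwidth)  # only these 30 points move
modes = merge_modes(moved, radius=bandwidth)

print("estimated number of clusters:", len(modes))
```

If you use scikit-learn, I believe `sklearn.cluster.MeanShift` accepts a `seeds` argument that plays the same role (pass the k-means centers as seeds and fit on the full data), but check the documentation for your version.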