I would like to do some DBSCAN on Spark. I have currently found 2 implementations:
I tested https://github.com/irvingc/dbscan-on-spark and can say that it consumes a lot of memory. For 400K dataset with smooth distribution i used -Xmx12084m and even in this case it works too long (>20 min). In addition, it is only fo 2D. I used project with maven, not sbt.
I tested also second implementation. This is still the best that I found. Unfortunately, the author does not support it since 2015. It really took some time to raise the version of the Spark and resolve the version conflicts. I needed it to deploy on aws.