DBSCAN on Spark: which implementation?

名媛妹妹 2020-12-28 21:09

I would like to run DBSCAN on Spark. So far I have found two implementations:

  • https://github.com/irvingc/dbscan-on-spark
  • https://github.com/alito
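For context, both libraries aim to parallelize the classic DBSCAN algorithm. A minimal single-machine sketch of it follows, assuming Euclidean distance and a brute-force O(n²) neighbour search; the names `dbscan`, `region_query`, `eps` and `min_pts` are illustrative and are not the API of either library.

```python
import math
from typing import List, Sequence

NOISE = -1
UNVISITED = None


def region_query(points: Sequence[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Return indices of all points within eps of points[idx], including idx itself."""
    p = points[idx]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    Brute-force neighbour search, O(n^2): for illustration only.
    """
    labels: list = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

A point with at least `min_pts` neighbours within `eps` (counting itself) is a core point; clusters grow outward from core points, and non-core points reachable from them become border points. The quadratic neighbour search is the cost that distributed implementations try to spread across partitions.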
4 answers
  •  误落风尘
    2020-12-28 22:01

    I tested https://github.com/irvingc/dbscan-on-spark and found that it consumes a lot of memory. For a 400K-point dataset with a smooth distribution I used -Xmx12084m, and even then it ran far too long (more than 20 minutes). In addition, it only works with 2D data. I built the project with Maven, not sbt.

    I also tested the second implementation, and it is still the best one I have found. Unfortunately, the author has not maintained it since 2015, so it took quite some time to upgrade it to a newer Spark version and resolve the version conflicts. I needed it for a deployment on AWS.
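One plausible reason for both the memory use and the 2D-only limitation described above is the spatial partitioning step that distributed DBSCAN designs commonly rely on: the space is cut into cells, and every point close to a cell border is copied into the neighbouring cell(s) so each partition can be clustered independently and the partial clusters merged afterwards. The sketch below assumes a uniform 2D grid; `assign_partitions` and `cell_size` are hypothetical names, and whether either library partitions exactly this way is an assumption, not something taken from their code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def assign_partitions(points: List[Point], cell_size: float, eps: float) -> Dict[Cell, List[int]]:
    """Map each grid cell to the indices of the points it must hold.

    Hypothetical sketch (uniform 2D grid, not any library's partitioner):
    a point is placed in every cell overlapped by the axis-aligned box of
    half-width eps around it, so each cell can run DBSCAN locally without
    missing neighbours that sit just across a cell border.
    """
    partitions: Dict[Cell, List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        # Range of cell coordinates touched by the eps-expanded box.
        cx_lo = math.floor((x - eps) / cell_size)
        cx_hi = math.floor((x + eps) / cell_size)
        cy_lo = math.floor((y - eps) / cell_size)
        cy_hi = math.floor((y + eps) / cell_size)
        for cx in range(cx_lo, cx_hi + 1):
            for cy in range(cy_lo, cy_hi + 1):
                partitions[(cx, cy)].append(idx)  # replicate the point
    return dict(partitions)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.95, 0.5), (5.0, 5.0)]
    parts = assign_partitions(pts, cell_size=1.0, eps=0.1)
    print(parts)
    # Total stored copies: 1 + 2 + 4 for three input points.
    print(sum(len(v) for v in parts.values()))  # 7
```

Every replicated point costs extra memory. In d dimensions, with 2·eps smaller than the cell size, a point near a corner can be copied into up to 2^d cells (the point at (5.0, 5.0) above lands in 4 = 2^2 cells), which is one reason grid-based partitioning gets harder as the dimensionality grows.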
