How to use existing data in ELKI

Submitted by 老子叫甜甜 on 2020-01-25 08:43:14

Question


I kept stumbling upon ELKI over the past couple of days while searching for the most suitable density-based clustering tool, so I decided to try it. For DBSCAN, I've successfully reproduced the test that clusters the file "3clusters-and-noise-2d.csv", and I've also managed to print the cluster metadata and the points in each cluster, all using the ELKI code from GitHub (latest version) in Java (I'm not really interested in the CLI or UI tools).

Now I want to build the database from an in-memory Java data structure instead of importing a file, to avoid the write and read overhead.

In the example provided, I'm able to do this, but only for the first column of the file.

Basically, my question is: how do I create the same database that was created from the file when the same data already exists in Java?

Got it!

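So, after some tweaking: you use a 2D array of doubles in which each row represents a point and there is one column per dimension. To create your database without reading a file, use an ArrayAdapterDatabaseConnection as follows:

    // Assumes the ELKI 0.7.x API and package names
    import de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN;
    import de.lmu.ifi.dbs.elki.data.Clustering;
    import de.lmu.ifi.dbs.elki.data.DoubleVector;
    import de.lmu.ifi.dbs.elki.data.model.Model;
    import de.lmu.ifi.dbs.elki.database.Database;
    import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase;
    import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection;
    import de.lmu.ifi.dbs.elki.datasource.DatabaseConnection;
    import de.lmu.ifi.dbs.elki.utilities.ClassGenericsUtil;
    import de.lmu.ifi.dbs.elki.utilities.optionhandling.parameterization.ListParameterization;

    // One row per point, one column per dimension
    double[][] data = new double[NUM_OF_POINTS][NUM_OF_DIMENSIONS];
    // populate data according to your app
    DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(data);
    Database db = new StaticArrayDatabase(dbc, null); // null: no index factories
    db.initialize();

    // DBSCAN algorithm setup (distance function left at its default, Euclidean)
    ListParameterization params = new ListParameterization();
    params.addParameter(DBSCAN.Parameterizer.EPSILON_ID, 0.04);
    params.addParameter(DBSCAN.Parameterizer.MINPTS_ID, 20);
    DBSCAN<DoubleVector> dbscan = ClassGenericsUtil.parameterizeOrAbort(DBSCAN.class, params);

    // run DBSCAN on the database
    Clustering<Model> result = dbscan.run(db);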

I've tested this with the "3clusters-and-noise-2d.csv" dataset and can confirm that I get the same results whether I pass the data via the file or via ArrayAdapterDatabaseConnection.
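If the points already live in some other Java structure, such as a list of application objects, they first have to be copied into the row-per-point double[][] layout described above. Below is a minimal plain-Java sketch, assuming a hypothetical Point class that stands in for your own type; PointArrays and toMatrix are illustrative names, not part of ELKI. It also checks that every point has the same number of dimensions, since each column is treated as one dimension.

```java
import java.util.List;

public final class PointArrays {
    // Hypothetical point type standing in for whatever your application already uses
    public static final class Point {
        final double[] coords;

        public Point(double... coords) {
            this.coords = coords;
        }
    }

    // Copies points into a row-per-point matrix, the layout passed to ArrayAdapterDatabaseConnection.
    // Throws if the points do not all have the same number of dimensions.
    public static double[][] toMatrix(List<Point> points) {
        if (points.isEmpty()) {
            return new double[0][0];
        }
        int dims = points.get(0).coords.length;
        double[][] data = new double[points.size()][];
        for (int i = 0; i < points.size(); i++) {
            double[] c = points.get(i).coords;
            if (c.length != dims) {
                throw new IllegalArgumentException(
                    "point " + i + " has " + c.length + " dimensions, expected " + dims);
            }
            // Defensive copy so later changes to a Point do not alter the matrix
            data[i] = c.clone();
        }
        return data;
    }
}
```

Keeping the list order stable makes it easy to relate clustering results back to your own objects, since each row index identifies one point.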


Answer 1:


A complete example can be found in the ELKI sources:

http://elki.dbs.ifi.lmu.de/browser/elki/elki/src/main/java/tutorial/javaapi/PassingDataToELKI.java

It generates random data and runs k-means on it. It also shows how to reliably map DBIDs back to your data points.
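Because that database is built from an array, its DBIDs form a contiguous range. Roughly, the tutorial (ELKI 0.7.x) casts the relation's DBIDs to DBIDRange and calls getOffset on each cluster member to recover the original row index. Many applications then want a single label per input row. Below is a minimal plain-Java sketch of that last step. It assumes you have already collected each cluster's member offsets and its Cluster.isNoise() flag. ClusterLabels and toLabels are hypothetical helpers, not ELKI API.

```java
import java.util.Arrays;
import java.util.List;

public final class ClusterLabels {
    // Turns per-cluster member offsets (row indices into the original array) into one label per row.
    // Noise clusters map to -1; other clusters get 0, 1, 2, ... in the order given.
    public static int[] toLabels(int numPoints, List<int[]> clusterOffsets, List<Boolean> isNoise) {
        if (clusterOffsets.size() != isNoise.size()) {
            throw new IllegalArgumentException("one noise flag per cluster is required");
        }
        int[] labels = new int[numPoints];
        Arrays.fill(labels, -1); // rows not covered by any cluster also stay -1
        int next = 0;
        for (int c = 0; c < clusterOffsets.size(); c++) {
            int label = isNoise.get(c) ? -1 : next++;
            for (int offset : clusterOffsets.get(c)) {
                labels[offset] = label;
            }
        }
        return labels;
    }
}
```

For example, five points with clusters {0, 3} (regular), {1, 4} (noise) and {2} (regular) yield the labels [0, -1, 1, 0, -1].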



Source: https://stackoverflow.com/questions/31591883/how-to-use-existing-data-in-elki
