Same Instances header ( arff ) for all my database queries

怎甘沉沦 提交于 2019-12-02 07:34:51
Atilla Ozgur

I tried various approaches to my problem. But it seems that weka internal API does not allow solution to this problem right now. I modified weka.core.Instances append command line code for my purposes. This code is also given in this answer

According to this, here is my solution. I created a SampleWithKnownHeader.arff file , which contains correct header values. I read this file with following code.

public static Instances getSampleInstances() {
    Instances data = null;
    try {
        BufferedReader reader = new BufferedReader(new FileReader(
                "datas\\SampleWithKnownHeader.arff"));
        data = new Instances(reader);
        reader.close();
        // setting class attribute
        data.setClassIndex(data.numAttributes() - 1);
    }
    catch (Exception e) {
        throw new RuntimeException(e);
    } 
    return data;

}

After that , I use following code to create instances. I had to use StringBuilder and string values of instance, then I save corresponding string to file.

public static void main(String[] args) {

    Instances SampleInstance = MyUtilsForWeka.getSampleInstances();

    DataSource source1 = new DataSource(SampleInstance);

    Instances data2 = InstancesFromDatabase
            .getInstanceDataFromDatabase(DatabaseQueries.WEKALIST_QUESTION1);

    MyUtilsForWeka.saveInstancesToFile(data2, "fromDatabase.arff");

    DataSource source2 = new DataSource(data2);

    Instances structure1;
    Instances structure2;
    StringBuilder sb = new StringBuilder();
    try {
        structure1 = source1.getStructure();
        sb.append(structure1);
        structure2 = source2.getStructure();
        while (source2.hasMoreElements(structure2)) {
            String elementAsString = source2.nextElement(structure2)
                    .toString();
            sb.append(elementAsString);
            sb.append("\n");

        }

    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }

    MyUtilsForWeka.saveInstancesToFile(sb.toString(), "combined.arff");

}

My save instances to file code is as below.

public static void saveInstancesToFile(String contents,String filename) {

     FileWriter fstream;
    try {
        fstream = new FileWriter(filename);
      BufferedWriter out = new BufferedWriter(fstream);
      out.write(contents);
      out.close();
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }

This solves my problem but I wonder if more elegant solution exists.

I solved a similar problem with the Add filter that allows adding attributes to Instances. You need to add a correct Attibute with proper list of values to both datasets (in my case - to test dataset only):

Load train and test data:

/* "train" contains labels and data */
/* "test" contains data only */
CSVLoader csvLoader = new CSVLoader();
csvLoader.setFile(new File(trainFile));
Instances training = csvLoader.getDataSet();
csvLoader.reset();
csvLoader.setFile(new File(predictFile));
Instances test = csvLoader.getDataSet();

Set a new attribute with Add filter:

Add add = new Add();
/* the name of the attribute must be the same as in "train"*/
add.setAttributeName(training.attribute(0).name());
/* getValues returns a String with comma-separated values of the attribute */
add.setNominalLabels(getValues(training.attribute(0)));
/* put the new attribute to the 1st position, the same as in "train"*/
add.setAttributeIndex("1");
add.setInputFormat(test);
/* result - a compatible with "train" dataset */
test = Filter.useFilter(test, add);

As a result, the headers of both "train" and "test" are the same (compatible for Weka machine learning)

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!