arff | 易学教程

Counting bi-gram frequencies

I've written a piece of code that counts word frequencies and inserts them into an ARFF file for use with Weka. I'd like to alter it so that it counts bi-gram frequencies, i.e. pairs of words instead of single words, but my attempts so far have been unsuccessful. I realise there's a lot to look at, but any help on this is greatly appreciated. Here's my code:

    import re
    import nltk

    # Quran subset
    filename = raw_input('Enter name of file to convert to ARFF with extension, eg. name.txt: ')

    # create list of lower case words
    word_list = re.split('\s+', file(filename).read().lower
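The excerpt above is cut off, so the following is only a minimal sketch of one way to count bi-grams, not the asker's code. It assumes Python 3 and the same whitespace splitting as the excerpt; the function name `count_bigrams` is hypothetical.

```python
import re
from collections import Counter

def count_bigrams(text):
    """Count adjacent word pairs in lower-cased, whitespace-split text."""
    # Drop empty strings that re.split leaves at the start or end of the text.
    words = [w for w in re.split(r'\s+', text.lower()) if w]
    # zip(words, words[1:]) pairs every word with the word that follows it.
    return Counter(zip(words, words[1:]))

counts = count_bigrams("the cat sat on the mat the cat ran")
```

Each key of `counts` is a `(first_word, second_word)` tuple, so a bi-gram can be written to an ARFF attribute by joining the two words with a separator of your choice.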

Too many attributes for ARFF format in Weka

I am working with a data set with more than 10,000 dimensions. To use Weka I need to convert a text file into ARFF format, but because there are so many attributes the file is too large even in the sparse ARFF format. Is there a similar way to avoid writing out every attribute identifier in the header of the ARFF file? For example:

    @attribute A1 NUMERICAL
    @attribute A2 NUMERICAL
    ...
    @attribute A10000 NUMERICAL

I coded a script in AWK to format the following lines (in a TXT file) to ARFF. example.txt source:

    Att_0 | Att_1 | Att_2 | ... | Att_n
    1 | 2 | 3 | ... | 999

My script
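As a rough illustration of the conversion described above, here is a sketch in Python rather than the AWK script, which is not shown in the excerpt. It generates the header in a loop instead of writing it by hand and emits sparse data rows that omit zero values. The function name `to_sparse_arff`, the default relation name, and the use of the `numeric` type keyword are assumptions.

```python
def to_sparse_arff(lines, relation="data"):
    """Convert pipe-delimited text (header line first) to sparse ARFF text."""
    names = [name.strip() for name in lines[0].split("|")]
    out = ["@relation " + relation, ""]
    # One declaration per column, generated rather than typed out.
    out += ["@attribute {} numeric".format(name) for name in names]
    out += ["", "@data"]
    for line in lines[1:]:
        values = [float(v) for v in line.split("|")]
        # Sparse rows list only "index value" pairs for non-zero entries.
        pairs = ["{} {:g}".format(i, v) for i, v in enumerate(values) if v != 0]
        out.append("{" + ", ".join(pairs) + "}")
    return "\n".join(out)

arff = to_sparse_arff(["Att_0 | Att_1 | Att_2", "1 | 0 | 3"])
```

The header still contains one line per attribute; the sketch only automates writing it and keeps the data section small.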

Converting sparse matrix to ARFF using awk

I am working with an extremely large data set in a sparse matrix format. The file has three tab-separated columns: the string in the first column corresponds to a row, the string in the second column corresponds to the attribute, and the value in the third column is a weighted score.

    church  place        3
    church  institution  6
    man     place        86
    man     food         63
    woman   book         37

I would like to convert this to ARFF format using awk (if possible) so that, using the above as input, I can obtain the following output:

    @relation 'filename'
    @attribute "place" string
    @attribute "institution" string
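The target output is cut off in the excerpt, so the sketch below covers only the grouping step that any such conversion needs: assigning each attribute a column index in first-seen order and collecting the scores per row. It is written in Python rather than awk, and the function name `triples_to_sparse_rows` is hypothetical.

```python
from collections import OrderedDict

def triples_to_sparse_rows(lines):
    """Group tab-separated (row, attribute, score) triples by row."""
    attr_index = OrderedDict()  # attribute name -> column index, first-seen order
    rows = OrderedDict()        # row name -> {column index: score}
    for line in lines:
        row, attr, score = line.rstrip("\n").split("\t")
        # Reuse the existing index or assign the next free one.
        col = attr_index.setdefault(attr, len(attr_index))
        rows.setdefault(row, {})[col] = float(score)
    return list(attr_index), rows

sample = [
    "church\tplace\t3",
    "church\tinstitution\t6",
    "man\tplace\t86",
    "man\tfood\t63",
    "woman\tbook\t37",
]
attributes, rows = triples_to_sparse_rows(sample)
```

From `attributes` and `rows`, the header lines and one data line per row can be printed in whatever ARFF layout is required.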

Weka printing sparse arff file

I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it is not printing "A".

    attVals = new FastVector();
    attVals.addElement("A");
    attVals.addElement("B");
    atts.addElement(new Attribute("class", attVals));
    vals[index] = attVals.indexOf("A");

The output of the program is:

    {0 6,2 8}

but I should get {0 6,2 8,3 A}. When I instead do

    vals[index] = attVals.indexOf("B");

I get the proper output:

    {0 6,2 8,3 B}

For some reason it is not taking the index 0. Can someone tell me why this is happening? This
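The behaviour follows from how the sparse format works: values equal to 0 are omitted, and a nominal value is stored internally as the index of its label, so the first label ("A", index 0) is dropped like any other zero. The plain-Python sketch below imitates that rule to reproduce the two outputs in the question. It is an illustration only, not Weka's code, and `sparse_row` is a hypothetical helper.

```python
def sparse_row(values, nominal_labels=None):
    """Format a row the way a sparse ARFF writer would: skip zeros."""
    nominal_labels = nominal_labels or {}
    parts = []
    for i, v in enumerate(values):
        if v == 0:
            # Zero is the implicit default, so it is never written out.
            continue
        if i in nominal_labels:
            # Nominal values are stored as the index of their label.
            parts.append("{} {}".format(i, nominal_labels[i][int(v)]))
        else:
            parts.append("{} {:g}".format(i, v))
    return "{" + ",".join(parts) + "}"

labels = {3: ["A", "B"]}
row_a = sparse_row([6, 0, 8, labels[3].index("A")], labels)
row_b = sparse_row([6, 0, 8, labels[3].index("B")], labels)
```

Here `row_a` comes out without the class entry and `row_b` includes `3 B`. A reader of the sparse file treats the omitted entry as 0, which is the first label again, so no information is lost.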

Multi-Band Image raster to RGB

I have an image dataset which is a multiband dataset in ARFF format. It looks like this:

    8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000
    8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000
    8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000
    8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000
    7.4000000e+001 8.4000000e+001 8.6000000e+001 7.1000000e+001 1.0000000e+000
    7.4000000e+001 8.4000000e+001 8
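The question is truncated, so what follows is only a guess at the kind of conversion involved: treat each line as one pixel, pick three of the band columns, and stretch each band to the 0-255 range. The choice of the first three columns as red, green and blue, the min-max scaling, and the function name `rows_to_rgb` are all assumptions.

```python
def rows_to_rgb(lines, bands=(0, 1, 2)):
    """Turn whitespace-separated band values into 8-bit (R, G, B) tuples."""
    pixels = [[float(tok) for tok in line.split()] for line in lines if line.strip()]
    scaled = []
    for b in bands:
        column = [p[b] for p in pixels]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a constant band
        # Min-max stretch of this band to the 0..255 range.
        scaled.append([int(round(255 * (v - lo) / span)) for v in column])
    # Regroup the three scaled bands into one tuple per pixel.
    return list(zip(*scaled))

sample = [
    "8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000",
    "7.4000000e+001 8.4000000e+001 8.6000000e+001 7.1000000e+001 1.0000000e+000",
]
rgb = rows_to_rgb(sample)
```

The result is a flat list of pixels; arranging it into an image additionally requires the width and height, which the excerpt does not give.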