arff

Counting bi-gram frequencies

匆匆过客 submitted on 2019-12-03 03:56:33
I've written a piece of code that counts word frequencies and inserts them into an ARFF file for use with Weka. I'd like to alter it so that it counts bi-gram frequencies, i.e. pairs of words instead of single words, but my attempts so far have been unsuccessful. I realise there's a lot to look at, but any help on this is greatly appreciated. Here's my code: import re import nltk # Quran subset filename = raw_input('Enter name of file to convert to ARFF with extension, eg. name.txt: ') # create list of lower case words word_list = re.split('\s+', file(filename).read().lower
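
The pairing step the question asks about can be sketched as follows. This is a minimal illustration, not the asker's code: it assumes Python 3 and the standard library only (the original snippet is Python 2 and imports nltk), and the function name `bigram_counts` and the sample sentence are made up for the example.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs in lower-cased, whitespace-split text."""
    # Same tokenisation idea as the question: lower-case, split on whitespace.
    words = [w for w in re.split(r"\s+", text.lower()) if w]
    # zip(words, words[1:]) pairs each word with the word that follows it.
    return Counter(zip(words, words[1:]))


# Hypothetical sample text, used only to show the call.
counts = bigram_counts("the cat sat on the mat the cat ran")
for (first, second), n in counts.most_common(3):
    print(first, second, n)
```

Each key of the resulting `Counter` is a `(word, next_word)` tuple, so it can be written out as one ARFF attribute per pair in the same way single words were.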

Too many attributes for ARFF format in Weka

吃可爱长大的小学妹 submitted on 2019-12-02 01:28:28
I am working with a data set that has more than 10,000 dimensions. To use Weka I need to convert a text file into ARFF format, but because there are so many attributes, the file is too large even with the sparse ARFF format. Is there a similar method, as there is for the data, to avoid writing so many attribute identifiers in the header of the ARFF file? For example: @attribute A1 NUMERICAL @attribute A2 NUMERICAL ... ... @attribute A10000 NUMERICAL I wrote an AWK script to convert the following lines (in a TXT file) to ARFF. example.txt source: Att_0 | Att_1 | Att_2 | ... | Att_n 1 | 2 | 3 | ... | 999 My script
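
As a rough sketch of the conversion described above, the header can be generated from the first row rather than typed by hand, and each data row written in sparse form. This is an illustration in Python, not the AWK script from the post (which is cut off); the function name `to_arff` and the relation name are hypothetical. Note that ARFF spells the type `numeric`, so that spelling is used here instead of `NUMERICAL`.

```python
def to_arff(lines, relation="example"):
    """Convert pipe-separated rows (first row = attribute names) to ARFF text."""
    rows = [[cell.strip() for cell in line.split("|")]
            for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    out = ["@relation " + relation, ""]
    # One declaration per column; the format still requires every attribute.
    out += ["@attribute {} numeric".format(name) for name in header]
    out += ["", "@data"]
    for row in data:
        # Sparse row: keep only non-zero cells as "index value" pairs.
        pairs = ["{} {}".format(i, v) for i, v in enumerate(row) if float(v) != 0]
        out.append("{" + ", ".join(pairs) + "}")
    return "\n".join(out)


# Hypothetical three-column input in the layout shown in the question.
arff_text = to_arff(["Att_0 | Att_1 | Att_2", "1 | 0 | 3"])
print(arff_text)
```

The sparse form only shortens the data section; the header grows with the number of attributes either way, which is why generating it is the practical option.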

Converting sparse matrix to ARFF using awk

人盡茶涼 submitted on 2019-12-01 14:40:27
I am working with an extremely large data set in a sparse matrix format. The data has the following format (3 tab-separated columns, where the string in the first column corresponds to a row, the string in the second column corresponds to the attribute, and the value in the third column is a weighted score). church place 3 church institution 6 man place 86 man food 63 woman book 37 I would like to convert this to ARFF format using awk (if possible) so that, using the above as input, I can obtain the following output: @relation 'filename' @attribute "place" string @attribute "institution" string
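
The grouping logic behind such a conversion can be sketched in Python (the question asks for awk; this only illustrates the steps). Several things here are assumptions: the desired output in the post is cut off, so the exact layout is unknown; the attributes are declared `numeric` because the third column holds scores, whereas the post declares them as `string`; the extra first attribute `row` and the function name `triples_to_arff` are hypothetical; and row names are assumed to contain no spaces or commas, which would need quoting.

```python
def triples_to_arff(lines, relation="filename"):
    """Turn tab-separated (row, attribute, value) triples into sparse ARFF text."""
    attrs = {}  # attribute name -> column index, in first-seen order
    rows = {}   # row name -> {column index: value}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        row, attr, value = line.split("\t")
        # Column 0 is reserved for the row name, so attributes start at 1.
        col = attrs.setdefault(attr, len(attrs) + 1)
        rows.setdefault(row, {})[col] = value
    out = ["@relation '{}'".format(relation), "@attribute row string"]
    out += ['@attribute "{}" numeric'.format(name) for name in attrs]
    out.append("@data")
    for name, cells in rows.items():
        pairs = ["0 {}".format(name)]
        pairs += ["{} {}".format(c, cells[c]) for c in sorted(cells)]
        out.append("{" + ", ".join(pairs) + "}")
    return "\n".join(out)


# The five sample triples from the question.
sample = [
    "church\tplace\t3",
    "church\tinstitution\t6",
    "man\tplace\t86",
    "man\tfood\t63",
    "woman\tbook\t37",
]
result = triples_to_arff(sample)
print(result)
```

The whole input is held in memory here, which may not suit an extremely large file; a two-pass approach (one pass to collect attribute names, one to emit rows from input sorted by row name) follows the same idea.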

Converting sparse matrix to ARFF using awk

半世苍凉 submitted on 2019-12-01 12:22:01
Question: I am working with an extremely large data set in a sparse matrix format. The data has the following format (3 tab-separated columns, where the string in the first column corresponds to a row, the string in the second column corresponds to the attribute, and the value in the third column is a weighted score). church place 3 church institution 6 man place 86 man food 63 woman book 37 I would like to convert this to ARFF format using awk (if possible) so that, using the above as input, I can

Weka printing sparse arff file

烈酒焚心 submitted on 2019-11-30 23:14:02
I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it is not printing "A". attVals = new FastVector(); attVals.addElement("A"); attVals.addElement("B"); atts.addElement(new Attribute("class", attVals)); vals[index] = attVals.indexOf("A"); The output of the program is {0 6,2 8}, but I should get {0 6,2 8,3 A}. When I use vals[index] = attVals.indexOf("B"); I get the proper output: {0 6,2 8,3 B}. For some reason it is not taking index 0. Can someone tell me why this is happening? This
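
The two outputs quoted above are consistent with how the sparse ARFF format treats zeros: a nominal value is stored internally as the index of its label, the first label ("A") has index 0, and values equal to 0 are left out of a sparse row. The following Python sketch only models that rule to reproduce the two outputs; it is not Weka's code, and the function name `sparse_row` and its arguments are hypothetical.

```python
def sparse_row(values, nominal_labels):
    """Format internal numeric values as a sparse ARFF row.

    values: one internal number per attribute.
    nominal_labels: {attribute index: list of labels} for nominal attributes.
    """
    parts = []
    for i, v in enumerate(values):
        if v == 0:
            continue  # zeros are omitted, including a nominal label at index 0
        if i in nominal_labels:
            parts.append("{} {}".format(i, nominal_labels[i][int(v)]))
        else:
            parts.append("{} {:g}".format(i, v))
    return "{" + ",".join(parts) + "}"


labels = ["A", "B"]
row_a = sparse_row([6, 0, 8, labels.index("A")], {3: labels})
row_b = sparse_row([6, 0, 8, labels.index("B")], {3: labels})
print(row_a)
print(row_b)
```

Under this model an omitted entry is read back as value 0, which for the class attribute means the first label, so the row without `3 A` still denotes class "A".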

Weka printing sparse arff file

霸气de小男生 submitted on 2019-11-30 18:17:25
Question: I was trying out the sparse representation of the ARFF file as shown here. In my program I am able to print the class label "B", but for some reason it is not printing "A". attVals = new FastVector(); attVals.addElement("A"); attVals.addElement("B"); atts.addElement(new Attribute("class", attVals)); vals[index] = attVals.indexOf("A"); The output of the program is {0 6,2 8}, but I should get {0 6,2 8,3 A}. When I use vals[index] = attVals.indexOf("B"); I get the proper output: {0 6,2

Multi-Band Image raster to RGB

青春壹個敷衍的年華 submitted on 2019-11-26 11:39:35
Question: I have an image dataset which is a multiband dataset in ARFF format. It looks like this: 8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000 8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000 8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000 8.3000000e+001 9.3000000e+001 9.6000000e+001 7.5000000e+001 1.0000000e+000 7.4000000e+001 8.4000000e+001 8.6000000e+001 7.1000000e+001 1.0000000e+000 7.4000000e+001 8.4000000e+001 8