Question
I want to change this piece of code where indicated to be more dynamic and specific. I would like to use the first row information in each column as a header that substitutes 'numAtts'. That way, the first row would also not be included in the data underneath the @data.
Here is my code:
# -*- coding: UTF-8 -*-
import logging
from optparse import OptionParser
import sys

def main():
    LEVELS = {'debug': logging.DEBUG,
              'info': logging.INFO,
              'warning': logging.WARNING,
              'error': logging.ERROR,
              'critical': logging.CRITICAL}
    usage = "usage: arff automate [options]\n ."
    parser = OptionParser(usage=usage, version="%prog 1.0")
    #Defining options
    parser.add_option("-l", "--log", dest="level_name", default="info", help="choose the logging level: debug, info, warning, error, critical")
    #Parsing arguments
    (options, args) = parser.parse_args()
    #Mandatory arguments
    if len(args) != 1:
        parser.error("incorrect number of arguments")
    inputPath = args[0]
    # Start program ------------------
    with open(inputPath, "r") as f:
        strip = str.strip
        split = str.split
        data = [split(strip (line)) for line in f]
    ###############################################################
    ## modify here##
    numAtts = len(data[0])
    logging.info(" Number of attributes : "+str(numAtts) )
    print "@RELATION relationData"
    print ""
    for e in range(numAtts):
        print "@ATTRIBUTE att{0} NUMERIC".format(e)
    ###############################################################
    classSet = set()
    for e in data:
        classSet.add(e[-1])
    print "@ATTRIBUTE class {%s}" % (",".join(classSet))
    print ""
    print "@DATA"
    for item in data:
        print ",".join(item[0:])

if __name__ == "__main__":
    main()
The input file is like this (tab-separated):
F1 F2 F3 F4 F5 F6 STRING
7209 3004 15302 5203 2 1 EXAMPLEA
6417 3984 16445 5546 15 1 EXAMPLEB
8822 3973 23712 7517 18 0 EXPAMPLEC
The output file (actual) is like this:
@RELATION relationData
@ATTRIBUTE att0 NUMERIC
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
@ATTRIBUTE att4 NUMERIC
@ATTRIBUTE att5 NUMERIC
@ATTRIBUTE att6 NUMERIC
@ATTRIBUTE class {EXAMPLEB,STRING,EXPAMPLEC,EXAMPLEA}
@DATA
F1,F2,F3,F4,F5,F6,STRING
7209,3004,15302,5203,2,1,EXAMPLEA
6417,3984,16445,5546,15,1,EXAMPLEB
8822,3973,23712,7517,18,0,EXPAMPLEC
The desired output file is like this:
@RELATION relationData
@attribute 'att[F1]' numeric
@attribute 'att[F2]' numeric
@attribute 'att[F3]' numeric
@attribute 'att[F4]' numeric
@attribute 'att[F5]' numeric
@attribute 'att[F6]' {0,1}
@attribute 'class' STRING
@data
7209,3004,15302,5203,2,1,EXAMPLEA
6417,3984,16445,5546,15,1,EXAMPLEB
8822,3973,23712,7517,18,0,EXPAMPLEC
So, as you see my code is almost there, but I am unable / unsure how to mark the first row as a variable that is used for the header and start processing the data with row 2.
Thus, my question is: How can I format the output to use the 1st row as a header? Does anyone have any insight? Thanks!
Answer 1:
Your loop never writes the header names to the output. Here

    for e in range(numAtts):
        print "@ATTRIBUTE att{0} NUMERIC".format(e)

you are only formatting the loop index e into the output. You need to read the name from data[0] instead:

    for e in range(numAtts):
        print "@ATTRIBUTE 'att[{0}]' NUMERIC".format(data[0][e])

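As a minimal sketch of the same idea, the header names can be turned into attribute lines in one step. The helper name attribute_lines is hypothetical, and the sketch assumes that the last header cell labels the class column and that every other column is numeric. It uses print() so that it also runs on Python 3.

```python
def attribute_lines(header):
    """Build one attribute line per feature column, named after the header row."""
    # Assumption: the last header cell labels the class column, so it is skipped.
    # Assumption: every remaining column holds numeric values.
    return ["@ATTRIBUTE 'att[{0}]' NUMERIC".format(name) for name in header[:-1]]


# Small sample shaped like the question's input, already split into cells.
data = [
    ["F1", "F2", "F3", "STRING"],
    ["7209", "3004", "15302", "EXAMPLEA"],
]

for line in attribute_lines(data[0]):
    print(line)
```

Slicing with header[:-1] keeps the class column out of the numeric attributes, so it can still be declared separately as in the original script.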
Later, when printing the data, you can start range/xrange at 1 to skip the row at index 0. The upper bound is the number of rows, len(data), not the number of attributes:

    for e in range(1, len(data)):
        print ",".join(data[e])

Also, there is no need to store the str methods in variables; you can chain the method calls to get the same result.
Instead of this:

    data = [split(strip (line)) for line in f]

use this:

    data = [line.strip().split() for line in f]

*********** Edited to include this option ***********
next also lets you skip the first row, so that the data section begins with the second row. Call it on an iterator over data, then loop over that same iterator:

    rows = iter(data)
    header = next(rows)
    for item in rows:
        print ",".join(item)

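The next-based approach only skips a row if next is called on the same iterator that the loop then consumes. The sketch below shows that pattern on a small in-memory list; split_header is a hypothetical helper name, and the sample rows are shortened from the question's input.

```python
def split_header(rows):
    """Return (header, remaining_rows) for a sequence of already split rows."""
    row_iter = iter(rows)
    header = next(row_iter)          # consumes the first row only
    return header, list(row_iter)    # the iterator resumes at the second row


data = [
    ["F1", "F2", "STRING"],
    ["7209", "3004", "EXAMPLEA"],
    ["6417", "3984", "EXAMPLEB"],
]

header, body = split_header(data)
for item in body:
    print(",".join(item))
```

On an empty input next raises StopIteration, so a caller that may receive an empty file would need to handle that case.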
Answer 2:
You can take advantage of the fact that the file object returned by open in Python is an iterator over its lines. f.readline() returns the next available line and advances the file position, so the list comprehension that follows skips the line you have already read with f.readline(). (See the documentation here: https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects)

    with open(inputPath, "r") as f:
        strip = str.strip
        split = str.split
        titles = split(strip(f.readline()))
        data = [split(strip(line)) for line in f]

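Putting the readline idea together with the rest of the script, a rough end-to-end sketch could look like the following. The function name to_arff is hypothetical, io.StringIO stands in for the opened input file, and the sketch assumes that all columns except the last are numeric and that the last column is the class. The class values are sorted only to make the output order predictable.

```python
import io


def to_arff(stream, relation="relationData"):
    """Convert whitespace-separated text with a header row into ARFF lines."""
    titles = stream.readline().split()                         # header row, read once
    data = [line.split() for line in stream if line.strip()]   # remaining rows

    lines = ["@RELATION " + relation, ""]
    for name in titles[:-1]:
        # Assumption: every non-class column is numeric.
        lines.append("@ATTRIBUTE {0} NUMERIC".format(name))
    classes = sorted(set(row[-1] for row in data))
    lines.append("@ATTRIBUTE class {%s}" % ",".join(classes))
    lines.extend(["", "@DATA"])
    lines.extend(",".join(row) for row in data)
    return lines


# In-memory stand-in for the tab-separated input file.
sample = io.StringIO(
    u"F1\tF2\tSTRING\n"
    u"7209\t3004\tEXAMPLEA\n"
    u"6417\t3984\tEXAMPLEB\n"
)
arff = to_arff(sample)
print("\n".join(arff))
```

Because the header is consumed before the list comprehension runs, it appears neither in the class set nor under @DATA.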
Source: https://stackoverflow.com/questions/31142884/using-first-row-as-variable-with-python