Using first row as variable with python

Question

I want to change the piece of code marked "modify here" so that it is more dynamic and specific. I would like to use the value in the first row of each column as that attribute's name, instead of the generic names generated from 'numAtts'. That way, the first row would also no longer be included in the data under @DATA.

Here is my code:

# -*- coding: UTF-8 -*-

import logging
from optparse import OptionParser
import sys

def main():
    LEVELS = {'debug': logging.DEBUG,
              'info': logging.INFO,
              'warning': logging.WARNING,
              'error': logging.ERROR,
              'critical': logging.CRITICAL}

    usage = "usage: arff automate [options]\n ."
    parser = OptionParser(usage=usage, version="%prog 1.0")

    #Defining options   
    parser.add_option("-l", "--log", dest="level_name", default="info", help="choose the logging level: debug, info, warning, error, critical")    

    #Parsing arguments
    (options, args) = parser.parse_args()

    #Mandatory arguments    
    if len(args) != 1:
        parser.error("incorrect number of arguments")

    inputPath = args[0]


    # Start program ------------------

    with open(inputPath, "r") as f:
        strip = str.strip
        split = str.split
        data = [split(strip (line)) for line in f]

###############################################################
## modify here##

    numAtts = len(data[0])
    logging.info(" Number of attributes : "+str(numAtts) )

    print "@RELATION relationData"
    print ""

    for e in range(numAtts):
        print "@ATTRIBUTE att{0} NUMERIC".format(e)

###############################################################

    classSet = set()
    for e in data:
        classSet.add(e[-1])

    print "@ATTRIBUTE class {%s}" % (",".join(classSet))
    print ""

    print "@DATA"

    for item in data:
        print ",".join(item[0:])


if __name__ == "__main__":
    main()

The input file looks like this (tab-separated):

F1  F2  F3  F4  F5  F6  STRING
7209    3004    15302   5203    2   1   EXAMPLEA
6417    3984    16445   5546    15  1   EXAMPLEB
8822    3973    23712   7517    18  0   EXPAMPLEC

The actual output looks like this:

@RELATION relationData

@ATTRIBUTE att0 NUMERIC
@ATTRIBUTE att1 NUMERIC
@ATTRIBUTE att2 NUMERIC
@ATTRIBUTE att3 NUMERIC
@ATTRIBUTE att4 NUMERIC
@ATTRIBUTE att5 NUMERIC
@ATTRIBUTE att6 NUMERIC
@ATTRIBUTE class {EXAMPLEB,STRING,EXPAMPLEC,EXAMPLEA}

@DATA
F1,F2,F3,F4,F5,F6,STRING
7209,3004,15302,5203,2,1,EXAMPLEA
6417,3984,16445,5546,15,1,EXAMPLEB
8822,3973,23712,7517,18,0,EXPAMPLEC

The desired output looks like this:

@RELATION relationData
@attribute 'att[F1]' numeric
@attribute 'att[F2]' numeric
@attribute 'att[F3]' numeric
@attribute 'att[F4]' numeric
@attribute 'att[F5]' numeric
@attribute 'att[F6]' {0,1}
@attribute 'class' STRING

@data
7209,3004,15302,5203,2,1,EXAMPLEA
6417,3984,16445,5546,15,1,EXAMPLEB
8822,3973,23712,7517,18,0,EXPAMPLEC

As you can see, my code is almost there, but I am unsure how to store the first row in a variable that is used for the header, and then start processing the data from row 2.

So my question is: how can I format the output to use the first row as a header? Any insight is appreciated. Thanks!

Answer 1:

You are not actually writing the header names you want to the output. Here:

for e in range(numAtts):
    print "@ATTRIBUTE att{0} NUMERIC".format(e)

you are only formatting the index e into the output. You need to read the column name from data[0] instead:

for e in range(numAtts):
    # data[0] is the header row; data[0][e] is the name of column e
    print "@ATTRIBUTE 'att[{0}]' NUMERIC".format(data[0][e])
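A small Python 3 sketch of the same idea, written as a function so it can be checked. It assumes, as in the desired output, that every column except the last is a numeric attribute and the last column is the class, so unlike the loop above it leaves the last header name out. The function name `attribute_lines` is hypothetical.

```python
def attribute_lines(header):
    """Build ARFF @ATTRIBUTE lines from the header row.

    Assumption: every column except the last is numeric and the last
    column is the class, so the last name is not emitted here.
    """
    lines = []
    for name in header[:-1]:
        # Use the column name from the first row instead of the index.
        lines.append("@ATTRIBUTE 'att[{0}]' NUMERIC".format(name))
    return lines


header = ["F1", "F2", "F3", "F4", "F5", "F6", "STRING"]
for line in attribute_lines(header):
    print(line)
```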

Later, when printing the data rows, you can use range/xrange to skip index 0:

for i in range(1, len(data)):
    # len(data) is the number of rows; numAtts is the number of columns
    print ",".join(data[i])
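For comparison, a Python 3 sketch of two equivalent ways to emit only the data rows: indexing with `range(1, len(data))`, or slicing with `data[1:]`. The `data` list is just part of the question's input after splitting. Bounding the range by `numAtts` would only work by coincidence, because `numAtts` counts columns, not rows.

```python
data = [
    ["F1", "F2", "F3", "F4", "F5", "F6", "STRING"],
    ["7209", "3004", "15302", "5203", "2", "1", "EXAMPLEA"],
    ["6417", "3984", "16445", "5546", "15", "1", "EXAMPLEB"],
]

# Index-based: start at row 1 and stop at the number of rows, len(data).
by_index = [",".join(data[i]) for i in range(1, len(data))]

# Slice-based: data[1:] is a new list without the header row.
by_slice = [",".join(row) for row in data[1:]]

for line in by_slice:
    print(line)
```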

Also, there is no need to store the str methods in variables; you can chain the method calls directly. Instead of this:

data = [split(strip (line)) for line in f]

use this:

data = [line.strip().split() for line in f]
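A quick Python 3 check of what the chained call does on one tab-separated line from the question. With no argument, `str.split()` splits on any run of whitespace, so tabs and repeated spaces both work; if a field could itself contain spaces, `split('\t')` would be the safer choice. The second example line is made up for illustration.

```python
line = "7209\t3004\t15302\t5203\t2\t1\tEXAMPLEA\n"

# strip() removes the trailing newline; split() with no argument
# splits on runs of any whitespace, including tabs.
fields = line.strip().split()
print(fields)

# Splitting on the tab character only keeps a field containing a space intact.
tab_fields = "F1\tsome label\n".strip().split("\t")
print(tab_fields)
```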

*********** Edited to include this option ***********

next() can also be used to skip the first row, so that the data section starts with the second row.

rows = iter(data)
next(rows)  # skip the header row; the loop below continues from row 2
for item in rows:
    print ",".join(item)
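`next()` only skips a row for the iterator it is called on, so the loop has to use that same iterator afterwards; calling `next(iter(data))` and then looping over `data` would not skip anything. Below is a Python 3 sketch of the whole conversion built that way. It keeps the question's assumptions (every column but the last is numeric, the last column is a nominal class) and reads from any iterable of lines; `to_arff` and its `relation` parameter are hypothetical names.

```python
def to_arff(lines, relation="relationData"):
    """Convert whitespace-separated lines (header row first) to ARFF text.

    Assumption: every column except the last is numeric and the last
    column holds the class labels.
    """
    rows = (line.strip().split() for line in lines)  # a generator is an iterator
    header = next(rows)                  # consume the first row: column names
    data = [row for row in rows if row]  # remaining rows; skip blank lines

    out = ["@RELATION " + relation, ""]
    for name in header[:-1]:
        out.append("@ATTRIBUTE 'att[{0}]' NUMERIC".format(name))

    # Sorted so the class list is stable (a plain set has no fixed order).
    classes = sorted(set(row[-1] for row in data))
    out.append("@ATTRIBUTE class {%s}" % ",".join(classes))
    out.append("")
    out.append("@DATA")
    out.extend(",".join(row) for row in data)
    return "\n".join(out)


sample = [
    "F1\tF2\tF3\tF4\tF5\tF6\tSTRING\n",
    "7209\t3004\t15302\t5203\t2\t1\tEXAMPLEA\n",
    "6417\t3984\t16445\t5546\t15\t1\tEXAMPLEB\n",
]
print(to_arff(sample))
```

This sketch does not reproduce the `{0,1}` type for F6 or the `STRING` class type from the desired output; deciding which columns are nominal would need extra rules that the question does not specify.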

Answer 2:

You can take advantage of the fact that the file object returned by open() in Python is an iterator over the file's lines. f.readline() returns the next available line in the file and also advances the file position, so the list comprehension that follows skips the line you have already read with f.readline(). (See the documentation here: https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects)

with open(inputPath, "r") as f:
    strip = str.strip
    split = str.split
    titles = split(strip(f.readline()))        # read and consume the header row
    data = [split(strip(line)) for line in f]  # iteration resumes at line 2
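A runnable Python 3 version of this approach, using `io.StringIO` as a stand-in for the opened file so it can be tried without an input file (the three-column sample text is a shortened version of the question's input). Calling `readline()` before iterating consumes the header, so the comprehension sees only the data rows.

```python
import io

# io.StringIO stands in for open(inputPath) in this demo.
f = io.StringIO(
    "F1\tF2\tSTRING\n"
    "7209\t3004\tEXAMPLEA\n"
    "6417\t3984\tEXAMPLEB\n"
)

with f:
    titles = f.readline().strip().split()        # first line only
    data = [line.strip().split() for line in f]  # iteration resumes at line 2

print(titles)
print(data)
```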

Source: https://stackoverflow.com/questions/31142884/using-first-row-as-variable-with-python

Tags: python, variables, header, row