Writing/parsing a fixed width file using Python

Question

I'm a newbie to Python and I'm looking at using it to write some hairy EDI stuff that our supplier requires.

Basically they need an 80-character fixed width text file, with certain "chunks" of each record filled with data and others left blank. I have the documentation so I know what the length of each "chunk" is. The response that I get back is easier to parse since it will already have data and I can use Python's "slices" to extract what I need, but I can't assign to a slice - I tried that already because it sounded like a good solution, and it didn't work since Python strings are immutable :)

Like I said I'm really a newbie to Python but I'm excited about learning it :) How would I go about doing this? Ideally I'd want to be able to say that range 10-20 is equal to "Foo" and have it be the string "Foo" with 7 additional whitespace characters (assuming said field has a length of 10) and have that be a part of the larger 80-character field, but I'm not sure how to do what I'm thinking.

Answer 1:

You don't need to assign to slices, just build the string using % formatting.

An example with a fixed format for 3 data items:

>>> fmt="%4s%10s%10s"
>>> fmt % (1,"ONE",2)
'   1       ONE         2'
>>>

Same thing, field width supplied with the data:

>>> fmt2 = "%*s%*s%*s"
>>> fmt2 % (4,1, 10,"ONE", 10,2)
'   1       ONE         2'
>>>

Separating data and field widths, and using zip() and str.join() tricks:

>>> widths=(4,10,10)
>>> items=(1,"ONE",2)
>>> "".join("%*s" % i for i in zip(widths, items))
'   1       ONE         2'
>>>
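The examples above right-justify every value. For a layout like the one in the question, where "Foo" should be followed by seven spaces, a minus flag left-justifies and a precision truncates values that are too long. A minimal sketch, assuming a made-up three-field layout (the real widths would come from the supplier's documentation) padded out to 80 characters:

```python
# Hypothetical layout: (width, value) pairs, in output order.
fields = [(10, "Foo"), (5, "AB"), (8, "TooLongValue")]

# "%-*.*s" consumes width, precision, value:
# "-" left-justifies to the width, the precision truncates overlong values.
body = "".join("%-*.*s" % (w, w, v) for w, v in fields)

# Pad the rest of the record with blanks so the line is exactly 80 characters.
record = body.ljust(80)
```

Whether silently truncating is acceptable depends on the EDI spec; raising an error instead may be the safer choice.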

Answer 2:

Hopefully I understand what you're looking for: some way to conveniently identify each part of the line by a simple variable, but output it padded to the correct width?

The snippet below may give you what you want

class FixWidthFieldLine(object):

    fields = (('foo', 10),
              ('bar', 30),
              ('ooga', 30),
              ('booga', 10))

    def __init__(self):
        self.foo = ''
        self.bar = ''
        self.ooga = ''
        self.booga = ''

    def __str__(self):
        return ''.join([getattr(self, field_name).ljust(width) 
                        for field_name, width in self.fields])

f = FixWidthFieldLine()
f.foo = 'hi'
f.bar = 'joe'
f.ooga = 'howya'
f.booga = 'doin?'

print(f)

This yields:

hi        joe                           howya                         doin?

It works by storing a class-level variable, fields, which records the order in which each field should appear in the output, together with the number of columns that field should have. There are correspondingly named instance variables in the __init__ that are set to an empty string initially.

The __str__ method outputs these values as a string. It uses a list comprehension over the class-level fields attribute, looking up the instance value for each field by name, and then left-justifying its output according to the columns. The resulting list of fields is then joined together by an empty string.

Note this doesn't parse input, though you could easily override the constructor to take a string and parse the columns according to the field and field widths in fields. It also doesn't check for instance values that are longer than their allotted width.
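One way the two missing pieces might look is sketched below. It is not part of the original snippet: the class name ParsedLine is made up for illustration, and raising ValueError on overlong values is just one possible policy. The field layout is the same as in the class above.

```python
class ParsedLine(object):
    # (field name, width) in output order, as in the class above.
    fields = (('foo', 10), ('bar', 30), ('ooga', 30), ('booga', 10))

    def __init__(self, line=''):
        pos = 0
        for name, width in self.fields:
            # Slice out this field's columns and drop the trailing padding.
            setattr(self, name, line[pos:pos + width].rstrip())
            pos += width

    def __str__(self):
        parts = []
        for name, width in self.fields:
            value = getattr(self, name)
            # Refuse to emit a value that would shift the following columns.
            if len(value) > width:
                raise ValueError('%s is longer than %d characters' % (name, width))
            parts.append(value.ljust(width))
        return ''.join(parts)

line = 'hi'.ljust(10) + 'joe'.ljust(30) + 'howya'.ljust(30) + 'doin?'.ljust(10)
p = ParsedLine(line)
```

Here p.bar comes back as 'joe' and str(p) reproduces the original 80-character line. Note that rstrip() also discards trailing spaces that were genuinely part of a value.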

Answer 3:

You can use the string methods ljust, rjust and center to left-justify, right-justify and center a string in a field of a given width.

'hi'.ljust(10) -> 'hi        '
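A short sketch of the three methods side by side. The variable names are only for illustration; the optional second argument is the fill character, which can be handy for zero-padded numeric fields.

```python
width = 10

left = 'hi'.ljust(width)      # 'hi        '
right = 'hi'.rjust(width)     # '        hi'
middle = 'hi'.center(width)   # '    hi    '

# The optional second argument replaces the default space as fill character.
zeros = '42'.rjust(6, '0')    # '000042'
```

None of these methods truncate: a value longer than the width is returned unchanged.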

Answer 4:

I know this thread is quite old, but we use a library called django-copybook. It has nothing to do with Django (anymore). We use it to go between fixed-width COBOL files and Python. You create a class to define your fixed-width record layout and can easily move between typed Python objects and fixed-width files:

USAGE:
class Person(Record):
    first_name = fields.StringField(length=20)
    last_name = fields.StringField(length=30)
    siblings = fields.IntegerField(length=2)
    birth_date = fields.DateField(length=10, format="%Y-%m-%d")

>>> fixedwidth_record = 'Joe                 Smith                         031982-09-11'
>>> person = Person.from_record(fixedwidth_record)
>>> person.first_name
'Joe'
>>> person.last_name
'Smith'
>>> person.siblings
3
>>> person.birth_date
datetime.date(1982, 9, 11)

It can also handle situations similar to COBOL's OCCURS clause, such as when a particular section is repeated X times.

Answer 5:

It's a little difficult to parse your question, but I'm gathering that you are receiving a file or file-like object, reading it, and replacing some of the values with some business logic results. Is this correct?

The simplest way to overcome string immutability is to write a new string:

# Won't work:
test_string[3:6] = "foo"

# Will work:
test_string = test_string[:3] + "foo" + test_string[6:]

Having said that, it sounds like it's important to you that you do something with this string, but I'm not sure exactly what that is. Are you writing it back to an output file, trying to edit a file in place, or something else? I bring this up because the act of creating a new string (which happens to have the same variable name as the old string) should emphasize the necessity of performing an explicit write operation after the transformation.
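A minimal sketch of that read, rebuild and write cycle. The sample data and the replaced columns are made up, and io.StringIO objects stand in for real files so the example is self-contained; with files opened via open() the loop would be the same.

```python
import io

# Stand-ins for an input and an output file.
source = io.StringIO("AAABBBCCC\nDDDEEEFFF\n")
target = io.StringIO()

for line in source:
    record = line.rstrip("\n")
    # Build a new string with columns 3-5 replaced; the old string is untouched.
    record = record[:3] + "foo" + record[6:]
    # Nothing is saved until the new string is written out explicitly.
    target.write(record + "\n")

result = target.getvalue()
```

Writing to a separate output and replacing the original afterwards is usually simpler than trying to edit a file in place.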

Answer 6:

You can convert the string to a list and do the slice manipulation.

>>> text = list("some text")
>>> text[0:4] = list("fine")
>>> text
['f', 'i', 'n', 'e', ' ', 't', 'e', 'x', 't']
>>> text[0:4] = list("all")
>>> text
['a', 'l', 'l', ' ', 't', 'e', 'x', 't']
>>> "".join(text)
'all text'

Answer 7:

It is easy to write a function to "modify" a string.

def change(string, start, end, what):
    length = end - start
    if len(what)<length: what = what + " "*(length-len(what))
    return string[0:start]+what[0:length]+string[end:]

Usage:

test_string = 'This is test string'

print(test_string[5:7])
# is
test_string = change(test_string, 5, 7, 'IS')
# This IS test string
test_string = change(test_string, 8, 12, 'X')
# This IS X    string
test_string = change(test_string, 8, 12, 'XXXXXXXXXXXX')
# This IS XXXX string

Answer 8:

I used Jarret Hardie's example and modified it slightly. This allows you to select the type of text alignment (left, right or centered).

class FixedWidthFieldLine(object):
    def __init__(self, fields, justify = 'L'):
        """ Returns line from list containing tuples of field values and lengths. Accepts
            justification parameter.
            FixedWidthFieldLine(fields[, justify])

            fields = [(value, fieldLength)[, ...]]
        """
        self.fields = fields

        if (justify in ('L','C','R')):
            self.justify = justify
        else:
            self.justify = 'L'

    def __str__(self):
        if(self.justify == 'L'):
            return ''.join([field[0].ljust(field[1]) for field in self.fields])
        elif(self.justify == 'R'):
            return ''.join([field[0].rjust(field[1]) for field in self.fields])
        elif(self.justify == 'C'):
            return ''.join([field[0].center(field[1]) for field in self.fields])

fieldTest = [('Alex', 10),
         ('Programmer', 20),
         ('Salem, OR', 15)]

f = FixedWidthFieldLine(fieldTest)
print(f)
f = FixedWidthFieldLine(fieldTest,'R')
print(f)

This prints:

Alex      Programmer          Salem, OR      
      Alex          Programmer      Salem, OR

Source: https://stackoverflow.com/questions/848537/writing-parsing-a-fixed-width-file-using-python

Tags

python

parsing

edi