Sorting CSV in Python

暖寄归人 2020-12-09 13:37

I assumed sorting a CSV file on multiple text/numeric fields using Python would be a problem that was already solved. But I can't find any example code anywhere, except for

4 answers
  • 2020-12-09 14:17

    You bring up 3 issues:

    • file size
    • csv data
    • sorting on multiple fields

    Here is a solution for the third part. It splits on commas for brevity; you can parse the csv data in a more sophisticated way, for example with the csv module.

    >>> data = 'a,b,c\nb,b,a\nb,c,a\n'
    >>> lines = [e.split(',') for e in data.strip().split('\n')]
    >>> lines
    [['a', 'b', 'c'], ['b', 'b', 'a'], ['b', 'c', 'a']]
    >>> def f(e):
    ...     field_order = [2,1]
    ...     return [e[i] for i in field_order]
    ... 
    >>> sorted(lines, key=f)
    [['b', 'b', 'a'], ['b', 'c', 'a'], ['a', 'b', 'c']]
    

    Edited to use a list comprehension; a generator expression does not work as I had expected it to.
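
    A minimal sketch of why the generator version misbehaves, assuming Python 3: generator objects do not support ordering comparisons, so sorted() raises TypeError when a key function returns one, while a tuple (or list) key compares element by element. The names tuple_key and generator_key are hypothetical, chosen only for this illustration.

```python
lines = [['a', 'b', 'c'], ['b', 'b', 'a'], ['b', 'c', 'a']]
field_order = [2, 1]  # sort on the third column, then the second


def tuple_key(row):
    # A tuple compares element by element, which gives multi-field ordering.
    return tuple(row[i] for i in field_order)


def generator_key(row):
    # A bare generator object cannot be ordered against another generator.
    return (row[i] for i in field_order)


by_tuple = sorted(lines, key=tuple_key)

try:
    sorted(lines, key=generator_key)
    generator_sort_failed = False
except TypeError:
    generator_sort_failed = True
```

    The tuple key gives the same ordering as the list comprehension above.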

  • 2020-12-09 14:35

    Python's sort works in-memory only; however, tens of thousands of lines should fit in memory easily on a modern machine. So:

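    import csv
    import operator
    
    def sortcsvbymanyfields(csvfilename, themanyfieldscolumnnumbers):
      # Read every row into memory; newline='' lets the csv module
      # handle line endings itself (Python 3).
      with open(csvfilename, newline='') as f:
        readit = csv.reader(f)
        thedata = list(readit)
      # Sort on the given column numbers; fields are compared as strings.
      thedata.sort(key=operator.itemgetter(*themanyfieldscolumnnumbers))
      # Overwrite the original file with the sorted rows.
      with open(csvfilename, 'w', newline='') as f:
        writeit = csv.writer(f)
        writeit.writerows(thedata)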
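
    If the file starts with a header row, sorting every row would move the header into the data. Here is a rough sketch of one way to keep it in place, assuming the header is exactly the first row. The function name sort_csv_text and the has_header parameter are made up for this example, and it works on a string rather than a file so it is easy to try out.

```python
import csv
import io
import operator


def sort_csv_text(text, columns, has_header=True):
    """Return CSV text with its data rows sorted on the given column numbers."""
    rows = list(csv.reader(io.StringIO(text)))
    # Hold the header aside so it is not sorted along with the data.
    header, body = (rows[:1], rows[1:]) if has_header else ([], rows)
    body.sort(key=operator.itemgetter(*columns))
    out = io.StringIO()
    csv.writer(out, lineterminator='\n').writerows(header + body)
    return out.getvalue()


sample = 'name,city\nbob,paris\namy,rome\namy,oslo\n'
result = sort_csv_text(sample, (0, 1))
```

    Here result keeps name,city as the first line, followed by the rows ordered by name and then city.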
    
  • 2020-12-09 14:38

    Here's the convert() that's missing from Robert's fix of Alex's answer:

    >>> def convert(convert_funcs, seq):
    ...    return [
    ...        item if func is None else func(item)
    ...        for func, item in zip(convert_funcs, seq)
    ...        ]
    ...
    >>> convert(
    ...     (None, float, lambda x: x.strip().lower()),
    ...     [" text ", "123.45", " TEXT "]
    ...     )
    [' text ', 123.45, 'text']
    >>>
    

    I've changed the name of the first arg to highlight that the per-column functions can do whatever you need, not merely type coercion. None is used to indicate no conversion.
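
    One possible way to use this, sketched under the assumption that you want the rows written back exactly as they were read: apply convert() inside the sort key instead of to the stored rows. The helper make_sort_key is hypothetical and not part of any of the answers here.

```python
import operator


def convert(convert_funcs, seq):
    return [
        item if func is None else func(item)
        for func, item in zip(convert_funcs, seq)
    ]


def make_sort_key(convert_funcs, sort_key_columns):
    # Build a key function that converts a row, then picks the sort columns.
    pick = operator.itemgetter(*sort_key_columns)

    def key(row):
        return pick(convert(convert_funcs, row))

    return key


rows = [["b", "10", " X "], ["a", "9", " y "], ["c", "10", " a "]]
funcs = (None, float, lambda x: x.strip().lower())

# Sort numerically on column 1, then case-insensitively on column 2.
sorted_rows = sorted(rows, key=make_sort_key(funcs, (1, 2)))
```

    The rows themselves stay untouched strings, so "9" is not rewritten as "9.0" when the data goes back through csv.writer.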

  • 2020-12-09 14:40

    Here's Alex's answer, reworked to support column data types:

    import csv
    import operator
    
    def sort_csv(csv_filename, types, sort_key_columns):
        """Sort (and rewrite) a csv file.
        types:  data types (conversion functions) for each column in the file
        sort_key_columns: column numbers of columns to sort by"""
        data = []
        # newline='' lets the csv module handle line endings itself (Python 3)
        with open(csv_filename, newline='') as f:
            for row in csv.reader(f):
                data.append(convert(types, row))
        data.sort(key=operator.itemgetter(*sort_key_columns))
        # Converted values are written back via str(), not as the original text
        with open(csv_filename, 'w', newline='') as f:
            csv.writer(f).writerows(data)
    

    Edit:

    I did a stupid. I was playing with various things in IDLE and wrote a convert function a couple of days ago. I forgot I'd written it, and I haven't closed IDLE in a good long while - so when I wrote the above, I thought convert was a built-in function. Sadly no.

    Here's my implementation, though John Machin's is nicer:

    def convert(types, values):
        return [t(v) for t, v in zip(types, values)]
    

    Usage:

    from datetime import datetime
    def date(s):
        # Parse a month/day/two-digit-year string such as '2/15/09'
        return datetime.strptime(s, '%m/%d/%y')
    
    >>> convert((int, date, str), ('1', '2/15/09', 'z'))
    [1, datetime.datetime(2009, 2, 15, 0, 0), 'z']
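
    To see why the column types matter, here is a small sketch with made-up sample rows comparing a plain string sort with a sort on converted values. It reuses the convert() shown above.

```python
import operator


def convert(types, values):
    return [t(v) for t, v in zip(types, values)]


raw_rows = [['10', 'pear'], ['9', 'apple'], ['100', 'fig']]

# As strings, '10' and '100' sort before '9' (character-by-character order).
as_text = sorted(raw_rows, key=operator.itemgetter(0))

# After conversion, the first column sorts numerically.
typed_rows = [convert((int, str), row) for row in raw_rows]
as_numbers = sorted(typed_rows, key=operator.itemgetter(0))
```

    The string sort puts '10' and '100' ahead of '9', while the converted rows come out in the order 9, 10, 100.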
    