Read data from CSV file and transform from string to correct data-type, including a list-of-integer column

难免孤独 2020-11-28 09:44

When I read data back in from a CSV file, every cell is interpreted as a string.

  • How can I automatically convert the data I read in into the correct type?
7 answers
  •  星月不相逢
    2020-11-28 10:26

    Props to Jon Clements and cortopy for teaching me about ast.literal_eval! Here's what I ended up going with (Python 2; changes for 3 should be trivial):

    from ast import literal_eval
    from csv import DictReader
    import csv
    
    
    def csv_data(filepath, **col_conversions):
        """Yield rows from the CSV file as dicts, with column headers as the keys.
    
        Values in the CSV rows are converted to Python values when possible,
        and are kept as strings otherwise.
    
        Specific conversion functions for columns may be specified via
        `col_conversions`: if a column's header is a key in this dict, its
        value will be applied as a function to the CSV data. Specify
        `ColumnHeader=str` if all values in the column should be interpreted
        as unquoted strings, but might be valid Python literals (`True`,
        `None`, `1`, etc.).
    
        Example usage:
    
        >>> csv_data(filepath,
        ...          VariousWordsIncludingTrueAndFalse=str,
        ...          NumbersOfVaryingPrecision=float,
        ...          FloatsThatShouldBeRounded=round,
        ...          **{'Column Header With Spaces': arbitrary_function})
        """
    
        def parse_value(key, value):
            if key in col_conversions:
                return col_conversions[key](value)
            try:
                # Interpret the string as a Python literal
                return literal_eval(value)
            except Exception:
                # If that doesn't work, assume it's an unquoted string
                return value
    
        # Python 2's csv module expects the file to be opened in binary mode.
        with open(filepath, 'rb') as f:
            # QUOTE_NONE: don't process quote characters, to avoid the value
            # `"2"` becoming the int `2`, rather than the string `'2'`.
            for row in DictReader(f, quoting=csv.QUOTE_NONE):
                yield {k: parse_value(k, v) for k, v in row.iteritems()}
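
For Python 3, the port is mostly a matter of replacing `iteritems()` with `items()`. The following is a minimal sketch of the same approach, not the answer's exact code: the name `csv_rows` and the sample data are made up for illustration, and it takes an already-open file object instead of a path so the example is self-contained.

```python
import csv
import io
from ast import literal_eval


def parse_value(value, convert=None):
    """Convert one CSV cell, falling back to the raw string."""
    if convert is not None:
        # An explicit per-column conversion wins over guessing.
        return convert(value)
    try:
        # Interpret the string as a Python literal (int, float, bool, ...).
        return literal_eval(value)
    except Exception:
        # Not a valid literal: treat it as an unquoted string.
        return value


def csv_rows(f, **col_conversions):
    """Yield rows from an open CSV file as dicts of converted values."""
    for row in csv.DictReader(f, quoting=csv.QUOTE_NONE):
        yield {k: parse_value(v, col_conversions.get(k)) for k, v in row.items()}


# Hypothetical sample data standing in for a real file.
sample = io.StringIO("name,count,ratio,flag\nwidget,3,0.5,True\n")
rows = list(csv_rows(sample, name=str))
print(rows[0])
# {'name': 'widget', 'count': 3, 'ratio': 0.5, 'flag': True}
```

Here `name=str` keeps that column as a plain string, while the other cells go through `literal_eval`. When reading from a real path in Python 3, the csv documentation recommends opening the file with `newline=''`.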
    

    (I'm a little wary that I might have missed some corner cases involving quoting. Please comment if you see any issues!)
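
One quoting corner case worth checking is a column that holds a list of integers. With `QUOTE_NONE` the reader treats quote characters as ordinary text, so a quoted field that contains the delimiter is split at its commas. The sketch below uses a made-up sample line to compare `QUOTE_NONE` with the reader's default quoting.

```python
import csv
from ast import literal_eval

# Hypothetical row: an id column and a quoted list-of-integers column.
line = 'id,"[1, 2, 3]"'

# QUOTE_NONE: quotes are ordinary characters, so the list is cut at each comma.
no_quote = next(csv.reader([line], quoting=csv.QUOTE_NONE))
print(no_quote)  # ['id', '"[1', ' 2', ' 3]"']

# Default quoting: the quoted field stays in one piece and can be parsed.
default = next(csv.reader([line]))
print(default)  # ['id', '[1, 2, 3]']
values = literal_eval(default[1])
print(values)  # [1, 2, 3]
```

The trade-off is the one the code comment mentions: default quoting strips the quotes, so a cell written as `"2"` would then be converted to the int `2`. A delimiter that cannot appear inside the lists, or an explicit per-column conversion, are two possible ways around this.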
