Read data from CSV file and transform from string to correct data-type, including a list-of-integer column

Asked by 难免孤独 on 2020-11-28 09:44

When I read data back in from a CSV file, every cell is interpreted as a string.

  • How can I automatically convert the data I read in into the correct type?
7 Answers
  • Answered by 感情败类 on 2020-11-28 10:27

    I know this is a fairly old question, tagged python-2.5, but here's an answer that works with Python 3.6+, which might be of interest to folks using more up-to-date versions of the language.

    It leverages the built-in typing.NamedTuple class, which was added in Python 3.5. What may not be evident from the documentation is that the "type" of each field can be a function.
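
    In other words, the conversion is done simply by calling each field's annotation with the cell's string value. Roughly (a minimal illustration of the mechanism, not part of the original answer):

    >>> import ast
    >>> float('34')                   # a Price cell
    34.0
    >>> ast.literal_eval('[1, 2]')    # a States cell
    [1, 2]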

    The example usage code also uses so-called f-string literals, which weren't added until Python 3.6, but their use isn't required to do the core data-type transformations.

    #!/usr/bin/env python3.6
    import ast
    import csv
    from typing import NamedTuple
    
    
    class Record(NamedTuple):
        """ Define the fields and their types in a record. """
        IsActive: bool
        Type: str
        Price: float
        States: ast.literal_eval  # Handles string representation of literals.
    
        @classmethod
        def _transform(cls, dict_: dict) -> dict:
            """ Convert string values in the given dictionary to the
                corresponding Record field types.
            """
            return {name: cls.__annotations__[name](value)
                        for name, value in dict_.items()}
    
    
    filename = 'test_transform.csv'
    
    with open(filename, newline='') as file:
        for i, row in enumerate(csv.DictReader(file)):
            row = Record._transform(row)
            print(f'row {i}: {row}')
    

    Output:

    row 0: {'IsActive': True, 'Type': 'Cellphone', 'Price': 34.0, 'States': [1, 2]}
    row 1: {'IsActive': False, 'Type': 'FlatTv', 'Price': 3.5, 'States': [2]}
    row 2: {'IsActive': True, 'Type': 'Screen', 'Price': 100.23, 'States': [5, 1]}
    row 3: {'IsActive': True, 'Type': 'Notebook', 'Price': 50.0, 'States': [1]}
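
    The contents of test_transform.csv aren't shown, but a file like the one written below (a plausible reconstruction based on the output above, not the original data) would produce it. Note that bool() returns False only for an empty string, so the row that prints IsActive: False is assumed to have an empty cell rather than the text 'False':

    import csv

    # Hypothetical data reconstructed from the printed rows above.
    rows = [
        ['IsActive', 'Type', 'Price', 'States'],
        ['True', 'Cellphone', '34', '[1, 2]'],
        ['', 'FlatTv', '3.5', '[2]'],        # empty cell -> bool('') -> False
        ['True', 'Screen', '100.23', '[5, 1]'],
        ['True', 'Notebook', '50', '[1]'],
    ]

    with open('test_transform.csv', 'w', newline='') as file:
        csv.writer(file).writerows(rows)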
    

    Generalizing this by moving the generic classmethod into a base class of its own is not simple, because of the way typing.NamedTuple is implemented.

    To avoid that issue, in Python 3.7+ a dataclasses.dataclass can be used instead, because dataclasses don't have that inheritance limitation, so creating a reusable generic base class is simple:

    #!/usr/bin/env python3.7
    import ast
    import csv
    from dataclasses import dataclass, fields
    from typing import Type, TypeVar
    
    T = TypeVar('T', bound='GenericRecord')
    
    class GenericRecord:
        """ Generic base class for transforming dataclasses. """
        @classmethod
        def _transform(cls: Type[T], dict_: dict) -> dict:
            """ Convert string values in given dictionary to corresponding type. """
            return {field.name: field.type(dict_[field.name])
                        for field in fields(cls)}
    
    
    @dataclass
    class CSV_Record(GenericRecord):
        """ Define the fields and their types in a record.
            Field names must match column names in CSV file header.
        """
        IsActive: bool
        Type: str
        Price: float
        States: ast.literal_eval  # Handles string representation of literals.
    
    
    filename = 'test_transform.csv'
    
    with open(filename, newline='') as file:
        for i, row in enumerate(csv.DictReader(file)):
            row = CSV_Record._transform(row)
            print(f'row {i}: {row}')
    

    In one sense it doesn't really matter which one you use, because an instance of the class is never created; using one is just a clean way of specifying and holding a definition of the field names and their types in a record data structure.
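
    That said, if you do want actual record objects rather than plain dictionaries, the transformed mapping can be unpacked straight into the dataclass constructor. A minimal sketch, reusing the CSV_Record dataclass defined above:

    with open(filename, newline='') as file:
        records = [CSV_Record(**CSV_Record._transform(row))
                   for row in csv.DictReader(file)]

    # Each element is a CSV_Record instance with correctly typed attributes,
    # e.g. records[0].Price is a float and records[0].States is a list of ints.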

    A TypedDict was added to the typing module in Python 3.8 that can also be used to provide the typing information, but it must be used in a slightly different manner, since it doesn't actually define a new type the way NamedTuple and dataclasses do, so it requires a standalone transforming function:

    #!/usr/bin/env python3.8
    import ast
    import csv
    from typing import TypedDict
    
    
    def transform(dict_, typed_dict) -> dict:
        """ Convert values in given dictionary to corresponding types in TypedDict . """
        fields = typed_dict.__annotations__
        return {name: fields[name](value) for name, value in dict_.items()}
    
    
    class CSV_Record_Types(TypedDict):
        """ Define the fields and their types in a record.
            Field names must match column names in CSV file header.
        """
        IsActive: bool
        Type: str
        Price: float
        States: ast.literal_eval
    
    
    filename = 'test_transform.csv'
    
    with open(filename, newline='') as file:
        for i, row in enumerate(csv.DictReader(file), 1):
            row = transform(row, CSV_Record_Types)
            print(f'row {i}: {row}')
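
    One caveat applies to all of the variants above: bool() treats any non-empty string as True, so a cell containing the text 'False' would convert to True (only an empty cell converts to False). If your file spells booleans out as words, a small converter function can be used as the annotation in place of bool. The str_to_bool helper below is a hypothetical addition, not part of the original answer:

    def str_to_bool(value: str) -> bool:
        """ Interpret common textual representations of a boolean. """
        return value.strip().lower() in ('true', 'yes', '1')

    # Then, in any of the record definitions above, annotate the field with it:
    #     IsActive: str_to_bool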
    
    
