Read data from CSV file and transform from string to correct data-type, including a list-of-integer column

Asked by 难免孤独 on 2020-11-28 09:44

When I read data back in from a CSV file, every cell is interpreted as a string.

  • How can I automatically convert the data I read in into the correct type?
7 Answers
  • Answered by 感情败类 on 2020-11-28 10:27

    I know this is a fairly old question, tagged python-2.5, but here's an answer that works with Python 3.6+, which might be of interest to folks using more up-to-date versions of the language.

    It leverages the built-in typing.NamedTuple class, which was added in Python 3.5. What may not be evident from the documentation is that the "type" of each field can be a function.
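
    In other words, the conversion is done simply by calling each field's annotation with the cell's string value. Roughly (a minimal illustration of the mechanism, not part of the original answer):

    >>> import ast
    >>> float('34')                   # a Price cell
    34.0
    >>> ast.literal_eval('[1, 2]')    # a States cell
    [1, 2]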

    The example usage code also uses so-called f-string literals, which weren't added until Python 3.6, but their use isn't required to do the core data-type transformations.

    #!/usr/bin/env python3.6
    import ast
    import csv
    from typing import NamedTuple
    
    
    class Record(NamedTuple):
        """ Define the fields and their types in a record. """
        IsActive: bool
        Type: str
        Price: float
        States: ast.literal_eval  # Handles string representation of literals.
    
        @classmethod
        def _transform(cls, dict_: dict) -> dict:
            """ Convert string values in the given dictionary to the
                corresponding Record field types.
            """
            return {name: cls.__annotations__[name](value)
                        for name, value in dict_.items()}
    
    
    filename = 'test_transform.csv'
    
    with open(filename, newline='') as file:
        for i, row in enumerate(csv.DictReader(file)):
            row = Record._transform(row)
            print(f'row {i}: {row}')
    

    Output:

    row 0: {'IsActive': True, 'Type': 'Cellphone', 'Price': 34.0, 'States': [1, 2]}
    row 1: {'IsActive': False, 'Type': 'FlatTv', 'Price': 3.5, 'States': [2]}
    row 2: {'IsActive': True, 'Type': 'Screen', 'Price': 100.23, 'States': [5, 1]}
    row 3: {'IsActive': True, 'Type': 'Notebook', 'Price': 50.0, 'States': [1]}
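
    The contents of test_transform.csv aren't shown, but a file like the one written below (a plausible reconstruction based on the output above, not the original data) would produce it. Note that bool() returns False only for an empty string, so the row that prints IsActive: False is assumed to have an empty cell rather than the text 'False':

    import csv

    # Hypothetical data reconstructed from the printed rows above.
    rows = [
        ['IsActive', 'Type', 'Price', 'States'],
        ['True', 'Cellphone', '34', '[1, 2]'],
        ['', 'FlatTv', '3.5', '[2]'],        # empty cell -> bool('') -> False
        ['True', 'Screen', '100.23', '[5, 1]'],
        ['True', 'Notebook', '50', '[1]'],
    ]

    with open('test_transform.csv', 'w', newline='') as file:
        csv.writer(file).writerows(rows)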
    

    Generalizing this by moving the generic classmethod into a base class of its own is not simple, because of the way typing.NamedTuple is implemented.

    To avoid that issue, in Python 3.7+ a dataclasses.dataclass can be used instead, because dataclasses don't have that inheritance limitation, so creating a reusable generic base class is simple:

    #!/usr/bin/env python3.7
    import ast
    import csv
    from dataclasses import dataclass, fields
    from typing import Type, TypeVar
    
    T = TypeVar('T', bound='GenericRecord')
    
    class GenericRecord:
        """ Generic base class for transforming dataclasses. """
        @classmethod
        def _transform(cls: Type[T], dict_: dict) -> dict:
            """ Convert string values in given dictionary to corresponding type. """
            return {field.name: field.type(dict_[field.name])
                        for field in fields(cls)}
    
    
    @dataclass
    class CSV_Record(GenericRecord):
        """ Define the fields and their types in a record.
            Field names must match column names in CSV file header.
        """
        IsActive: bool
        Type: str
        Price: float
        States: ast.literal_eval  # Handles string representation of literals.
    
    
    filename = 'test_transform.csv'
    
    with open(filename, newline='') as file:
        for i, row in enumerate(csv.DictReader(file)):
            row = CSV_Record._transform(row)
            print(f'row {i}: {row}')
    

    In one sense it doesn't really matter which one you use, because an instance of the class is never created; using one is just a clean way of specifying and holding a definition of the field names and their types in a record data structure.
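
    That said, if you do want actual record objects rather than plain dictionaries, the transformed mapping can be unpacked straight into the dataclass constructor. A minimal sketch, reusing the CSV_Record dataclass defined above:

    with open(filename, newline='') as file:
        records = [CSV_Record(**CSV_Record._transform(row))
                   for row in csv.DictReader(file)]

    # Each element is a CSV_Record instance with correctly typed attributes,
    # e.g. records[0].Price is a float and records[0].States is a list of ints.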

    A TypedDict was added to the typing module in Python 3.8 that can also be used to provide the typing information, but it must be used in a slightly different manner, since it doesn't actually define a new type the way NamedTuple and dataclasses do, so it requires a standalone transforming function:

    #!/usr/bin/env python3.8
    import ast
    import csv
    from typing import TypedDict
    
    
    def transform(dict_, typed_dict) -> dict:
        """ Convert values in given dictionary to corresponding types in TypedDict . """
        fields = typed_dict.__annotations__
        return {name: fields[name](value) for name, value in dict_.items()}
    
    
    class CSV_Record_Types(TypedDict):
        """ Define the fields and their types in a record.
            Field names must match column names in CSV file header.
        """
        IsActive: bool
        Type: str
        Price: float
        States: ast.literal_eval
    
    
    filename = 'test_transform.csv'
    
    with open(filename, newline='') as file:
        for i, row in enumerate(csv.DictReader(file), 1):
            row = transform(row, CSV_Record_Types)
            print(f'row {i}: {row}')
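
    One caveat applies to all of the variants above: bool() treats any non-empty string as True, so a cell containing the text 'False' would convert to True (only an empty cell converts to False). If your file spells booleans out as words, a small converter function can be used as the annotation in place of bool. The str_to_bool helper below is a hypothetical addition, not part of the original answer:

    def str_to_bool(value: str) -> bool:
        """ Interpret common textual representations of a boolean. """
        return value.strip().lower() in ('true', 'yes', '1')

    # Then, in any of the record definitions above, annotate the field with it:
    #     IsActive: str_to_bool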
    
    
