When I read data back in from a CSV file, every cell is interpreted as a string.
I know this is a fairly old question, tagged python-2.5, but here's answer that works with Python 3.6+ which might be of interest to folks using more up-to-date versions of the language.
It leverages the built-in typing.NamedTuple class which was added in Python 3.5. What may not be evident from the documentation is that the "type" of each field can be a function.
The example usage code also uses so-called f-string literals which weren't added until Python 3.6, but their use isn't required to do the core data-type transformations.
#!/usr/bin/env python3.6
import ast
import csv
from typing import NamedTuple
class Record(NamedTuple):
""" Define the fields and their types in a record. """
IsActive: bool
Type: str
Price: float
States: ast.literal_eval # Handles string represenation of literals.
@classmethod
def _transform(cls: 'Record', dct: dict) -> dict:
""" Convert string values in given dictionary to corresponding Record
field type.
"""
return {name: cls.__annotations__[name](value)
for name, value in dict_.items()}
filename = 'test_transform.csv'
with open(filename, newline='') as file:
for i, row in enumerate(csv.DictReader(file)):
row = Record._transform(row)
print(f'row {i}: {row}')
Output:
row 0: {'IsActive': True, 'Type': 'Cellphone', 'Price': 34.0, 'States': [1, 2]}
row 1: {'IsActive': False, 'Type': 'FlatTv', 'Price': 3.5, 'States': [2]}
row 2: {'IsActive': True, 'Type': 'Screen', 'Price': 100.23, 'States': [5, 1]}
row 3: {'IsActive': True, 'Type': 'Notebook', 'Price': 50.0, 'States': [1]}
Generalizing this by creating a base class with just the generic classmethod in it is not simple because of the way typing.NamedTuple
is implemented.
To avoid that issue, in Python 3.7+, a dataclasses.dataclass could be used instead because they do not have the inheritance issue — so creating a generic base class that can be reused is simple:
#!/usr/bin/env python3.7
import ast
import csv
from dataclasses import dataclass, fields
from typing import Type, TypeVar
T = TypeVar('T', bound='GenericRecord')
class GenericRecord:
""" Generic base class for transforming dataclasses. """
@classmethod
def _transform(cls: Type[T], dict_: dict) -> dict:
""" Convert string values in given dictionary to corresponding type. """
return {field.name: field.type(dict_[field.name])
for field in fields(cls)}
@dataclass
class CSV_Record(GenericRecord):
""" Define the fields and their types in a record.
Field names must match column names in CSV file header.
"""
IsActive: bool
Type: str
Price: float
States: ast.literal_eval # Handles string represenation of literals.
filename = 'test_transform.csv'
with open(filename, newline='') as file:
for i, row in enumerate(csv.DictReader(file)):
row = CSV_Record._transform(row)
print(f'row {i}: {row}')
In one sense it's not really very important which one you use because an instance of the class in never created — using one is just a clean way of specifying and holding a definition of the field names and their type in a record data-structure.
A TypeDict was added to the typing
module in Python 3.8 that can also be used to provide the typing information, but must be used in a slightly different manner since it doesn't actually define a new type like NamedTuple
and dataclasses
do — so it requires having a standalone transforming function:
#!/usr/bin/env python3.8
import ast
import csv
from dataclasses import dataclass, fields
from typing import TypedDict
def transform(dict_, typed_dict) -> dict:
""" Convert values in given dictionary to corresponding types in TypedDict . """
fields = typed_dict.__annotations__
return {name: fields[name](value) for name, value in dict_.items()}
class CSV_Record_Types(TypedDict):
""" Define the fields and their types in a record.
Field names must match column names in CSV file header.
"""
IsActive: bool
Type: str
Price: float
States: ast.literal_eval
filename = 'test_transform.csv'
with open(filename, newline='') as file:
for i, row in enumerate(csv.DictReader(file), 1):
row = transform(row, CSV_Record_Types)
print(f'row {i}: {row}')