I have a script that reads a csv file with very large fields:
# example from http://docs.python.org/3.3/library/csv.html?highlight=csv%20dictreader#examples
i
csv field sizes are controlled via [Python 3.Docs]: csv.field_size_limit([new_limit]):
Returns the current maximum field size allowed by the parser. If new_limit is given, this becomes the new limit.
It is set by default to 128k or 0x20000 (131072), which should be enough for any decent .csv:
    >>> import csv
    >>>
    >>> limit0 = csv.field_size_limit()
    >>> limit0
    131072
    >>> "0x{0:016X}".format(limit0)
    '0x0000000000020000'
However, when dealing with a .csv file (with the correct quoting and delimiter) having (at least) one field longer than this size, the error (_csv.Error: field larger than field limit (131072)) pops up.
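The failure can be reproduced without a file on disk. This is only a minimal sketch: the helper name try_parse and the in-memory data are made up for illustration, and the default limit of 131072 is set explicitly so the result does not depend on earlier calls in the same session.

```python
import csv
import io

DEFAULT_LIMIT = 131072  # the documented default, set explicitly for the demo


def try_parse(text, limit):
    """Parse text with a temporary field size limit.

    Returns the parsed rows, or the csv.Error message if parsing fails.
    (Hypothetical helper, not part of the csv module.)
    """
    previous = csv.field_size_limit(limit)  # returns the limit that was in effect
    try:
        return list(csv.reader(io.StringIO(text)))
    except csv.Error as exc:
        return str(exc)
    finally:
        csv.field_size_limit(previous)  # restore the earlier limit


# One row whose second field is one character longer than the limit.
oversized = "1," + "x" * (DEFAULT_LIMIT + 1) + "\n"

failed = try_parse(oversized, DEFAULT_LIMIT)
print(failed)  # field larger than field limit (131072)

# The same data parses once the limit is larger than the field.
parsed = try_parse(oversized, DEFAULT_LIMIT * 2)
print(len(parsed[0][1]))  # 131073
```

Restoring the previous limit in the finally clause keeps the experiment from changing the limit for the rest of the process.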
To get rid of the error, the size limit should be increased (to avoid any worries, the maximum possible value is attempted).
Behind the scenes (check [GitHub]: python/cpython - (master) cpython/Modules/_csv.c for implementation details), the variable that holds this value is a C long ([Wikipedia]: C data types), whose size varies depending on CPU architecture and OS (ILP data model). The classical difference: for a 64bit OS (Python build), the long type size (in bits) is 64 on Nix (LP64) and 32 on Win (LLP64).
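To see which case applies to the interpreter at hand, the size of a C long can be queried through ctypes. This is just an illustrative check (the variable names are made up), not something the csv module requires:

```python
import ctypes
import sys

# Width of a C long and of a pointer, in bits, for the running Python build.
long_bits = ctypes.sizeof(ctypes.c_long) * 8
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8

# Largest value a signed C long can hold on this build (LONG_MAX).
long_max = 2 ** (long_bits - 1) - 1

print(sys.platform, pointer_bits, long_bits, hex(long_max))
```

On a 64bit build where long_bits is 32, sys.maxsize is larger than long_max, so it cannot be stored in a C long.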
When attempting to set it, the new value is checked to be in the long boundaries, that's why in some cases another exception pops up (this case is common on Win):
    >>> import sys
    >>>
    >>> sys.platform, sys.maxsize
    ('win32', 9223372036854775807)
    >>>
    >>> csv.field_size_limit(sys.maxsize)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
To avoid running into this problem, set the (maximum possible) limit (LONG_MAX) using an artifice (thanks to [Python 3.Docs]: ctypes - A foreign function library for Python). It should work on Python 3 and Python 2, on any CPU / OS.
    >>> import ctypes as ct
    >>>
    >>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))
    131072
    >>> limit1 = csv.field_size_limit()
    >>> limit1
    2147483647
    >>> "0x{0:016X}".format(limit1)
    '0x000000007FFFFFFF'
64bit Python on a Nix like OS:
    >>> import sys, csv, ctypes as ct
    >>>
    >>> sys.platform, sys.maxsize
    ('linux', 9223372036854775807)
    >>>
    >>> csv.field_size_limit()
    131072
    >>>
    >>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))
    131072
    >>> limit1 = csv.field_size_limit()
    >>> limit1
    9223372036854775807
    >>> "0x{0:016X}".format(limit1)
    '0x7FFFFFFFFFFFFFFF'
For 32bit Python, things are uniform: it's the behavior encountered on Win.
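In a script, the same ctypes trick could be wrapped in a small function and called once before any reader is created. A minimal sketch, assuming a helper name (raise_field_size_limit) and in-memory test data that are mine, not part of the csv module:

```python
import csv
import ctypes
import io


def raise_field_size_limit():
    """Set csv's field size limit to LONG_MAX and return the new limit.

    Hypothetical helper wrapping the ctypes approach shown above.
    """
    # c_ulong(-1) wraps around to ULONG_MAX; halving it yields LONG_MAX.
    long_max = ctypes.c_ulong(-1).value // 2
    csv.field_size_limit(long_max)
    return csv.field_size_limit()


new_limit = raise_field_size_limit()

# A field longer than the 131072 default is now accepted by the parser.
big_field = "x" * 200000
row = next(csv.reader(io.StringIO("1," + big_field + "\n")))
print(new_limit, len(row[1]))
```

The limit is global to the csv module, so one call at startup covers every csv.reader and csv.DictReader created afterwards in the same process.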
Check the following resources for more details on: