I have a CSV file in S3 and I'm trying to read the header line to get the size (these files are created by our users so they could be almost any size). Is there a way to do
A flexible, low-cost way to do this is to read the file one byte at a time until you have found the number of lines you need.
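import logging
logger = logging.getLogger(__name__)

# correlate_file_obj is assumed to be the response from an S3 get_object call.
line_count = 0
line_data_bytes = b''
# Read one byte at a time until two newlines have been seen.
while line_count < 2:
    incoming = correlate_file_obj['Body'].read(1)
    if not incoming:
        break  # end of file reached before two newlines
    if incoming == b'\n':
        line_count = line_count + 1
    line_data_bytes = line_data_bytes + incoming

logger.debug("read bytes:")
logger.debug(line_data_bytes)
# The last element is empty when the data ends with a newline.
line_data = line_data_bytes.split(b'\n')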
You don't have to guess the header size (which matters if it can change), you won't end up downloading the whole file, and you don't need third-party tools. You do need to make sure the line delimiter you check for matches the one used in your file, and that you read the right number of bytes to detect it.
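If one call per byte turns out to be slow, the same idea works with small chunks. This is only a sketch, not part of the original approach: `read_first_lines` and its `chunk_size` default are hypothetical names and values, and it assumes nothing about the stream beyond a `read(n)` method that returns bytes, which the `Body` of an S3 `get_object` response provides. An in-memory `io.BytesIO` stands in for the S3 body here so the example runs without AWS access.

```python
import io


def read_first_lines(body, line_count=1, chunk_size=1024):
    """Read from `body` until `line_count` newlines are seen or the stream ends.

    `body` is any object with a read(n) method that returns bytes.
    Returns the first `line_count` lines as bytes, without line endings.
    """
    buffer = b""
    while buffer.count(b"\n") < line_count:
        chunk = body.read(chunk_size)
        if not chunk:
            # Stream ended before enough newlines were found.
            break
        buffer += chunk
    if not buffer:
        return []
    lines = buffer.split(b"\n")
    # Drop a trailing carriage return so CRLF files are handled too.
    return [line.rstrip(b"\r") for line in lines[:line_count]]


# Stand-in for correlate_file_obj['Body'] (hypothetical sample data).
sample_body = io.BytesIO(b"id,name,size\r\n1,a,10\n2,b,20\n")
header = read_first_lines(sample_body, line_count=1, chunk_size=4)
print(header)
```

The trade-off is that the last chunk may pull in up to `chunk_size - 1` bytes past the final newline you wanted, in exchange for far fewer `read` calls.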