Read a file line by line from S3 using boto?

刺人心 2020-11-29 07:32

I have a CSV file in S3 and I'm trying to read the header line to get the size (these files are created by our users so they could be almost any size). Is there a way to do

10 answers
  •  离开以前
    2020-11-29 07:52

    A flexible, low-cost way to read the file is to read it one byte at a time until you have seen as many line breaks as you need.

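    import logging

    logger = logging.getLogger(__name__)

    # correlate_file_obj is assumed to be the response returned by
    # boto3's get_object(); its 'Body' is a streaming, file-like object.
    line_count = 0
    line_data_bytes = b''

    while line_count < 2:
        incoming = correlate_file_obj['Body'].read(1)
        if incoming == b'':
            break  # end of stream: the file has fewer than 2 lines
        if incoming == b'\n':
            line_count += 1
        line_data_bytes += incoming

    logger.debug("read bytes: %r", line_data_bytes)

    # Split the collected bytes into individual lines.
    line_data = line_data_bytes.split(b'\n')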
    

    This way you don't have to guess the header size when it can vary, you don't end up downloading the whole file, and you don't need third-party tools. You do need to make sure the line delimiter you look for matches the one used in your file, and that you keep reading until you actually find it.
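The byte-by-byte loop above makes one `read(1)` call per byte. A possible variation, sketched below, reads in small chunks and stops once enough delimiters have been seen. The helper name `read_first_lines` and its parameters are hypothetical, not part of boto; it only assumes an object with a `read(n)` method, which is how the streaming `Body` is used in the answer above. An in-memory `io.BytesIO` stands in for the S3 body here so the sketch runs without AWS access.

```python
import io


def read_first_lines(body, num_lines=1, chunk_size=1024, delimiter=b"\n"):
    """Read from a file-like object until num_lines delimiters are seen.

    Hypothetical helper: `body` is anything with a read(n) method.
    Returns up to num_lines lines, without the delimiter.
    """
    buffer = b""
    while buffer.count(delimiter) < num_lines:
        chunk = body.read(chunk_size)
        if not chunk:
            break  # end of stream before enough lines were found
        buffer += chunk
    if not buffer:
        return []
    # Keep only the requested lines; anything read past them is discarded.
    return buffer.split(delimiter)[:num_lines]


# Stand-in for the S3 body, for illustration only.
sample = io.BytesIO(b"id,name,size\n1,foo,10\n2,bar,20\n")
header = read_first_lines(sample, num_lines=1, chunk_size=8)
```

The trade-off is that the last chunk may read somewhat past the lines you wanted (by less than `chunk_size` bytes), in exchange for far fewer read calls. If the file uses `\r\n` line endings, pass `delimiter=b"\r\n"` or strip the trailing `\r` from each line.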
