Python BytesIO usage on AWS Lambda and second iteration of a CSV file

Submitted by 久未见 on 2019-12-11 18:04:28

Question


I have a csv file named myref.csv and a list named ref_list:

myref.csv

name,Dept,City
sree,NULL,Bengaluru
vatsasa,,Hyd
,,VJA
capgemini,,TPTY
DTP,,
Bengaluru,NULL,TVM
sre,NULL,MNGL
vatsas,,Kochi
,NULL,TVM
capgemin,NULL,MNGL
DTP9,NULL,Kochi
NULL,NULL,TVM
sree0,NULL,MNGL

ref_list:

ref_list=['Name', 'Dept', 'City', 'Address']

I have written the following code:

response = s3.get_object(Bucket=src_bucket, Key=key)
lines = response['Body'].read().splitlines(True)
reader = csv.reader(lines)
first_row = next(reader)
readers=list(reader)
# ... done something here using 'readers', which is out of scope for this post ...
csv_buffer = BytesIO()   #Problem has started here..
writer=csv.writer(csv_buffer)
content=csv_buffer.getvalue()
inser_null_at=[]
for i, header in enumerate(ref_list):
    if header not in first_row:
        inser_null_at.append(i)
    writer.writerow(ref_list)
for row in readers:
    for i in inser_null_at:
        row.insert(i, "")
    writer.writerow([item if item != "" else "NULL" for item in row])

I read the myref.csv file through csv.reader and assigned it to the variable 'reader'. I wrote a few commands that use 'reader', but those are out of scope for this post.

Since a csv.reader can be iterated only once and cannot be reused for a second pass, I converted it to a list and assigned that to another variable named 'readers'.
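To illustrate why the copy is needed: a csv.reader is a one-shot iterator, so a second loop over it yields nothing unless the rows were first captured in a list. A minimal sketch, independent of the S3 code above:

```python
import csv
import io

data = io.StringIO("a,b\n1,2\n3,4\n")
reader = csv.reader(data)

first_pass = list(reader)   # consumes the underlying iterator
second_pass = list(reader)  # the reader is now exhausted

# first_pass  -> [['a', 'b'], ['1', '2'], ['3', '4']]
# second_pass -> []
```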

I would like to check whether all the columns in ref_list are present in myref.csv.

1. If present, check whether any column in myref.csv contains blank (empty) values. If so, replace the blanks with the value NULL and write all the columns to a new csv file, csvfile3, column-wise.

2. If not present, write the missing columns along with the existing ones to csvfile3. The values of those missing columns should appear as NULL in csvfile3, and blanks under the existing columns should also be replaced with NULL.
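The two steps above can be sketched with a text buffer; io.StringIO works with csv.writer in Python 3, whereas BytesIO does not accept the str output that csv.writer produces. The sample rows and header below are a small subset of the example data, and note that the header comparison is case-sensitive, so 'Name' in ref_list would not match 'name' in the file (lowercase is used here as an assumption):

```python
import csv
import io

ref_list = ['name', 'Dept', 'City', 'Address']
first_row = ['name', 'Dept', 'City']            # header read from the file
rows = [['sree', 'NULL', 'Bengaluru'],
        ['vatsasa', '', 'Hyd']]

# Positions in ref_list whose columns are missing from the input header.
insert_null_at = [i for i, h in enumerate(ref_list) if h not in first_row]

buf = io.StringIO()                 # csv.writer needs a text stream in Python 3
writer = csv.writer(buf)
writer.writerow(ref_list)           # write the full header exactly once
for row in rows:
    for i in insert_null_at:
        row.insert(i, '')           # placeholder for each missing column
    writer.writerow([v if v != '' else 'NULL' for v in row])

# buf.getvalue() now holds the rows with blanks replaced by NULL.
```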

Since this runs on AWS Lambda and we may need to process huge CSV files, I created a BytesIO() buffer in the variable csv_buffer and am trying to write the output to it. At the end, the file will be copied from the buffer to an object in an AWS S3 bucket (that part is yet to be written).
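For the buffer-to-S3 step, one possible sketch is to keep a text buffer, call getvalue() only after all rows are written, and encode the result to bytes for the upload. The names s3, dest_bucket, and dest_key below are placeholders assumed to come from the Lambda handler's boto3 setup:

```python
import csv
import io

csv_buffer = io.StringIO()
writer = csv.writer(csv_buffer)
writer.writerow(['name', 'Dept', 'City', 'Address'])

# getvalue() must be called AFTER writing; encode to bytes for S3.
body = csv_buffer.getvalue().encode('utf-8')

# s3.put_object(Bucket=dest_bucket, Key=dest_key, Body=body)
```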

But the statements starting from csv_buffer = BytesIO() do not seem to execute. I am trying to find where the mistake is.

Expected Output:

name,Dept,City,Address
sree,NULL,Bengaluru,NULL
vatsasa,NULL,Hyd,NULL
NULL,NULL,VJA,NULL
capgemini,NULL,TPTY,NULL
DTP,NULL,NULL,NULL
Bengaluru,NULL,TVM,NULL
sre,NULL,MNGL,NULL
vatsas,NULL,Kochi,NULL
NULL,NULL,TVM,NULL
capgemin,NULL,MNGL,NULL 
DTP9,NULL,Kochi,NULL
NULL,NULL,TVM,NULL
sree0,NULL,MNGL,NULL

Actual Output: None

Note: This was written on AWS Lambda.

Source: https://stackoverflow.com/questions/55036134/python-bytes-io-usage-on-aws-lambda-and-second-iteration-of-csv-file
