Strange results from using AWS S3 SELECT to get CSV data into SQL table

Posted by 邮差的信 on 2019-12-11 01:32:27

Question


I have written an AWS state machine in C# that loads data from a CSV file in an S3 bucket into a SQL Server database table, but the data that ends up in the table is really odd.

The two main functions are as follows. The first gets the response payload; the second breaks it up into lines that can then be inserted.

private static async Task<ISelectObjectContentEventStream> GetSelectObjectContentEventStream(S3Object s3Object,
    AmazonS3Client s3Client, ObjectDefinition definition)
{
    var response = await s3Client.SelectObjectContentAsync(new SelectObjectContentRequest()
    {
        Bucket = s3Object.BucketName,
        Key = s3Object.Key,
        ExpressionType = ExpressionType.SQL,
        Expression = definition.Query,
        InputSerialization = new InputSerialization()
        {
            CSV = new CSVInput()
            {
                FileHeaderInfo = FileHeaderInfo.Use,
                FieldDelimiter = ",",
            }
        },
        OutputSerialization = new OutputSerialization()
        {
            CSV = new CSVOutput()
            {
                QuoteFields = QuoteFields.AsNeeded,
                FieldDelimiter = ",",
                RecordDelimiter = "\r\n"
            }
        }
    });

    return response.Payload;
}

This next part is just a section of code that takes the payload and puts it into a list of strings, so that each line can be inserted into the database:

foreach (var entity in listResponse.S3Objects.Where(n => n.Key.Contains(definition.FilePrefix)))
{
    definition.FileName = entity.Key;

    if (entity.Key.Contains(definition.FileExtension))
    {
        staticDataConsumer.TargetFoundCount++;

        context.Logger.LogLine($"entity {entity.Key}");
        List<string> lines = new List<string>();

        using (var s3Events = await GetSelectObjectContentEventStream(entity, s3Client, definition))
        {
            foreach (var ev in s3Events)
            {
                //context.Logger.LogLine($"Received {ev.GetType().Name}!");
                if (ev is RecordsEvent records)
                {
                    using (var reader = new StreamReader(records.Payload, Encoding.UTF8))
                    {
                        string line;

                        while ((line = reader.ReadLine()) != null)
                        {
                            if (line.Length > 0)
                            {
                                lines.Add(line);
                            }

                            context.Logger.LogLine($"{line}");
                        }
                    }
                }
            }
        }
    }
}

When I log the extract out to a CloudWatch log or similar, the data looks correct. Here is the original format of the CSV. (I have tried different content types, such as text/csv, text/plain and UTF-8, without any change. I even tried a comma-delimited text file, with the same issue.)

Retail Store,Store Retail Business Manager
105,Kate Fardell
106,Shona Marino
108,Shona Marino
111,Sharon Berger
112,Lina Hannawe
113,Jennifer Hale
114,Paul Dalton
116,Claire Eggbeer
118,Paul Dalton
119,Shona Marino
127,Aydin Tebyanian
128,Cameron Palmer

Here is what the data looks like when logging to CloudWatch or anywhere else.

'105','Kate Fardell'
INSERT INTO StaticDataConsumer_RBMReport_csv (RowInsertDateTime,ServerName,RetailStore,RetailBusinessManager) VALUES(GETDATE(),'SDC','105','Kate Fardell') 
'106','Shona Marino'
INSERT INTO StaticDataConsumer_RBMReport_csv (RowInsertDateTime,ServerName,RetailStore,RetailBusinessManager) VALUES(GETDATE(),'SDC','106','Shona Marino') 
'108','Shona Marino'
INSERT INTO StaticDataConsumer_RBMReport_csv (RowInsertDateTime,ServerName,RetailStore,RetailBusinessManager) VALUES(GETDATE(),'SDC','108','Shona Marino') 
'111','Sharon Berger'
INSERT INTO StaticDataConsumer_RBMReport_csv (RowInsertDateTime,ServerName,RetailStore,RetailBusinessManager) VALUES(GETDATE(),'SDC','111','Sharon Berger') 
'112','Lina Hannawe'
INSERT INTO StaticDataConsumer_RBMReport_csv (RowInsertDateTime,ServerName,RetailStore,RetailBusinessManager) VALUES(GETDATE(),'SDC','112','Lina Hannawe') 
'113','Jennifer Hale'

However, when I check the resulting table, every character in the data has a space between it and the next one:

RowInsertDateTime       RetailStore     RetailBusinessManager
----------------------- --------------- ----------------------------------------------------------------------------------------------------
2018-11-01 11:54:38.667  1 0 5           K a t e   F a r d e l l 
2018-11-01 11:54:38.683  1 0 6           S h o n a   M a r i n o 
2018-11-01 11:54:38.687  1 0 8           S h o n a   M a r i n o 
2018-11-01 11:54:38.690  1 1 1           S h a r o n   B e r g e r 
2018-11-01 11:54:38.690  1 1 2           L i n a   H a n n a w e 
2018-11-01 11:54:38.693  1 1 3           J e n n i f e r   H a l e 
2018-11-01 11:54:38.697  1 1 4           P a u l   D a l t o n 
2018-11-01 11:54:38.700  1 1 6           C l a i r e   E g g b e e r 
2018-11-01 11:54:38.700  1 1 8           P a u l   D a l t o n 
2018-11-01 11:54:38.703  1 1 9           S h o n a   M a r i n o 
2018-11-01 11:54:38.707  1 2 7           A y d i n   T e b y a n i a n
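
To see what the gaps in that output actually are, it can help to print the code point of every character in one of the parsed lines before the INSERT is built. The following is a minimal sketch and not part of the code above: `DescribeChars` is a hypothetical helper, and the sample string simulates a value with U+0000 after each digit, which is only one possible explanation.

```csharp
using System;
using System.Linq;

public static class LineInspector
{
    // Hypothetical helper: renders each character as U+XXXX so that
    // invisible characters (NUL, no-break space, BOM) become visible.
    public static string DescribeChars(string line) =>
        string.Join(" ", line.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        // Simulated input (an assumption): "105" with a NUL after each digit.
        string suspect = new string(new[] { '1', '\0', '0', '\0', '5', '\0' });

        // A real space would appear as U+0020; a NUL appears as U+0000.
        Console.WriteLine(DescribeChars(suspect));
    }
}
```

Calling the same helper on each `line` inside the `while` loop, and logging the result, would show whether the extra characters are already present when the payload is read.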

What could be causing this? I've never seen it before. Interestingly, if I view the data in SQL Server Management Studio with 'Results to Grid', the columns containing the spaced-out data show as blank, but with 'Results to Text' I can see the records, with these spaces in them. I'm losing my mind here. So far I have tried:

Setting different content-type metadata on the S3 object once it's in the bucket (I listed the content types I've tried earlier in this post).

Setting different content types when writing the object to S3 (e.g. using PowerShell to write the object to S3).

Saving the file as a text file with the same content as the CSV, versus saving it as an actual CSV.

No change.
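
Content-type metadata does not change the bytes stored in the object, so one further thing worth ruling out is the encoding of the file itself. The sketch below is a diagnostic under stated assumptions, not a confirmed diagnosis: `DetectBom` is a hypothetical helper that inspects the leading bytes for a byte order mark, and the sample bytes are generated in memory rather than read from the real file. It also shows that ASCII text saved as UTF-16 LE and then decoded as UTF-8 gains a U+0000 after every character.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EncodingCheck
{
    // Hypothetical helper: names the encoding implied by a leading BOM, if any.
    // UTF-32 is not handled in this sketch.
    public static string DetectBom(byte[] b)
    {
        if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
        if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16 LE";
        if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16 BE";
        return "no BOM";
    }

    public static void Main()
    {
        // Simulated file content (an assumption): one CSV row saved as UTF-16 LE with a BOM.
        var utf16 = new UnicodeEncoding(bigEndian: false, byteOrderMark: true);
        byte[] body = utf16.GetBytes("105,Kate Fardell");
        byte[] file = utf16.GetPreamble().Concat(body).ToArray();

        Console.WriteLine(DetectBom(file));

        // Decoding the UTF-16 bytes as UTF-8 leaves a NUL after each character.
        string misread = Encoding.UTF8.GetString(body);
        Console.WriteLine(misread.IndexOf('\0') >= 0);
    }
}
```

For the real file, the first few bytes of a local copy (for example from `File.ReadAllBytes`) could be passed to the same helper.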

Can anyone assist? There isn't much online about AWS S3 SELECT :(

Source: https://stackoverflow.com/questions/53093990/strange-results-from-using-aws-s3-select-to-get-csv-data-into-sql-table
