Question
I am loading data from S3 into Redshift using the COPY command with the gzip flag and the 'auto' JSON format, following the AWS documentation on loading from S3, on using the 'auto' format, and on loading compressed files.
My data is highly nested JSON, and I created the Redshift table so that the column names exactly match the top level of the JSON structure (which is what allows 'auto' to work).
For instance, my JSON data looks like this:
{"timestamp":{"value":"1480536125926814862"},
"Version":{"value":"0.5.0"},
"token":{"timestamp":"1480536122147094466",
"field1":"A23",
"field2":"0987adsflhja0",
"field3":"asd0-fuasjklhf"},
"camelCaseField":{"value":"asdf1234"},
"camelCaseField2":{"value":"asdfasdfasdf1234"},
"sequence":{"value":1}
}
And my table creation statement looks like this:
CREATE TABLE temp_table (
timestamp varchar(40),
Version varchar(40),
token varchar(500),
camelCaseField varchar(40),
camelCaseField2 varchar(40),
sequence varchar(10));
Then when I load the table from S3 using this command:
COPY temp_table FROM 's3://bucket-name/file_name.log.gz'
credentials '<aws-cred-args>'
json 'auto'
gzip;
It loads the data without error, but all of the camelCase fields come back empty; only timestamp, token, and sequence contain data. Is this really a case-sensitivity problem?
Answer 1:
The Redshift COPY command from S3 using the 'auto' option is indeed case-sensitive for JSON. I took one of the gzipped JSON files, switched every key to lowercase, re-gzipped it, nuked the table, and ran the same COPY command; it worked fine.
There does not appear to be any way to force Redshift to keep camelCase column names: I created the columns with double-quoted identifiers and Redshift still coerced them to lower case.
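The manual fix above (lowercase the keys, re-gzip, reload) can be scripted before the upload to S3. A minimal sketch, assuming the file is newline-delimited JSON records like the one in the question; the function name and paths are illustrative:

```python
import gzip
import json

def lowercase_top_level_keys(in_path, out_path):
    # Rewrite a gzipped JSON-lines file, lowercasing each record's
    # top-level keys so they match Redshift's lower-cased column names.
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if not line.strip():
                continue
            record = json.loads(line)
            lowered = {key.lower(): value for key, value in record.items()}
            dst.write(json.dumps(lowered) + "\n")
```

Only the top-level keys need renaming, since 'auto' matches columns against the top level of each record; nested keys (like those inside token) are left as-is.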
Answer 2:
Redshift folds all column names to lower case, irrespective of the case you use when defining the table. So in your case, only the fields whose JSON keys are already all lower case are getting loaded.
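If renaming the JSON keys isn't an option, COPY also accepts an explicit JSONPaths file in place of 'auto'; columns are then filled by position rather than matched by name, so the case mismatch stops mattering. A sketch, assuming a jsonpaths file uploaded to the same bucket (the file name is illustrative); the expressions must be listed in the table's column order:

```
{"jsonpaths": [
    "$['timestamp']",
    "$['Version']",
    "$['token']",
    "$['camelCaseField']",
    "$['camelCaseField2']",
    "$['sequence']"
]}
```

Then replace json 'auto' in the COPY command with json 's3://bucket-name/jsonpaths.json'.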
Source: https://stackoverflow.com/questions/41367514/does-case-matter-when-auto-loading-data-from-s3-into-a-redshift-table