Loading JSON data to AWS Redshift results in NULL values

爱一瞬间的悲伤 2020-12-17 17:14

I am trying to perform a load/copy operation to import data from JSON files in an S3 bucket directly to Redshift. The COPY operation succeeds, and after the COPY, the table

3 Answers
  • 2020-12-17 17:22

    I have found the cause, which would not have been evident from the description in my original post.

    When you create a table in Redshift, the column names are converted to lowercase. When you perform a COPY from JSON, the JSON keys are matched against those column names case-sensitively.

    The input data I have been trying to load uses camelCase keys, so during the COPY they do not match the defined schema (which now has all-lowercase column names).

    The operation does not raise an error, though. It just leaves NULLs in every column that did not match (in this case, all of them).
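
To make the failure mode concrete, here is a rough Python sketch of case-sensitive key matching. It is only a toy model, not Redshift's implementation; `simulate_auto_copy`, the column names, and the sample keys are all made up for illustration.

```python
def simulate_auto_copy(record, columns):
    """Toy model: look up each column name as an exact (case-sensitive) key.

    A column with no matching key gets None, standing in for NULL.
    """
    return {col: record.get(col) for col in columns}


# Hypothetical table columns, already lowercased as Redshift stores them.
columns = ["username", "signupcount"]

camel = {"userName": "major", "signupCount": 19}   # keys as in the input file
lower = {"username": "major", "signupcount": 19}   # same data, lowercase keys

row_camel = simulate_auto_copy(camel, columns)
row_lower = simulate_auto_copy(lower, columns)

print(row_camel)  # {'username': None, 'signupcount': None}
print(row_lower)  # {'username': 'major', 'signupcount': 19}
```

No step in this model fails, which mirrors what I saw: the load completes and the mismatch only shows up as NULLs.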

    Hope this helps somebody to avoid the same confusion!

  • 2020-12-17 17:34

    When the JSON keys don't correspond directly to column names, you can use a JSONPaths file to map the JSON elements to columns, as mentioned by TimZ and described here.
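
A JSONPaths file is a small JSON document with a single `jsonpaths` array, one path per target column, listed in the same order as the columns. A minimal sketch of generating one in Python, assuming flat top-level keys that contain no single quotes; `build_jsonpaths` and the key names are hypothetical:

```python
import json


def build_jsonpaths(keys_in_column_order):
    """Build JSONPaths content: one bracket-notation path per column, in column order."""
    return {"jsonpaths": [f"$['{key}']" for key in keys_in_column_order]}


# Hypothetical JSON keys, listed in the order of the table's columns.
doc = build_jsonpaths(["userName", "signupCount"])

# Text to save as the JSONPaths file and upload next to the data.
jsonpaths_text = json.dumps(doc, indent=4)
print(jsonpaths_text)
```

Because the mapping is positional, the paths can keep whatever casing the source data uses.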

  • 2020-12-17 17:38

    COPY maps the data elements in the JSON source data to the columns in the target table by matching object keys, or names, in the source name/value pairs to the names of columns in the target table. The matching is case-sensitive. Column names in Amazon Redshift tables are always lowercase, so when you use the ‘auto’ option, matching JSON field names must also be lowercase. If the JSON field name keys aren't all lowercase, you can use a JSONPaths file to explicitly map column names to JSON field name keys.
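
If the keys differ from the column names only by case, one way to meet the lowercase requirement is to rewrite the keys before uploading. A minimal sketch, assuming one JSON object per input line; `lowercase_top_level_keys` is a hypothetical helper, and it silently keeps the last value if two keys collide after lowercasing:

```python
import json


def lowercase_top_level_keys(record):
    """Lowercase only the top-level keys; nested values are left untouched."""
    return {key.lower(): value for key, value in record.items()}


line = '{"Name": "Major", "Age": 19}'
normalized = json.dumps(lowercase_top_level_keys(json.loads(line)))
print(normalized)  # {"name": "Major", "age": 19}
```

This does not help when a key is spelled differently from its column, for example a key `Add` feeding a column `address`.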

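    In the example below the JSON keys differ from the column names by more than case (Add vs. address, Category_Name vs. catname), so the solution is to use a JSONPaths file.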

    Example JSON:

    {
        "Name": "Major",
        "Age": 19,
        "Add": {
            "street": {
                "st": "5 maint st",
                "ci": "Dub"
            },
            "city": "Dublin"
        },
        "Category_Name": ["MLB", "GBM"]
    }
    

    Example table:

    create table customer (
        name varchar,
        age int,
        address varchar,
        catname varchar
    );
    

    Example JSONPaths file:

    {
        "jsonpaths": [
            "$['Name']",
            "$['Age']",
            "$['Add']",
            "$['Category_Name']"
        ]
    }
    

    Example COPY command:

    copy customer -- Redshift code
    from 's3://mybucket/customer.json'
    iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
    json 's3://mybucket/jpath.json'; -- JSONPaths file that maps JSON keys to columns
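
To see which value each path selects for the example record, here is a small Python sketch that resolves bracket-notation paths. It is an illustration only: `resolve` and `as_column_text` are hypothetical helpers, and the sketch assumes an object or array selected by a path ends up in the column as JSON text (re-serialized here, so the spacing may differ from the source file).

```python
import json
import re

record = json.loads("""
{"Name": "Major", "Age": 19,
 "Add": {"street": {"st": "5 maint st", "ci": "Dub"}, "city": "Dublin"},
 "Category_Name": ["MLB", "GBM"]}
""")
jsonpaths = ["$['Name']", "$['Age']", "$['Add']", "$['Category_Name']"]
columns = ["name", "age", "address", "catname"]


def resolve(path, doc):
    """Follow a path such as $['a']['b'] or $['a'][0]; return None if any step is missing."""
    current = doc
    for match in re.finditer(r"\['([^']*)'\]|\[(\d+)\]", path):
        # Group 1 is a quoted object key, group 2 a numeric array index.
        step = match.group(1) if match.group(1) is not None else int(match.group(2))
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


def as_column_text(value):
    """Assumption: objects and arrays are stored as JSON text; scalars pass through."""
    return json.dumps(value) if isinstance(value, (dict, list)) else value


# Paths are paired with columns by position, not by name.
row = {col: as_column_text(resolve(path, record)) for col, path in zip(columns, jsonpaths)}
for col, value in row.items():
    print(col, "=", value)
```

To load just the city rather than the whole `Add` object, a nested path such as `$['Add']['city']` can be used; `resolve("$['Add']['city']", record)` returns `"Dublin"` in this sketch.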
    

    Examples are taken from here
