Lateral Flatten Snowpipe data with mixture of arrays and dict

泪湿孤枕 submitted on 2020-04-18 05:27:42

Question


I have JSON files with two different structures being loaded through Snowpipe. The only difference is that one of them (structure 1) has many nested arrays where the other has a nested dict. I am trying to figure out how to transform structure 1 into one final table. I've successfully transformed structure 2 into a table and included that code below.

I know I need to use LATERAL FLATTEN, but I haven't been able to make it work.

**Structure 1: Nested Arrays (Need help on)**
This JSON lives in a table, in column **JSONTEXT**:

```json
[
  {
    "ID": "xxx-xxxx-xxxx xxx-xxx",
    "caseTypeID": "xx-xxxx-xxxx-xxxxx",
    "content": {
      "AccountID": "xx-xxxxx-xxxx-xxxx xxxx-xxxxx",
      "AccountName": "XXXX",
      "Address": {
        "pxObjClass": "Data-Address-Postal"
      },
      "Addresses": [],
      "AllKickoffsComplete": "true",
      "BillingContactList": [],
      "ClientCurrency": "USD",
      "ClientID": "XXXXXX",
      "ClientNSID": "XXXXXXXX-00",
      "ClientName": "XXXXX XXXX Inc.",
      "CompanyPhoneNumber": "XXX-XXX-XXXX",
      "CrmSearchOrg": "XXXX",
      "EEList": [
        {
          "AccountID": "xxx-xxxxx-xxxx-xxxxx xxxx-xxxxx",
          "AccountName": "XXXX",
          "AllowanceList": [
            {
              "AllowanceAmount": "327",
              "AllowanceName": "Car Allowance",
              "pxObjClass": "xxxxx-xxxxx-xxxxx"
            }
          ]
        }
      ]
    }
  }
]
```

**Structure 2: Nested Dict**
This JSON lives in a table, in column **JSONTEXT**:

```json
[
  {
    "OppID": "xxxx-xxxxx",
    "pxObjClass": "xx-xxxxx-xxxx-xxxxxx",
    "pxPages": {
      "EEList": {
        "Country": "xxx",
        "CountryName": "xxx",
        "Currency": "xxx",
        "EstimatedICPCost": "xxxxxxxxxxx",
        "ICPCurrency": "xxxxx",
        "ICPID": "xxxxxxxxx.",
        "ICPNSID": "xxxx-xx",
        "ICPName": "xxx xx xx.",
        "LocalMonthlySalary": "xxxxxx",
        "MinFee": "xxxx",
        "MonthlyGrossCost": "xxxxx",
        "NewOrRepeatCustomer": "xxxxx",
        "OppCloseDate": "xxx-xxx-xx",
        "OppID": "xxx-xxxx",
        "OpportunityName": "xxx - xxx xxx - xxx - xxxx",
        "ReferralSource": "xxxxxx",
        "pxObjClass": "Index-xx-xxxx-xxxx-xxxxxx",
        "pxSubscript": "EEList"
      }
    },
    "pyID": "xxxxxx",
    "pzInsKey": "xxxx-xxxx-xxxx xxxxx-xxx"
  }
]
```

Here is my code for the second structure, which works:

```sql
create or replace table xxxx
    as select 
    value:ID::varchar as ID,
    value:caseTypeID::varchar as caseTypeID,
    value:content:AccountID::varchar as AccountID,
    value:content:AccountName::varchar as AccountName,
    value:content:AllKickoffsComplete::boolean as AllKickoffsComplete,
    value:content:ClientCurrency::varchar as ClientCurrency,
    value:content:ClientID::varchar as ClientID,
    value:content:ClientNSID::varchar as ClientNSID,
    value:content:ClientName::varchar as ClientName,
    value:content:CompanyAddressCountryName::varchar as CompanyAddressCountryName,
    value:content:CompanyPhoneNumber::varchar as CompanyPhoneNumber,
    value:content:CreateNew::boolean as CreateNew,
    value:content:CrmSearchOrg::varchar as CrmSearchOrg,
    value:content:EEList:AccountID::varchar as EE_AccountID,
    value:content:EEList:AccountName::varchar as EE_AccountName
from new_raw_json,
    -- one output row per element of the top-level JSONTEXT array
    lateral flatten (input => jsontext);
```

Here is code I've tried; it only works when I hard-code an element index such as jsontext[0] (i.e. jsontext[N]):

```sql
select
    value:ID::varchar as ID,
    value:EEListID::varchar as EEListID,
    value:caseTypeID::varchar as caseTypeID
from new_raw_json,
    lateral flatten (input => jsontext[0]:content:EEList);
```

Appreciate any help!


Answer 1:


You can chain multiple LATERAL FLATTEN calls to keep exploding into nested structures (arrays within arrays): each FLATTEN takes the VALUE produced by the previous one as its input.

An explicit version could look like this (only a few columns are projected here, to illustrate each level):

```sql
SELECT
  outer_object.value:caseTypeID AS caseTypeID,
  outer_object.value:content.AccountID AS parentAccountID,
  eelist_object.value:AccountID AS eeListAccountID,
  allowance_object.value:AllowanceName
FROM
  new_raw_json,
  -- level 1: one row per object in the top-level JSONTEXT array
  LATERAL FLATTEN (input => jsontext) outer_object,
  -- level 2: one row per entry in that object's content.EEList array
  LATERAL FLATTEN (input => outer_object.value:content.EEList) eelist_object,
  -- level 3: one row per entry in that EEList entry's AllowanceList array
  LATERAL FLATTEN (input => eelist_object.value:AllowanceList) allowance_object;
```
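If you also want typed columns, and want to keep cases whose EEList or AllowanceList happens to be empty (the sample already contains empty arrays such as `Addresses`), a variation of the query above could add casts plus FLATTEN's `outer => true` option on the inner levels. This is only a sketch: the output aliases are illustrative, and the `::number` cast assumes AllowanceAmount always holds a numeric string like `"327"`.

```sql
-- Sketch: same three-level chain with typed columns.
-- outer => true on the inner levels keeps a parent row (with NULLs)
-- when its EEList or AllowanceList is empty or missing.
-- Assumption: AllowanceAmount is always a numeric string.
SELECT
  outer_object.value:ID::varchar                  AS id,
  outer_object.value:caseTypeID::varchar          AS case_type_id,
  outer_object.value:content.AccountID::varchar   AS parent_account_id,
  eelist_object.index                             AS eelist_index,
  eelist_object.value:AccountID::varchar          AS ee_account_id,
  allowance_object.value:AllowanceName::varchar   AS allowance_name,
  allowance_object.value:AllowanceAmount::number  AS allowance_amount
FROM
  new_raw_json,
  LATERAL FLATTEN (input => jsontext) outer_object,
  LATERAL FLATTEN (input => outer_object.value:content.EEList, outer => true) eelist_object,
  LATERAL FLATTEN (input => eelist_object.value:AllowanceList, outer => true) allowance_object;
```

With `outer => true`, FLATTEN emits a single row with NULL INDEX and VALUE for an empty array instead of dropping the parent row, so a case without allowances still appears in the result.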

Note that this only explodes one identified multi-value path (List -> EEList -> AllowanceList). The question doesn't make clear whether all the paths have to be exploded (such as List -> EEList -> Addresses AND AllowanceList) or whether it is acceptable to store some of them as VARIANT (or another semi-structured) type in the final result.
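If keeping some arrays as semi-structured columns is acceptable, one option is to explode only EEList and carry the other arrays through unchanged. A minimal sketch, with illustrative aliases:

```sql
-- Sketch: explode only EEList (one row per employee entry) and keep the
-- other nested arrays as semi-structured columns instead of exploding them.
SELECT
  outer_object.value:ID::varchar            AS id,
  outer_object.value:content.Addresses      AS addresses,       -- left as VARIANT
  eelist_object.value:AccountID::varchar    AS ee_account_id,
  eelist_object.value:AllowanceList::array  AS allowance_list   -- cast to ARRAY
FROM
  new_raw_json,
  LATERAL FLATTEN (input => jsontext) outer_object,
  LATERAL FLATTEN (input => outer_object.value:content.EEList, outer => true) eelist_object;
```

The stored arrays can still be inspected later, for example with `ARRAY_SIZE(allowance_list)` or `allowance_list[0]:AllowanceName::varchar`.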

For example, if the AllowanceList values need to be duplicated for every address listed in Addresses under EEList, this can be achieved with a JOIN between two exploded query results (one that chains List -> Addresses and another that chains List -> EEList -> AllowanceList).
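A sketch of that JOIN, under two assumptions: the top-level `ID` uniquely identifies each case (it serves as the join key), and `Addresses` sits under `content`, as in the structure 1 sample. The CTE and column names are hypothetical.

```sql
-- Sketch: explode each branch separately, then join on the parent case ID,
-- giving one row per (allowance, address) pair within a case.
-- Assumption: ID is unique per top-level case object.
WITH addresses AS (
  SELECT
    outer_object.value:ID::varchar AS id,
    address_object.value           AS address
  FROM
    new_raw_json,
    LATERAL FLATTEN (input => jsontext) outer_object,
    -- outer => true: cases with an empty Addresses array still yield one row
    LATERAL FLATTEN (input => outer_object.value:content.Addresses, outer => true) address_object
),
allowances AS (
  SELECT
    outer_object.value:ID::varchar                  AS id,
    eelist_object.value:AccountID::varchar          AS ee_account_id,
    allowance_object.value:AllowanceName::varchar   AS allowance_name,
    allowance_object.value:AllowanceAmount::varchar AS allowance_amount
  FROM
    new_raw_json,
    LATERAL FLATTEN (input => jsontext) outer_object,
    LATERAL FLATTEN (input => outer_object.value:content.EEList) eelist_object,
    LATERAL FLATTEN (input => eelist_object.value:AllowanceList) allowance_object
)
SELECT
  al.id,
  al.ee_account_id,
  al.allowance_name,
  al.allowance_amount,
  ad.address
FROM allowances al
JOIN addresses ad
  ON ad.id = al.id;
```

Because the Addresses branch uses `outer => true`, a case whose Addresses array is empty (as in the sample) still joins, with a NULL address, rather than disappearing from the result.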



Source: https://stackoverflow.com/questions/60213875/lateral-flatten-snowpipe-data-with-mixture-of-arrays-and-dict
