Azure Data Flow creating / managing keys for identity relationships

Submitted by 微笑、不失礼 on 2021-01-29 02:04:16

Question


I'm curious what the best way is to generate keys for identity relationships through ADF.

Right now, I'm consuming JSON data that does not have any identity information. This data is then transformed into multiple database sink tables with relationships (1..n, etc.). Due to FK constraints on some of the destination sink tables, these relationships need to be "built up" one at a time.

This approach seems a bit kludgy, so I'm looking to see if there are other options that I'm not aware of.

Note that I need to include surrogate key generation for each insert. If I do not, then given the output database schema I get a 'cannot insert PK null' error.

Also note that I turn IDENTITY_INSERT ON/OFF for each sink.


Answer 1:


I would tend to take more of an ELT approach and use the native JSON abilities in Azure SQL DB, i.e. OPENJSON. You could land the JSON in a table in Azure SQL DB using ADF (e.g. with a Stored Procedure activity) and then call another stored procedure to process the JSON, something like this:

-- Setup
DROP TABLE IF EXISTS #tmp;
DROP TABLE IF EXISTS import.City;
DROP TABLE IF EXISTS import.Region;
DROP TABLE IF EXISTS import.Country;
GO

DROP SCHEMA IF EXISTS import 
GO

CREATE SCHEMA import
    CREATE TABLE Country ( CountryKey INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE )
    CREATE TABLE Region ( RegionKey INT IDENTITY PRIMARY KEY, CountryKey INT NOT NULL FOREIGN KEY REFERENCES import.Country, RegionName VARCHAR(50) NOT NULL UNIQUE )
    CREATE TABLE City ( CityKey INT IDENTITY(100,1) PRIMARY KEY, RegionKey INT NOT NULL FOREIGN KEY REFERENCES import.Region, CityName VARCHAR(50) NOT NULL UNIQUE )
GO


DECLARE @json NVARCHAR(MAX) = '{
   "Cities": [
      {
         "Country": "England",
         "Region": "Greater London",
         "City": "London"
      },
      {
         "Country": "England",
         "Region": "West Midlands",
         "City": "Birmingham"
      },
      {
         "Country": "England",
         "Region": "Greater Manchester",
         "City": "Manchester"
      },
      {
         "Country": "Scotland",
         "Region": "Lothian",
         "City": "Edinburgh"
      }
   ]
}'


SELECT *
INTO #tmp
FROM OPENJSON( @json, '$.Cities' )
WITH
(
    Country     VARCHAR(50),
    Region      VARCHAR(50),
    City        VARCHAR(50)
)
GO


-- Add the Country first (has no foreign keys)
INSERT INTO import.Country ( CountryName )
SELECT DISTINCT Country
FROM #tmp s
WHERE NOT EXISTS ( SELECT * FROM import.Country t WHERE s.Country = t.CountryName )


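-- Add the Region next including Country FK
-- DISTINCT and NOT EXISTS avoid UNIQUE violations when several cities share a region
INSERT INTO import.Region ( CountryKey, RegionName )
SELECT DISTINCT t.CountryKey, s.Region
FROM #tmp s
    INNER JOIN import.Country t ON s.Country = t.CountryName
WHERE NOT EXISTS ( SELECT * FROM import.Region r WHERE s.Region = r.RegionName )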


-- Now add the City with FKs (skipping cities that already exist)
INSERT INTO import.City ( RegionKey, CityName )
SELECT DISTINCT r.RegionKey, s.City
FROM #tmp s
    INNER JOIN import.Country c ON s.Country = c.CountryName
    INNER JOIN import.Region r ON s.Region = r.RegionName
        AND c.CountryKey = r.CountryKey
WHERE NOT EXISTS ( SELECT * FROM import.City t WHERE s.City = t.CityName )


SELECT * FROM import.City;
SELECT * FROM import.Region;
SELECT * FROM import.Country;

This is a simple test script designed to show the idea. It should run end-to-end, but it is not production code.
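
As a rough sketch of how the same steps might be packaged for ADF to call, the script below assumes a hypothetical landing table dbo.JsonLanding and a hypothetical procedure dbo.usp_LoadCities (neither name comes from the answer above). It also assumes the import tables from the test script exist and that the database runs at compatibility level 130 or higher, which OPENJSON requires. It is kept outside the import schema so the DROP SCHEMA step in the test script still works.

```sql
-- Hypothetical landing table; ADF would insert one raw JSON document per row.
CREATE TABLE dbo.JsonLanding
(
    LandingId   INT IDENTITY PRIMARY KEY,
    Payload     NVARCHAR(MAX) NOT NULL,
    ProcessedAt DATETIME2 NULL              -- NULL means not yet processed
);
GO

-- Hypothetical procedure for ADF to call after the landing step.
CREATE OR ALTER PROCEDURE dbo.usp_LoadCities
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- a run-time error aborts and rolls back the transaction

    BEGIN TRANSACTION;

    -- Fix the set of documents handled by this run
    SELECT LandingId, Payload
    INTO #batch
    FROM dbo.JsonLanding
    WHERE ProcessedAt IS NULL;

    -- Shred the JSON documents into rows
    SELECT j.Country, j.Region, j.City
    INTO #staged
    FROM #batch b
        CROSS APPLY OPENJSON( b.Payload, '$.Cities' )
        WITH ( Country VARCHAR(50), Region VARCHAR(50), City VARCHAR(50) ) j;

    -- Parents first, then children; IDENTITY columns generate the keys
    INSERT INTO import.Country ( CountryName )
    SELECT DISTINCT s.Country
    FROM #staged s
    WHERE NOT EXISTS ( SELECT * FROM import.Country t WHERE s.Country = t.CountryName );

    INSERT INTO import.Region ( CountryKey, RegionName )
    SELECT DISTINCT c.CountryKey, s.Region
    FROM #staged s
        INNER JOIN import.Country c ON s.Country = c.CountryName
    WHERE NOT EXISTS ( SELECT * FROM import.Region t WHERE s.Region = t.RegionName );

    INSERT INTO import.City ( RegionKey, CityName )
    SELECT DISTINCT r.RegionKey, s.City
    FROM #staged s
        INNER JOIN import.Country c ON s.Country = c.CountryName
        INNER JOIN import.Region r ON s.Region = r.RegionName
            AND c.CountryKey = r.CountryKey
    WHERE NOT EXISTS ( SELECT * FROM import.City t WHERE s.City = t.CityName );

    -- Mark the handled documents so the next run skips them
    UPDATE l
    SET ProcessedAt = SYSUTCDATETIME()
    FROM dbo.JsonLanding l
        INNER JOIN #batch b ON l.LandingId = b.LandingId;

    COMMIT TRANSACTION;
END
GO
```

An ADF Stored Procedure activity (or a plain EXEC dbo.usp_LoadCities;) would then run the load. Because the IDENTITY columns generate the keys and the foreign keys are resolved by joining on the natural names, this sketch needs neither surrogate key generation in the data flow nor IDENTITY_INSERT. It does not guard against two runs executing at the same time.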



Source: https://stackoverflow.com/questions/62457704/azure-data-flow-creating-managing-keys-for-identity-relationships
