Azure Data Factory Copy Identity Column With Gaps

Question

I created a pipeline and two linked services to move data from an on-prem instance of SQL Server to an Azure SQL instance. The issue I'm running into is that a table in our on-prem database, "Table-1", has an Identity (1,1) column with a gap in its values (e.g. the values are 1, 2, 3, 4, 6). When the pipeline runs, it inserts the rows with the IDs 1, 2, 3, 4, 5. That is a big problem because ID 6 is referenced by a foreign key on another table, "Table-2", and it no longer exists, so the movement of data to Table-2 fails with SQL Error 547 (Insert statement conflicted with the foreign key constraint...).
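
A minimal repro of the symptom outside Data Factory might look like the following. The tables dbo.Parent and dbo.Child are hypothetical stand-ins for Table-1 and Table-2, not the real schema:

```sql
-- Hypothetical tables standing in for Table-1 and Table-2.
CREATE TABLE dbo.Parent (
    Id   INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(50) NOT NULL
);
CREATE TABLE dbo.Child (
    ParentId INT NOT NULL REFERENCES dbo.Parent (Id)
);

-- The source holds IDs 1, 2, 3, 4, 6. Without IDENTITY_INSERT the
-- destination generates its own values, so the five rows get 1 through 5.
INSERT INTO dbo.Parent (Name)
VALUES (N'a'), (N'b'), (N'c'), (N'd'), (N'f');

-- A row that referenced source ID 6 now fails with error 547,
-- because no parent row with Id = 6 exists in the destination.
INSERT INTO dbo.Child (ParentId) VALUES (6);
```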

The right thing to do in my mind would be to make that column not an identity, but that is not an option for me right now as the app code which creates records expects that column to be auto-generated.

Is there a way around this other than not using Data Factory? I would like it to toggle IDENTITY_INSERT on and off automatically for tables with an Identity column. I know this would force those tables to be processed one at a time, but the option would be nice and it wouldn't destroy my relationships.

Edit: Per wBob's suggestion, I also added a feature request (if you care to vote on it) here: https://feedback.azure.com/forums/270578-data-factory/suggestions/17996950-add-support-for-maintaining-identity-column-values

Answer 1:

Azure Data Factory does not natively support switching the identity property of tables on or off, but two workarounds spring to mind.

1. Use Data Factory to load the data into a staging table (where the identity property is not set), then use a stored procedure activity to call a stored procedure where you have much tighter control, including the ability to switch IDENTITY_INSERT on and off.
2. If you are using Azure SQL Database (or SQL Server on a VM), you could use table-valued parameters to pass your data into the stored procedure, skipping the staging table. This technique does not work with Azure SQL Data Warehouse, and I probably would not recommend it for high volumes. This example shows how:

https://github.com/Microsoft/azure-docs/blob/master/includes/data-factory-sql-invoke-stored-procedure.md

I have not been able to test these but believe they would work. Let me know if you have any issues.
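
As a rough, untested sketch of the staging-table workaround: the procedure below copies rows from a staging table into the real table while keeping the original identity values. The names Staging.Table1, dbo.Table1, dbo.usp_LoadTable1FromStaging and the columns Id and Name are all assumptions made for illustration.

```sql
-- Assumed objects: Staging.Table1 (Id is a plain INT) and
-- dbo.Table1 (Id is IDENTITY(1,1)). All names are hypothetical.
CREATE PROCEDURE dbo.usp_LoadTable1FromStaging
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;  -- roll back the whole load on any error

    BEGIN TRANSACTION;

    -- Only one table per session can have IDENTITY_INSERT ON at a time.
    SET IDENTITY_INSERT dbo.Table1 ON;

    -- An explicit column list is required when supplying identity values.
    INSERT INTO dbo.Table1 (Id, Name)
    SELECT s.Id, s.Name
    FROM Staging.Table1 AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Table1 AS t WHERE t.Id = s.Id);

    SET IDENTITY_INSERT dbo.Table1 OFF;

    COMMIT TRANSACTION;
END;
```

The NOT EXISTS filter is only there so the procedure can be re-run without primary key violations. When an explicitly inserted value is larger than the table's current identity value, SQL Server adopts it as the new current value, so the app's later auto-generated IDs continue after the highest copied ID.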

Answer 2:

I accepted wBob's answer but wanted to put a little more detail into what I did.

I had probably 100 tables to move over, with all sorts of dependencies and identities. Here are the steps I carried out to get the data into Azure:

1. Created a pipeline to move over all tables with no identity column and no dependencies. I found the tables with no identity column by querying sys.tables:
```
-- Tables that have no identity column
select t.*
from sys.tables t
where not exists (
    select *
    from sys.columns c
    where c.object_id = t.object_id
    and c.is_identity = 1
)
```
and compared the results against the output of sp_msdependencies where oType = 8. I then took all of the tables in this result set where oSequence = 1 (no dependencies), put them in the pipeline and ran it.
2. Created a Staging schema and re-created in it all of the tables with an identity column (found by removing the 'not' from the query in step 1; there were over 60 of them), leaving out the identity specification.
3. Created another Data Factory pipeline to move the data into these Staging tables.
4. Ran a bunch of 'insert into...' statements to move the data from the staging tables into their identity-laden counterparts, setting identity_insert on and off each time. NOTE: Here, I also had to be mindful of the sp_msdependencies result so as not to get foreign key errors.
5. Created a Data Factory pipeline to move the remaining tables over.
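
With 60-odd tables, the 'insert into...' statements could be generated rather than typed by hand. The query below is only a sketch of that idea, not what I actually ran. It assumes the staging copies live in a schema named Staging with the same table and column names as the originals, and it uses STRING_AGG, which needs Azure SQL Database or SQL Server 2017 and later. It does not order the scripts by dependency, so the sp_msdependencies ordering still has to be applied when running them.

```sql
-- Emits one load script per table that has an identity column.
-- Review the generated text before executing any of it.
SELECT
    t.name AS table_name,
    N'SET IDENTITY_INSERT ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N' ON; '
    + N'INSERT INTO ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
    + N' (' + cols.list + N') '
    + N'SELECT ' + cols.list + N' FROM [Staging].' + QUOTENAME(t.name) + N'; '
    + N'SET IDENTITY_INSERT ' + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N' OFF;'
        AS load_script
FROM sys.tables AS t
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
CROSS APPLY (
    -- Comma-separated column list in table order, skipping columns
    -- that cannot be written to (computed and rowversion columns).
    SELECT STRING_AGG(CAST(QUOTENAME(c.name) AS NVARCHAR(MAX)), N', ')
               WITHIN GROUP (ORDER BY c.column_id) AS list
    FROM sys.columns AS c
    WHERE c.object_id = t.object_id
      AND c.is_computed = 0
      AND TYPE_NAME(c.system_type_id) <> N'timestamp'
) AS cols
WHERE s.name <> N'Staging'            -- skip the staging copies themselves
  AND EXISTS (
      SELECT 1
      FROM sys.columns AS ic
      WHERE ic.object_id = t.object_id
        AND ic.is_identity = 1
  );
```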

Whew...

Source: https://stackoverflow.com/questions/42077953/azure-data-factory-copy-identity-column-with-gaps

Tags

sql-server

azure-sql-database

azure-data-factory