Question
We have multiple SSIS packages that run in parallel to perform incremental loads to multiple cubes. These packages run every night. Occasionally we run into network issues, and if SSIS loses the connection for even a moment, it fails all the packages.
It does not automatically try to reconnect, even if the connection becomes available again. It continues to execute the packages using the same connection info that it acquired at the start of execution, so all remaining packages fail. To get a new connection, we have to let the run finish, and then someone has to manually reprocess the packages.
So my question is: can I implement a system in which, if SSIS loses the network connection, it automatically checks whether the connection has become available again during processing and tries to reconnect, rather than failing all packages?
Answer 1:
An approach for package dependency and restartability, by @billinkc
I favor a very lean approach for execution frameworks - it's a two table system but it's enough. At the beginning of each package, an Execute SQL Task fires that starts a log record. When the SSIS package completes successfully, the final step is to close the audit row for the process - another Execute SQL Task.
By the absence of a close time and the fact that SSIS isn't running right now, I know the package ended abnormally. Maybe it threw a foreign key violation, maybe the network dropped. As far as this table is concerned, I don't really care. Its only function is Done/Not Done. If it's Not Done, then when we restart it's a candidate for processing.
Candidate for processing, you say? Yes, the worker packages are dumb - they only do what I outlined above. It is the responsibility of the orchestrator/parent/master package to worry about what packages need to run and in what order. If a child package starts running, it's because it was told to. It doesn't "know" whether it's already done its work or not.
The package execution history is all kept in one single table. It is not a replacement for the sysdtslog90/sysssislog/catalog.operation_messages table(s) that SSIS will log to. Those tables, depending on your version, will hold the actual details about why the package started but didn't complete.
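As a rough sketch of where those details live, assuming SSIS 2012+ with the project deployment model (so the SSISDB catalog exists) and assuming the ExecutionID stored in the history table came from System::ServerExecutionID, something like the following pulls the error messages for one run. The @ExecutionID value is a made-up placeholder.

```sql
-- Hypothetical value: substitute the ExecutionID recorded in dbo.PackageHistory
DECLARE @ExecutionID bigint = 12345;

SELECT
    OM.message_time
,   OM.message_source_type
,   OM.message
FROM
    SSISDB.catalog.operation_messages AS OM
WHERE
    -- operation_id is the server execution id of the run
    OM.operation_id = @ExecutionID
    -- message_type 120 = Error
    AND OM.message_type = 120
ORDER BY
    OM.message_time ASC;
```

On 2005/2008, the equivalent lookup would go against sysdtslog90 or sysssislog instead, which have a different shape.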
For the master to know what hasn't run, it needs to know three things:
- what is the refresh interval
- what processes should run
- what has already run successfully.
Refresh interval - your business requirements and/or data availability help dictate this, but I assume it needs to run once a day. The devil's in the details though: how do you define that day? Since midnight, in the past 24 hours, or some other rule? Whatever it is, you need to know how long before a partial retry should turn into an abort followed by a full run.
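To make the difference between those two definitions concrete, here is a small sketch of how each cutoff could be computed. The variable names are illustrative only; whichever rule you pick is what the StartTime comparison in your work-list query would use.

```sql
-- "Since midnight": truncate the current time to the start of the day
DECLARE @SinceMidnight datetime = CAST(CAST(CURRENT_TIMESTAMP AS date) AS datetime);

-- "Past 24 hours": a rolling window ending right now
DECLARE @Past24Hours datetime = DATEADD(HOUR, -24, CURRENT_TIMESTAMP);

SELECT
    @SinceMidnight AS SinceMidnight
,   @Past24Hours AS Past24Hours;
```

With a rolling window, a restart at 1 AM still sees yesterday evening's successful runs as done; with a midnight cutoff it does not.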
What needs to run - At a minimum, this is a list of all the child packages the parent will call. What I prefer to do sounds similar to your approach: I run lots of packages in parallel, but with serial execution within each parallel block. I get good parallelization that way without swamping the boxes.
What has run successfully - you already know the answer! We look at our package execution history for any packages that started to run but don't have an end date on them. Any of those records are what need to be restarted (less the fiddling required for refresh interval).
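A minimal sketch of that lookup, assuming the dbo.PackageHistory table defined in the Implementation section and a "since midnight" refresh interval:

```sql
-- Runs that started today but never recorded a StopTime
SELECT
    PH.PackageName
,   PH.ExecutionID
,   PH.StartTime
FROM
    dbo.PackageHistory AS PH
WHERE
    PH.StartTime >= CAST(CURRENT_TIMESTAMP AS date)
    AND PH.StopTime IS NULL
ORDER BY
    PH.StartTime ASC;
```

This only lists packages that started and died. Packages that never got the chance to start have no history row at all, which is why the stored procedure later on works from the dependency list and subtracts what has completed.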
That's the high level concepts, hopefully wasn't tl;dr
Implementation
Pick a database, any database to hold this information. Some places I create a dedicated SSISAudit database, other times I dovetail it in with other metadata repositories.
This will create 3 objects in your database (not included are insert/update/delete methods for the tables): the two tables and a stored procedure that determines what packages need to be run. Feed the procedure the master package name and the container you need to get work for, and out comes a recordset of packages requiring work (that is, ones that have not run successfully since midnight).
-- This is my package execution history table
CREATE TABLE
    dbo.PackageHistory
(
    PackageName sysname NOT NULL
,   PackageID uniqueidentifier NOT NULL
,   ParentPackageID uniqueidentifier NULL
,   ExecutionID bigint NULL
,   StartTime datetime NOT NULL
,   StopTime datetime NULL
,   Duration_s AS (DATEDIFF(SECOND, StartTime, StopTime))
);
CREATE TABLE dbo.PackageDependency
(
    MasterPackageName sysname NOT NULL
,   ContainerName sysname NOT NULL
,   ChildPackageName sysname NOT NULL
,   ExecutionOrderSequence int NOT NULL
,   CONSTRAINT PK_PackageDependency PRIMARY KEY
    CLUSTERED
    (
        MasterPackageName ASC
    ,   ContainerName ASC
    ,   ChildPackageName ASC
    )
);
GO
-----------------------------------------------------------------------------
-- Procedure: dbo.PackageProcesssingControlListGet
-- Author: Bill Fellows
-- Date: 2016-01-18
--
-- This procedure will return a list of outstanding packages that need to
-- be processed for a given container. Assumes that processing is done daily
--
-- Recordsets:
-- 0 to N rows will be returned to the caller
--
-- Side-effects:
-- None
--
-- See also:
--
-- Modified:
--
-----------------------------------------------------------------------------
CREATE PROCEDURE dbo.PackageProcesssingControlListGet
(
    @MasterPackageName sysname
,   @ContainerName sysname
)
AS
BEGIN
    SET NOCOUNT ON;
    SELECT
        PD.ChildPackageName
    FROM
        dbo.PackageDependency AS PD
        LEFT OUTER JOIN
            -- Match runs that completed successfully; packages with no
            -- match are filtered in below as still needing work
            dbo.PackageHistory AS PC
            ON PC.PackageName = PD.ChildPackageName
                -- Since midnight
                AND PC.StartTime > CAST(CURRENT_TIMESTAMP AS date)
                -- StopTime is only populated when package completes successfully
                AND PC.StopTime IS NOT NULL
    WHERE
        PD.MasterPackageName = @MasterPackageName
        AND PD.ContainerName = @ContainerName
        -- No successful run found for this child package
        AND PC.PackageName IS NULL
    ORDER BY
        PD.ExecutionOrderSequence ASC;
END
GO
Now what? Let's gin up a simple example
Here's what the master package looks like. Forgive the lack of annotations on it, I don't have Snagit installed and I am le suck at mspaint
SQL Begin Audit
This is an insert into the PackageHistory table. I specify system variables across the board, with no value specified for ParentPackageID or for StopTime. ExecutionID is going to be the value coming from System::ServerExecutionID
I think the rest are self evident.
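For reference, the statement behind that task might look like the sketch below. It assumes an OLE DB connection manager (hence the ? placeholders), and the specific variable mapping in the comments is my guess at "system variables across the board", not something the answer spells out.

```sql
-- Assumed parameter mapping on the Execute SQL Task:
--   0 -> System::PackageName
--   1 -> System::PackageID
--   2 -> System::ServerExecutionID
--   3 -> System::StartTime
INSERT INTO
    dbo.PackageHistory
(
    PackageName
,   PackageID
,   ParentPackageID
,   ExecutionID
,   StartTime
,   StopTime
)
VALUES
(
    ?
,   ?
,   NULL    -- ParentPackageID is left empty in the master package
,   ?
,   ?
,   NULL    -- StopTime stays empty until the package finishes cleanly
);
```

One thing to check: the stored procedure joins PackageHistory.PackageName to PackageDependency.ChildPackageName, and the sample dependency rows carry the .dtsx extension, so whatever name you log here has to match that form exactly.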
SEQC Do Work X
This is a sequence container. Make as many of these as you want to have things running in parallel. But, only copy and paste after you have the first one complete to make your life easier.
Notice I have 3 variables defined and they are scoped to the Sequence Container. By default in 2012+, variables are created at the package level scope. In 2005/2008, variables were created where you had mouse focus. For 2012+, create variables as normal and then click the second icon in the Variable list which is "Move Variable". Move those three to the Sequence Container.
This is going to allow us to use scoping rules to keep variables hidden from other containers. And be lazy - I'm a fan of lazy
3 variables:
- ContainerName - This is the value you will store in the column ContainerName on the PackageDependency table.
- CurrentPackageName - This is the name of a valid SSIS package in your current project. It won't matter which one, as this gets overwritten, but do have the .dtsx extension.
- rsWorkList - This is a variable of type Object. It's going to hold the results of our stored procedure call.
SQL Get Work List
We need to get a list of all the packages this container should run. Remember that stored procedure we created? Let's use it. The SQL statement follows; you also need to set the Execute SQL Task to return a Full result set.
EXECUTE dbo.PackageProcesssingControlListGet ?, ?;
In the Result Set tab, map the result name 0 to User::rsWorkList
FELC Shred Work List
This is a standard shredding of a recordset. We got our variable populated in the previous step so let's enumerate through the results. There will be one column in our result set and you will map that to User::CurrentPackageName
EPT Run Package
The final step in the sequence is to actually run the package we just popped off the work list, so add an Execute Package Task that has an expression on PackageName based on the variable @[User::CurrentPackageName].
Furthermore, I'm going to use the Parameter Bindings tab (2012+) to pass in the current System::ExecutionID to the child package as the parameter ParentExecutionID.
SQL End Audit
If we reach this task, we update the row in the PackageHistory table where the ExecutionID and PackageID match and the StopTime IS NULL.
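A sketch of that update, again assuming an OLE DB connection manager and assuming the two placeholders are mapped to System::ServerExecutionID and System::PackageID in that order:

```sql
-- Assumed parameter mapping on the Execute SQL Task:
--   0 -> System::ServerExecutionID
--   1 -> System::PackageID
UPDATE
    PH
SET
    StopTime = CURRENT_TIMESTAMP
FROM
    dbo.PackageHistory AS PH
WHERE
    PH.ExecutionID = ?
    AND PH.PackageID = ?
    -- Only close the row that is still open
    AND PH.StopTime IS NULL;
```

Because this task only fires when everything before it succeeded, a dropped connection mid-run leaves StopTime empty, and the package shows up in the work list on the next attempt.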
Child package
This is actually just the parent package except instead of having 0 to N sequence containers in the middle, we just do whatever the child package does. I do have a package level parameter to accept the parent package id but you don't have to do so.
Seem reasonable? If so, then all you have is some bookkeeping to do on the dbo.PackageDependency table.
INSERT INTO
dbo.PackageDependency
( MasterPackageName
, ContainerName
, ChildPackageName
, ExecutionOrderSequence
)
SELECT
*
FROM
(
VALUES
(
'so_34866238.dtsx'
, 'List0'
, 'C1.dtsx'
, 10
)
,
(
'so_34866238.dtsx'
, 'List0'
, 'C2.dtsx'
, 20
)
,
(
'so_34866238.dtsx'
, 'List1'
, 'D1.dtsx'
, 10
)
) D (MasterPackageName, ContainerName, ChildPackageName, ExecutionOrderSequence);
GO
-- Test we get expected results
-- 2 rows
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List0';
-- 1 row
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List1';
-- 0 rows
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List2';
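To see the restart behaviour without running any packages, you can fake some history. This is a sketch that assumes the procedure reads from dbo.PackageHistory and that the dependency rows above are loaded; the ExecutionID values are made up.

```sql
-- Pretend C1.dtsx finished cleanly and C2.dtsx died partway through
INSERT INTO
    dbo.PackageHistory
(
    PackageName
,   PackageID
,   ParentPackageID
,   ExecutionID
,   StartTime
,   StopTime
)
VALUES
    ('C1.dtsx', NEWID(), NULL, 1, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
,   ('C2.dtsx', NEWID(), NULL, 2, CURRENT_TIMESTAMP, NULL);

-- C1.dtsx now has a completed run today, so only C2.dtsx should remain
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List0';
```

Remove the two fake rows afterwards so they don't hide real work from the next run.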
Finally, while this isn't bad, it's somewhat tedious. If I were to do this, I'd automate the daylights out of it with some Biml. Too late tonight for me to write that up but I'll get it done before I'm too distracted.
Source: https://stackoverflow.com/questions/34866238/ssis-internet-connectivity-issue