SSIS internet connectivity issue

Posted by 无人久伴 on 2019-12-13 20:00:26

Question


So we have multiple SSIS packages which run in parallel to perform incremental loads to multiple cubes. These packages run every night. Occasionally we run into network issues. If SSIS loses the connection for even a moment, all the packages fail.

It does not automatically try to reconnect even if the connection becomes active and available again. It continues to execute the packages using the same connection info that it acquired at the start of execution, so all remaining packages fail. To connect again, the run has to be allowed to finish, and then someone has to manually reprocess the packages so that new connection info is acquired.

So my question is: can I implement a system in which, if SSIS loses the network connection, it automatically checks whether the connection becomes available during processing and tries to reconnect rather than failing all packages?


Answer 1:


An approach for package dependency and restartability, by @billinkc

I favor a very lean approach for execution frameworks - it's a two table system but it's enough. At the beginning of each package, an Execute SQL Task fires that starts a log record. When the SSIS package completes successfully, the final step is to close the audit row for the process - another Execute SQL Task.

By the absence of a close time and the fact that SSIS isn't running right now, I know the package ended abnormally. Maybe it threw a foreign key violation, maybe the network dropped. I don't really care for this table. The only function is Done/Not Done. If it's Not Done, then when we restart it's a candidate for processing.

Candidate for processing you say? Yes, the worker packages are dumb - they only do what I outlined above. It is the responsibility of the orchestrator/parent/master package to worry about what packages need to run and in what order. If a child package starts running, it's because it was told to. It doesn't "know" whether it's already done its work or not.

The package execution history is all kept in one single table. It is not a replacement for the sysdtslog90/sysssislog/catalog.operation_messages table(s) that SSIS will log to. Those tables, depending on your version, will hold the actual details about why the package started but didn't complete.

For the master to know what hasn't run, it needs to know three things

  • what is the refresh interval
  • what processes should run
  • what has already run successfully.

Refresh interval - your business requirements and/or data availability help dictate this but I assume it needs to run once a day. The devil's in the details though: how do you define that day? Since midnight, in the past 24 hours, or by some other rule? Whatever it is, you need to know how long before a partial retry turns into an abort and a full run.
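
A minimal sketch of how the two most common definitions of "that day" could be written as predicates, assuming the dbo.PackageHistory table defined in the Implementation section. The variable names @SinceMidnight and @Past24Hours are made up for illustration, and the procedure in this answer only implements the since-midnight rule.

```sql
-- Hypothetical sketch: two ways to define the refresh window.
-- @SinceMidnight and @Past24Hours are illustrative names, not part of the original answer.
DECLARE @SinceMidnight datetime = CAST(CAST(CURRENT_TIMESTAMP AS date) AS datetime);
DECLARE @Past24Hours   datetime = DATEADD(HOUR, -24, CURRENT_TIMESTAMP);

-- Rule 1: packages that completed successfully since midnight
SELECT
    PH.PackageName
FROM
    dbo.PackageHistory AS PH
WHERE
    PH.StartTime >= @SinceMidnight
    AND PH.StopTime IS NOT NULL;

-- Rule 2: packages that completed successfully in the past 24 hours
SELECT
    PH.PackageName
FROM
    dbo.PackageHistory AS PH
WHERE
    PH.StartTime >= @Past24Hours
    AND PH.StopTime IS NOT NULL;
```

Whichever rule is chosen, it has to be applied consistently in the query that builds the work list, otherwise a restart could skip or repeat packages.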

What needs to run - At a minimum, this is a list of all the child packages the parent will call. What I prefer to do sounds similar to your approach: I run lots of packages in parallel, but with serial execution within each parallel block. I get good parallelization in there without swamping the boxes.

What has run successfully - you already know the answer! We look at our package execution history for any packages that started to run but don't have an end date on them. Any of those records are what need to be restarted (less the fiddling required for refresh interval).
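
As a rough illustration of that lookup, the query below lists today's audit rows that were opened but never closed. It is only a sketch against the dbo.PackageHistory table defined in the Implementation section, and it cannot tell a crashed package apart from one that is still running, so it is only meaningful while SSIS is idle.

```sql
-- Sketch: audit rows opened today that never received a StopTime.
-- Assumes no package is currently executing; a running package also has StopTime NULL.
SELECT
    PH.PackageName
,   PH.ExecutionID
,   PH.StartTime
FROM
    dbo.PackageHistory AS PH
WHERE
    PH.StopTime IS NULL
    AND PH.StartTime >= CAST(CURRENT_TIMESTAMP AS date)
ORDER BY
    PH.StartTime ASC;
```

The stored procedure later in this answer approaches the same question from the other side: it looks for packages with a completed row and returns every dependency that lacks one, which also picks up packages that never started at all.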

That's the high level concepts, hopefully wasn't tl;dr

Implementation

Pick a database, any database to hold this information. Some places I create a dedicated SSISAudit database, other times I dovetail it in with other metadata repositories.

This will create three objects in your database: the two tables and a stored procedure that determines what packages need to be run (not included are insert/update/delete methods for the tables). Feed the procedure the master package name and the name of the container you need to get work for, and out comes a recordset of packages requiring work (that is, those that have not run successfully since midnight).

-- This is my package execution history table
CREATE TABLE 
    dbo.PackageHistory
(
    PackageName sysname NOT NULL
,   PackageID uniqueidentifier NOT NULL
,   ParentPackageID uniqueidentifier NULL
,   ExecutionID bigint NULL
,   StartTime datetime NOT NULL
,   StopTime datetime NULL
,   Duration_s AS (DATEDIFF(SECOND, StartTime, StopTime))
);

CREATE TABLE dbo.PackageDependency
(
    MasterPackageName sysname NOT NULL
,   ContainerName sysname NOT NULL
,   ChildPackageName sysname NOT NULL
,   ExecutionOrderSequence int NOT NULL
,   CONSTRAINT PK_PackageDependency PRIMARY KEY 
    CLUSTERED
    (
        MasterPackageName ASC
    ,   ContainerName ASC
    ,   ChildPackageName ASC
    )
);

GO
-----------------------------------------------------------------------------
-- Procedure: dbo.PackageProcesssingControlListGet
-- Author: Bill Fellows
-- Date: 2016-01-18
--
-- This procedure will return a list of outstanding packages that need to 
-- be processed for a given container. Assumes that processing is done daily
--
-- Recordsets:
-- 0 to N rows will be returned to the caller
--
-- Side-effects:
-- None
--
-- See also:
--
-- Modified:
--
-----------------------------------------------------------------------------
CREATE PROCEDURE dbo.PackageProcesssingControlListGet
(
    @MasterPackageName sysname
,   @ContainerName sysname
)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        PD.ChildPackageName
    FROM
        dbo.PackageDependency AS PD
        LEFT OUTER JOIN
            -- Match history rows for packages that have already completed
            dbo.PackageHistory AS PC
            ON PC.PackageName = PD.ChildPackageName
            -- Since midnight
            AND PC.StartTime > CAST(CURRENT_TIMESTAMP AS date)
            -- StopTime is only populated when package completes successfully
            AND PC.StopTime IS NOT NULL
    WHERE
        PD.MasterPackageName = @MasterPackageName
        AND PD.ContainerName = @ContainerName 
        AND PC.PackageName IS NULL
    ORDER BY 
        PD.ExecutionOrderSequence ASC;
END
GO

Now what? Let's gin up a simple example

Here's what the master package looks like. Forgive the lack of annotations on it, I don't have Snagit installed and I am le suck at mspaint

SQL Begin Audit

This is an insert into the PackageHistory table. I specify system variables across the board, with no value specified for ParentPackageID or for StopTime. ExecutionID is going to be the value coming from System::ServerExecutionID. I think the rest are self-evident.
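
The answer does not include the statement itself, so the following is only a sketch of what the Execute SQL Task might contain, assuming an OLE DB connection manager (which uses ? as the parameter marker). The parameter mapping in the comments is an assumption about which system variables line up with which columns.

```sql
-- Sketch of the SQL Begin Audit statement (OLE DB style, ordinal ? markers).
-- Assumed parameter mapping, in order:
--   0 -> System::PackageName
--   1 -> System::PackageID
--   2 -> System::ServerExecutionID
--   3 -> System::StartTime
INSERT INTO
    dbo.PackageHistory
(   PackageName
,   PackageID
,   ParentPackageID
,   ExecutionID
,   StartTime
,   StopTime
)
VALUES
(   ?
,   ?
,   NULL    -- ParentPackageID is left unspecified in the master package
,   ?
,   ?
,   NULL    -- StopTime stays NULL until the package completes successfully
);
```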

SEQC Do Work X

This is a sequence container. Make as many of these as you want to have things running in parallel. But, only copy and paste after you have the first one complete to make your life easier.

Notice I have 3 variables defined and they are scoped to the Sequence Container. By default in 2012+, variables are created at the package level scope. In 2005/2008, variables were created where you had mouse focus. For 2012+, create variables as normal and then click the second icon in the Variable list which is "Move Variable". Move those three to the Sequence Container.

This is going to allow us to use scoping rules to keep variables hidden from other containers. And be lazy - I'm a fan of lazy

3 variables.

  • ContainerName - This is the value you will store in the column ContainerName on the PackageDependency table.
  • CurrentPackageName - this is the name of a valid SSIS package in your current project. It won't matter as this gets overwritten but do have the .dtsx extension
  • rsWorkList - this is a variable of type Object. It's going to hold the results of our stored procedure call.

SQL Get Work List

We need to get a list of all the packages this container should run. Remember that stored procedure we created? Let's use it. The SQL statement is shown below, and you need to specify that the Execute SQL Task will return a Full Result set.

EXECUTE dbo.PackageProcesssingControlListGet ?, ?;

In the Result Set tab, map the result name 0 to User::rsWorkList

FELC Shred Work List

This is a standard shredding of a recordset. We got our variable populated in the previous step so let's enumerate through the results. There will be one column in our result set and you will map that to User::CurrentPackageName

EPT Run Package

The final step in the sequence is to actually run the package we just popped off the work list, so add an Execute Package Task that has an expression for PackageName based on the variable @[User::CurrentPackageName].

Furthermore, I'm going to use the Parameter Bindings tab (2012+) to pass in the current System::ExecutionID to the child package as the parameter ParentExecutionID.

SQL End Audit

If we reach this task, we update the row in PackageHistory where the ExecutionID and PackageID match and the StopTime IS NULL.
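
A minimal sketch of that closing update, again assuming an OLE DB connection manager and the dbo.PackageHistory table from above. The parameter order in the comments is an assumption, not something the answer specifies.

```sql
-- Sketch of the SQL End Audit statement (OLE DB style, ordinal ? markers).
-- Assumed parameter mapping, in order:
--   0 -> System::ServerExecutionID
--   1 -> System::PackageID
UPDATE
    PH
SET
    StopTime = CURRENT_TIMESTAMP
FROM
    dbo.PackageHistory AS PH
WHERE
    PH.ExecutionID = ?
    AND PH.PackageID = ?
    -- Only close the row that is still open
    AND PH.StopTime IS NULL;
```

Because this task sits at the very end of the control flow, any failure earlier in the package leaves StopTime NULL, which is exactly what marks the package as a candidate for the next restart.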

Child package

This is actually just the parent package except instead of having 0 to N sequence containers in the middle, we just do whatever the child package does. I do have a package level parameter to accept the parent package id but you don't have to do so.

Seem reasonable? If so, then all you have is some bookkeeping to do on the dbo.PackageDependency table.

INSERT INTO
    dbo.PackageDependency
(   MasterPackageName
,   ContainerName
,   ChildPackageName
,   ExecutionOrderSequence
)
SELECT
*
FROM
(
    VALUES
    (
        'so_34866238.dtsx'
    ,   'List0'
    ,   'C1.dtsx'
    ,   10
    )
    ,
    (
        'so_34866238.dtsx'
    ,   'List0'
    ,   'C2.dtsx'
    ,   20
    )
    ,
    (
        'so_34866238.dtsx'
    ,   'List1'
    ,   'D1.dtsx'
    ,   10 
    )
) D (MasterPackageName, ContainerName, ChildPackageName, ExecutionOrderSequence);

GO
-- Test we get expected results
-- 2 rows
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List0';
-- 1 row
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List1';
-- 0 rows
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List2';
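
To see the restart behaviour rather than just the initial work list, history rows can be faked by hand. This is only a sketch: it assumes the procedure's join reads from the dbo.PackageHistory table created above, that the table holds no other rows for today, and the ExecutionID value of 1 is an arbitrary placeholder.

```sql
-- Simulate a run in which C1.dtsx finished but C2.dtsx died part way through.
-- NEWID() and the ExecutionID of 1 are placeholder values for illustration only.
INSERT INTO
    dbo.PackageHistory
(   PackageName
,   PackageID
,   ParentPackageID
,   ExecutionID
,   StartTime
,   StopTime
)
VALUES
    ('C1.dtsx', NEWID(), NULL, 1, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)  -- completed
,   ('C2.dtsx', NEWID(), NULL, 1, CURRENT_TIMESTAMP, NULL);              -- never closed

-- Under the assumptions above, C1.dtsx drops off the list and C2.dtsx remains
EXECUTE dbo.PackageProcesssingControlListGet 'so_34866238.dtsx', 'List0';
```

Re-running the master package at this point would therefore pick up at C2.dtsx instead of repeating C1.dtsx.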

Finally, while this isn't bad, it's somewhat tedious. If I were to do this, I'd automate the daylights out of it with some Biml. Too late tonight for me to write that up but I'll get it done before I'm too distracted.



Source: https://stackoverflow.com/questions/34866238/ssis-internet-connectivity-issue
