ssis best practice to load N tables from Source to Target Server

后端 未结 3 469
梦谈多话
梦谈多话 2020-12-03 16:19

I need to load N (around 50) tables from a Source DB to a Target one. Each table is different from the others (so different metadata); I thought I could use a parent pkg to

相关标签:
3条回答
  • 2020-12-03 17:01

    there is a lot of factors which have impact on the what scenario to choose.

    But in general:

    For small tables with relatively few rows, you can put multiple sources/destinations in single data flow

    If you have complex ETL, for the source/destination, than it is better to put them to separate data flow tasks for clarity

    If you need to define the sequence of execution you have to use multiple data flow tasks, as you cannot control the order of execution for multiple sources/destinations in single data flow tasks.

    Whenever you need different transactional isolation level or behavior, you have to put them into separate data flows.

    Whenever you are unsure on the impact of the ETL on the source system put them in separate data flows as it will allow you to optimize the execution order in the future more easily.

    If you have large tables than put them into separate data flow tasks, as this will allow to optimize buffer sizes for different tables and optimize the ETL process for whatever reason

    So from the above if you have relatively small tables, and straight source/destination mapping, than there is no problem to have more source/destinations in single data flow.

    In other cases it is better or necessary to put them into separate data flows as it will allow you optimize the ETL process from all three points of view:

    Load impact on the Source systems

    Load impact on the destination systems

    Utilization of the machine on which the ETL process is running (CPU consumption, memory consumption and overall though output).

    0 讨论(0)
  • 2020-12-03 17:08

    You could use an Execute SQL Task in SSIS to copy using this or execute directly from SQL Agent. This assumes that the schema already exists on the destination db.

    DECLARE @tblname as varchar(50);
    DECLARE @tblschema as varchar(50);
    DECLARE @srcdb as varchar(12);
    DECLARE @destdb as varchar(12);
    DECLARE @tablesql as nvarchar(255);
    
    SET @srcdb = 'MYSRCDB';
    SET @destdb = 'MYDESTDB';
    
    DECLARE tbl_cursor CURSOR FOR
    SELECT table_schema, table_name FROM information_schema.tables where table_type = 'BASE TABLE'
    
    OPEN tbl_cursor
    
    FETCH NEXT FROM tbl_cursor INTO @tblschema, @tblname
    WHILE @@FETCH_STATUS = 0
    BEGIN
    
    SET @tablesql = N'SELECT * into '+@destdb+'.'+@tblschema+'.'+@tblname+' FROM '+@srcdb+'.'+@tblschema+'.'+@tblname;
    EXECUTE sp_executesql @tablesql
    
    FETCH NEXT FROM tbl_cursor INTO @tblschema, @tblname
    END
    
    CLOSE tbl_cursor
    DEALLOCATE tbl_cursor
    
    0 讨论(0)
  • 2020-12-03 17:10

    I've become a fan of using biml to solve these sorts of problems.

    1. Download and install BIDS Helper
    2. Add a new biml file to an existing SSIS project
    3. Disable Visual Studio's auto-fix for XML files. See Overcoming BimlScript Copy and Paste Issues in Visual Studio
    4. Paste this code into that Bimlscript.biml file
    5. Fix your source and destination connection strings (lines 8 & 9) to point to the correct servers as well as change the Provider type if SQLNCLI11.1 is not correct for your version of SQL Server
    6. Right-Click on the biml file and select "Generate SSIS Packages"

    Assuming everything is lined up, you'd end up with 29 packages that have a single data flow task in them pulling from source to destination (based on an SSIS Variable).

    <#@ template language="C#" hostspecific="true" #>
    <#@ import namespace="System.Data" #>
    <#@ import namespace="System.Data.SqlClient" #>
    <#@ import namespace="System.IO" #>
    <Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <!--
    <#
        string connectionStringSource = @"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012;Integrated Security=SSPI;Provider=SQLNCLI11.1";
        string connectionStringDestination = @"Server=localhost\dev2012;Initial Catalog=AdventureWorksDW2012_DEST;Integrated Security=SSPI;Provider=SQLNCLI11.1";
    
        string SrcTableQuery =     @"
    SELECT
        SCHEMA_NAME(t.schema_id) AS schemaName
    ,   T.name AS tableName
    FROM
        sys.tables AS T
    WHERE
        T.is_ms_shipped = 0
        AND T.name <> 'sysdiagrams';
    ";
    
        DataTable dt = null;
        dt = ExternalDataAccess.GetDataTable(connectionStringSource, SrcTableQuery);
    #>    
    -->
        <Connections>
            <OleDbConnection
                Name="SRC"
                CreateInProject="false"
                ConnectionString="<#=connectionStringSource#>"
                RetainSameConnection="false">
            </OleDbConnection>
            <OleDbConnection
                Name="DST"
                CreateInProject="false"
                ConnectionString="<#=connectionStringDestination#>"
                RetainSameConnection="false">
            </OleDbConnection>
        </Connections>
    
        <Packages>
            <# foreach (DataRow dr in dt.Rows) { #>
                <Package ConstraintMode="Linear"
                    Name="<#=dr[1].ToString()#>"
    
                >
                <Variables>
                    <Variable Name="SchemaName" DataType="String"><#=dr[0].ToString()#></Variable>
                    <Variable Name="TableName" DataType="String"><#=dr[1].ToString()#></Variable>
                    <Variable Name="QualifiedTableSchema"
                              DataType="String"
                              EvaluateAsExpression="true">"[" +  @[User::SchemaName] + "].[" +  @[User::TableName] + "]"</Variable>
                </Variables>
                <Tasks>
                    <Dataflow
                        Name="DFT"
                    >
                        <Transformations>
                            <OleDbSource
                                Name="OLE_SRC <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
                                ConnectionName="SRC"
                            >
                                <TableFromVariableInput VariableName="User.QualifiedTableSchema"/>
                            </OleDbSource>
                            <OleDbDestination
                                Name="OLE_DST <#=dr[0].ToString()#>_<#=dr[1].ToString()#>"
                                ConnectionName="DST"
                                KeepIdentity="true"
                                TableLock="true"
                                UseFastLoadIfAvailable="true"
                                KeepNulls="true"
                                >
                                <TableFromVariableOutput VariableName="User.QualifiedTableSchema" />                        
                            </OleDbDestination>
                        </Transformations>
                    </Dataflow>
    
                </Tasks>
                </Package>
            <# } #>
        </Packages>
    </Biml>
    

    At this point, you simply need to figure out how you want to coordinate the execution of the packages.

    0 讨论(0)
提交回复
热议问题