etl

Is it possible to pass parameters to a .dtsx package on the command line?

痞子三分冷 submitted on 2019-12-06 05:39:31
Question: I am currently executing an SSIS package (package.dtsx) from the command line using dtexec. This is as simple as: dtexec /f Package.dtsx However, I have some parameters that I would like to pass to the package for it to use during execution. The documentation implies that this might be possible (i.e. the /Par option), but it is not clear. Is it possible to pass parameters to a .dtsx file using dtexec? Answer 1: Of course, yes - you can assign values to variables using dtexec. Syntax: dtexec /f
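The answer above is cut off, but dtexec's documented /Set option (property path and value separated by ";") is the usual way to push a value into a package variable, and /Par covers project/package parameters in the project deployment model. As a rough sketch only - the package path, variable name, and value below are made up - the call might be driven like this:

```python
# Minimal sketch: invoke dtexec with /Set to assign a package variable.
# Assumptions: the package path, variable name, and value are hypothetical.
import subprocess

package = r"C:\packages\Package.dtsx"                               # hypothetical path
variable = r"\Package.Variables[User::FileName].Properties[Value]"  # hypothetical variable
value = r"C:\data\input.csv"

subprocess.run(
    ["dtexec", "/f", package, "/Set", f"{variable};{value}"],
    check=True,
)
```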

OrientDB ETL Edge transformer 2 joinFieldName(s)

…衆ロ難τιáo~ submitted on 2019-12-06 05:25:18
With one joinFieldName and lookup the Edge transformer works perfectly. However, two keys are now required, i.e. a compound index in the lookup. How can two joinFieldNames be specified? This is the scripted (post-processing) version: Create edge Expands from (select from MC where sample=1 and mkey=6) to (select from Event where sample=1 and mcl=6). This works, but is not suitable for production. Can anyone help? You can simply add 2 joinFieldName(s), like { "edge": { "class": "Conn", "joinFieldName": "b1", "lookup": "A.a1", "joinFieldName": "b2", "lookup": "A.a2", "direction": "out" }} see below my

SSIS: Flat File default length

廉价感情. submitted on 2019-12-06 05:07:52
Question: I have to import about 50 different types of files every day. Some of them have a few columns, some include up to 250 columns. The Flat File connection always defaults all columns to 50 chars. Some columns can be way longer than 50 chars, and will of course end up in errors. Currently I am doing a stupid search & replace with Notepad++ - opening all SSIS packages and replacing DTS:MaximumWidth="50" with DTS:MaximumWidth="500". This is an annoying workaround. Is there any possibility to set a default
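Until a better default is available, the manual Notepad++ pass described above can at least be scripted. A minimal sketch, assuming all packages sit in one folder and that widening to 500 is acceptable; the folder path is hypothetical, and a plain text replacement only works here because the attribute string is unambiguous inside the .dtsx XML:

```python
# Bump every DTS:MaximumWidth="50" to "500" across all .dtsx files in a folder.
from pathlib import Path

def widen_flat_file_columns(folder: str,
                            old: str = 'DTS:MaximumWidth="50"',
                            new: str = 'DTS:MaximumWidth="500"') -> None:
    for dtsx in Path(folder).glob("*.dtsx"):
        text = dtsx.read_text(encoding="utf-8")
        if old in text:
            dtsx.write_text(text.replace(old, new), encoding="utf-8")

widen_flat_file_columns(r"C:\ssis\packages")  # hypothetical package folder
```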

Importing yyyyMMdd Dates From CSV in SSIS

↘锁芯ラ submitted on 2019-12-06 05:05:53
I have 12 columns using the yyyymmdd format. In the Data Flow Task, I have a Flat File Source, a Derived Column Task and an OLE DB Destination. I'm applying the following expression to these fields in the Derived Column Task: (DT_DBDATE)(SUBSTRING((DT_STR,10,1252)([Date_Column]),1,4) + "-" + SUBSTRING((DT_STR,10,1252)([Date_Column]),5,2) + "-" + SUBSTRING((DT_STR,10,1252)([Date_Column]),7,2)) It keeps making me convert the field before I substring it, but I have the fields set up as DT_STR in the Connection Manager. The destination field is in DATE format in SQL Server. SSIS always shows
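For reference, the transformation the derived-column expression performs - slicing yyyyMMdd into a proper date - looks like this outside SSIS. This is only an illustration of the intended logic, not part of the package:

```python
# Illustration only: the same yyyyMMdd -> date conversion the derived column performs.
from datetime import datetime, date

def yyyymmdd_to_date(value: str) -> date:
    # "20191206" -> date(2019, 12, 6); malformed input raises ValueError,
    # which is roughly where the SSIS conversion errors come from.
    return datetime.strptime(value.strip(), "%Y%m%d").date()

print(yyyymmdd_to_date("20191206"))  # 2019-12-06
```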

Best Practice to populate Fact and Dimension Tables from a Transactional Flat DB

断了今生、忘了曾经 submitted on 2019-12-06 03:48:19
Question: I want to populate a star schema / cube in SSIS / SSAS. I have prepared all my dimension tables and my fact table, primary keys, etc. The source is a 'flat' (item-level) table, and my problem now is how to split it up and load it from that one table into the respective tables. I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?! Thanks, alexl Answer 1: For a start, it depends on whether you want to do
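The answer is truncated, but the common pattern (not necessarily what that answer goes on to describe) is: load each dimension from the distinct attribute values of the flat table, assign surrogate keys, then resolve those keys while loading the fact table - in SSIS this maps to Lookup components against the dimension tables. A minimal sketch with invented column names:

```python
# Toy flat (item-level) rows standing in for the transactional source table.
flat_rows = [
    {"customer": "Acme", "product": "Widget", "qty": 3},
    {"customer": "Acme", "product": "Gadget", "qty": 1},
]

def build_dimension(rows, column):
    # distinct attribute values -> surrogate key
    return {value: key for key, value in enumerate(sorted({r[column] for r in rows}), start=1)}

dim_customer = build_dimension(flat_rows, "customer")
dim_product = build_dimension(flat_rows, "product")

# Fact load: replace natural values with the surrogate keys assigned above.
fact_sales = [
    {"customer_key": dim_customer[r["customer"]],
     "product_key": dim_product[r["product"]],
     "qty": r["qty"]}
    for r in flat_rows
]
print(fact_sales)
```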

ETL Explained

别说谁变了你拦得住时间么 submitted on 2019-12-06 02:02:39
ETL Explained (very detailed!!!) ETL is the process of extracting data from business systems, cleansing and transforming it, and loading it into a data warehouse. Its purpose is to consolidate an enterprise's scattered, messy, and inconsistently standardized data so it can serve as a basis for analysis in decision making. ETL is an important part of any BI project; typically it consumes at least 1/3 of the total project time, and the quality of the ETL design directly determines whether the BI project succeeds or fails. ETL design has three parts: data extraction, data cleansing and transformation, and data loading, and ETL is designed around these three parts. Extraction pulls data from the various source systems into an ODS (Operational Data Store) - some cleansing and transformation can already happen during this step - and the extraction method should be chosen to make the ETL run as efficiently as possible. Of the three parts, the "T" (Transform: cleansing and conversion) takes the longest; this usually accounts for about 2/3 of the overall ETL workload. Loading generally writes the cleansed data directly into the DW (Data Warehouse). ETL can be implemented in several ways, three of which are common: with an ETL tool (such as Oracle's OWB, SQL Server 2000 DTS, SQL Server 2005 SSIS, Informatica, etc.), purely in SQL, or with a combination of an ETL tool and SQL. The first two approaches each have their own advantages and disadvantages
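As a toy illustration only (not from the article), the three phases might be sketched like this; a real pipeline would use one of the tools or SQL approaches mentioned above, and all names here are invented:

```python
# Toy sketch of the three phases: extract -> staging (ODS), transform, load -> DW.
def extract(source_rows):
    # pull rows from a source system into a staging structure (the ODS step)
    return list(source_rows)

def transform(rows):
    # cleansing/conversion - typically the bulk (~2/3) of the ETL effort
    return [
        {**row, "amount": float(row["amount"])}
        for row in rows
        if row.get("amount") not in (None, "")
    ]

def load(rows, warehouse):
    # write the cleansed rows into the data warehouse table
    warehouse.extend(rows)

dw = []
load(transform(extract([{"id": 1, "amount": "9.90"}, {"id": 2, "amount": ""}])), dw)
print(dw)  # [{'id': 1, 'amount': 9.9}]
```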

Pentaho-kettle: Need to create ETL Jobs dynamically based on user input

*爱你&永不变心* submitted on 2019-12-06 00:07:37
Question: In my application, a user can specify the format of their file. Based on user input we dynamically create an SSIS package. http://lakshmik.blogspot.com/2005/05...eate-ssis.html The dynamically created SSIS package is used to process the user's files. We want to evaluate Pentaho Kettle for this requirement. Is it possible with Kettle to dynamically create ETL jobs based on the user's input? If not Pentaho, is there any Java ETL tool that allows us to dynamically create ETL jobs? Answer 1: I don't know about

An Introduction to Oracle Data Integrator

六眼飞鱼酱① submitted on 2019-12-05 20:58:30
This article introduces Oracle Data Integrator, a Java-based middleware that can use the database to perform set-based data integration tasks within an SOA. Today, complex "hot-pluggable" systems and service-oriented architectures (SOA) are widely used, which makes it increasingly difficult to bring data together in a sensible way. Although your main application database may run on Oracle Database, there are often other, smaller systems running on databases and platforms from other vendors. Your applications themselves may interact through technologies such as web services, and applications and data may be hosted remotely or managed by you inside the enterprise data center. Oracle Data Integrator is part of the Oracle Fusion Middleware product family and addresses the data integration needs of increasingly heterogeneous environments. It is a Java-based middleware that can use the database to perform set-based data integration tasks, and it can extend that capability to a range of database platforms in addition to Oracle Database. It also lets you extract and deliver transformed data through web services and messages, and build integration processes that respond to and raise events within a service-oriented architecture. Oracle Data Integrator product architecture: Oracle Data Integrator is organized around a modular repository accessed by Java graphical modules and scheduling agents. The graphical modules are used to design and build integration processes; the agents are used to schedule and coordinate integration tasks. When Oracle

Informatica PowerCenter vs custom Perl ETL job?

╄→гoц情女王★ submitted on 2019-12-05 20:41:05
Most of my company uses Informatica PowerCenter for Extract-Transform-Load type data-movement jobs between databases. However, the project I am on has a big custom Perl job, with some Java thrown in for good measure, to move data and trigger some other updates. There is talk of rewriting the thing to use PowerCenter instead. What are people's experiences with such a project - does it make sense? It seems like you trade away a lot of flexibility in going to such an "off the shelf" solution, but do the ETL tools buy you much in terms of productivity, for example? Informatica is good for an operations team. It allows a

Copy data of each table from server A to server B dynamically using SSIS

巧了我就是萌 submitted on 2019-12-05 18:53:42
My task is to create a workflow in SSIS that copies the data of each table from server A to the same tables on server B. For now, I am stuck at the step where I take data from server A and copy it to server B. So far I have created a workflow with the following steps: Read data from an Excel file that lists the names of the tables to be processed. Insert these rows into the destination database (server B) for later use. In the Control Flow, I connected the above steps to the next object - an Execute SQL Task that loads all the table names from that table into a global project variable named
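For comparison, outside SSIS the same loop - take each table name from the list and copy its rows from server A to the same table on server B - might look like the sketch below. Server names, database, and table list are assumptions, pyodbc must be installed, and it presumes identical table definitions on both servers:

```python
# Copy all rows of each listed table from server A to server B (sketch only).
import pyodbc

TABLES = ["dbo.Customers", "dbo.Orders"]  # would come from the Excel list

src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=serverA;DATABASE=Sales;Trusted_Connection=yes")
dst = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=serverB;DATABASE=Sales;Trusted_Connection=yes")

for table in TABLES:
    rows = src.cursor().execute(f"SELECT * FROM {table}").fetchall()
    if not rows:
        continue
    placeholders = ", ".join("?" * len(rows[0]))
    cur = dst.cursor()
    cur.fast_executemany = True  # batch the inserts
    cur.executemany(f"INSERT INTO {table} VALUES ({placeholders})", [tuple(r) for r in rows])
    dst.commit()
```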