ETL

Importing yyyyMMdd Dates From CSV in SSIS

南笙酒味 submitted on 2019-12-10 10:41:44
Question: I have 12 columns using the yyyymmdd format. In the Data Flow Task, I have a Flat File Source, a Derived Column Task and an OLE DB Destination. I'm applying the following expression to these fields in the Derived Column Task: (DT_DBDATE)(SUBSTRING((DT_STR,10,1252)([Date_Column]),1,4) + "-" + SUBSTRING((DT_STR,10,1252)([Date_Column]),5,2) + "-" + SUBSTRING((DT_STR,10,1252)([Date_Column]),7,2)) It keeps making me convert the field before I substring it, but I have the fields set up as DT
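For reference outside SSIS, the same conversion can be sketched in T-SQL, where style 112 is the yyyymmdd format code; the source table name below is a placeholder:

    -- yyyymmdd values convert directly with style 112; cast numeric columns to char(8) first.
    SELECT TRY_CONVERT(date, CAST([Date_Column] AS char(8)), 112) AS Date_Value
    FROM dbo.SourceTable;   -- hypothetical source table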

Copy data of each table from server A to server B dynamically using SSIS

无人久伴 submitted on 2019-12-10 10:28:34
Question: My task is to create a workflow in SSIS that copies the data of each table from server A to the same tables on server B. For now, I am stuck at the step where I take data from server A and copy it to server B. So far I have created a workflow with the following steps: read, from an Excel file, the names of the tables to be processed; insert those rows into the destination database (server B) for later use; in the Control Flow, connect the above steps to the next object, an Execute SQL Task
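A minimal T-SQL sketch of the per-table copy the package ultimately has to perform, assuming a linked server named SERVER_A, a source database SourceDB, and a control table dbo.TablesToProcess loaded from the Excel file (all three names are assumptions):

    -- Loop over the table list loaded from Excel and copy each table across the linked server.
    DECLARE @TableName sysname, @sql nvarchar(max);

    DECLARE table_cursor CURSOR FOR
        SELECT TableName FROM dbo.TablesToProcess;

    OPEN table_cursor;
    FETCH NEXT FROM table_cursor INTO @TableName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'INSERT INTO dbo.' + QUOTENAME(@TableName) +
                   N' SELECT * FROM SERVER_A.SourceDB.dbo.' + QUOTENAME(@TableName) + N';';
        EXEC sys.sp_executesql @sql;

        FETCH NEXT FROM table_cursor INTO @TableName;
    END;

    CLOSE table_cursor;
    DEALLOCATE table_cursor;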

Talend - generating n multiple rows from 1 row

╄→гoц情女王★ submitted on 2019-12-10 09:33:11
Question: Background: I'm using Talend to do something (I guess) that is pretty common: generating multiple rows from one. For example:

ID | Name  | DateFrom   | DateTo
01 | Marco | 01/01/2014 | 04/01/2014

...could be split into:

new_ID | ID | Name  | DateFrom   | DateTo
01     | 01 | Marco | 01/01/2014 | 02/01/2014
02     | 01 | Marco | 02/01/2014 | 03/01/2014
03     | 01 | Marco | 03/01/2014 | 04/01/2014

The number of outgoing rows is dynamic, depending on the date period in the original row. Question: how can I do
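Setting the Talend components aside, the underlying expansion can be sketched as a recursive CTE in T-SQL; dbo.SourceRows is a hypothetical staging table holding the original rows:

    -- Expand each (DateFrom, DateTo) range into one row per day and number the output rows.
    WITH Expanded AS
    (
        SELECT ID, Name, DateFrom, DATEADD(day, 1, DateFrom) AS DateTo, DateTo AS RangeEnd
        FROM dbo.SourceRows
        UNION ALL
        SELECT ID, Name, DateTo, DATEADD(day, 1, DateTo), RangeEnd
        FROM Expanded
        WHERE DATEADD(day, 1, DateTo) <= RangeEnd
    )
    SELECT ROW_NUMBER() OVER (ORDER BY ID, DateFrom) AS new_ID,
           ID, Name, DateFrom, DateTo
    FROM Expanded
    ORDER BY ID, DateFrom
    OPTION (MAXRECURSION 0);   -- allow ranges longer than the default 100 recursion levels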

Data profiling Task - custom Profile Request

旧时模样 submitted on 2019-12-10 02:00:37
Question: Is there any option to create a custom Profile Request for the SSIS Data Profiling Task? At the moment there are 5 standard profile requests under the SSIS Data Profiling Task: Column Null Ratio Profile Request, Column Statistics Profile Request, Column Length Distribution Profile Request, Column Value Distribution Profile Request, and Candidate Key Profile Request. I need to add another (custom) one to get a summary of all numeric values. Thanks in advance for your help. Answer 1: Based on this Microsoft
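One way to get such a summary outside the Data Profiling Task is to compute it directly with T-SQL aggregates; a minimal sketch, with hypothetical table and column names:

    -- Summary statistics for a numeric column, as the custom profile would report them.
    SELECT COUNT(*)                           AS RowCnt,
           COUNT(NumericColumn)               AS NonNullCnt,
           MIN(NumericColumn)                 AS MinValue,
           MAX(NumericColumn)                 AS MaxValue,
           AVG(CAST(NumericColumn AS float))  AS MeanValue,
           STDEV(NumericColumn)               AS StdDevValue
    FROM dbo.SourceTable;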

What is the best way to save XML data to SQL Server?

[亡魂溺海] submitted on 2019-12-09 16:17:54
Question: Is there a direct route that is pretty straightforward? (i.e., can SQL Server read XML?) Or is it best to parse the XML and just transfer it in the usual way via ADO.NET, either as individual rows or perhaps a batch update? I realize there may be solutions that involve large, complex stored procs; while I'm not entirely opposed to this, I tend to prefer to have most of my business logic in the C# code. I have seen a solution using SQLXMLBulkLoad, but it seemed to require fairly complex SQL code
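SQL Server can read XML directly through the xml data type; a minimal T-SQL sketch using .nodes() and .value(), with an invented element structure and a hypothetical target table:

    -- Shred an XML document into rows; the element and column names are made up for illustration.
    DECLARE @doc xml = N'
    <Orders>
      <Order Id="1" Customer="Contoso"  Total="19.99" />
      <Order Id="2" Customer="Fabrikam" Total="42.00" />
    </Orders>';

    INSERT INTO dbo.Orders (OrderId, CustomerName, OrderTotal)
    SELECT o.value('@Id',       'int'),
           o.value('@Customer', 'nvarchar(100)'),
           o.value('@Total',    'decimal(10,2)')
    FROM @doc.nodes('/Orders/Order') AS t(o);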

ETL Data Extraction Approaches

淺唱寂寞╮ submitted on 2019-12-09 11:30:04
The main stages of an ETL process are data extraction, data transformation and processing, and data loading. To support these functions, ETL tools usually add further capabilities such as workflow, scheduling engines, rule engines, scripting support, and statistics.

I. Data Extraction

Data extraction is the process of pulling data out of a data source. In practice, the data source is most often a relational database. Data is generally extracted from a database in one of the following ways:

1. Full extraction
Full extraction is similar to data migration or data replication: it pulls the data of a source table or view out of the database unchanged and converts it into a format the ETL tool can recognize. Full extraction is relatively simple.

2. Incremental extraction
Incremental extraction pulls only the rows that have been inserted or modified in the source tables since the last extraction. In ETL practice, incremental extraction is used far more widely than full extraction. The key to incremental extraction is how to capture the changed data. A capture method generally has two requirements: accuracy, meaning the changed data in the business system can be captured precisely at a given frequency; and performance, meaning it must not put so much pressure on the business system that existing operations are affected. Commonly used change-capture methods for incremental extraction include:

(1) Triggers (also called the snapshot method)
Create the required triggers on the tables to be extracted, usually one each for insert, update, and delete. Whenever data in a source table changes, the corresponding trigger writes the changed data into a temporary (staging) table; the extraction thread then reads from the staging table, and rows that have been extracted are marked or deleted.
Pros and cons
Advantages: high extraction performance, simple ETL loading rules, fast, no changes to the business system's table structures required, and incremental loading is possible.
Disadvantages
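A minimal T-SQL sketch of the trigger-based capture described above; the source table dbo.Orders, the staging table dbo.Orders_Delta, and the column names are all hypothetical:

    -- Staging table that receives changed rows captured from dbo.Orders.
    CREATE TABLE dbo.Orders_Delta (
        OrderID    int          NOT NULL,
        ChangeType char(1)      NOT NULL,           -- 'I' = insert, 'U' = update, 'D' = delete
        CapturedAt datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
        Extracted  bit          NOT NULL DEFAULT 0  -- set after the ETL thread has read the row
    );
    GO

    -- One trigger per DML action is usually needed; only the update trigger is shown here.
    CREATE TRIGGER trg_Orders_Update ON dbo.Orders
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.Orders_Delta (OrderID, ChangeType)
        SELECT i.OrderID, 'U'
        FROM inserted AS i;
    END;
    GO

    -- The extraction job reads the unprocessed rows, then marks them as extracted.
    SELECT OrderID, ChangeType, CapturedAt FROM dbo.Orders_Delta WHERE Extracted = 0;
    UPDATE dbo.Orders_Delta SET Extracted = 1 WHERE Extracted = 0;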

Fill SQL database from a CSV File

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-09 02:27:51
Question: I need to create a database using a CSV file with SSIS. The CSV file includes four columns: I need to use the information in that table to populate the three tables I created in SQL below. I have realized that what I need is to use one column of the Employee table, EmployeeNumber, and of the Group table, GroupID, to populate the EmployeeGroup table. For that, I thought that a Merge Join was what I needed, but I created the Data Flow Task in SSIS, and the results are the same, no data
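Not the SSIS Merge Join itself, but a T-SQL sketch of the equivalent set logic, assuming the CSV has been loaded into a staging table dbo.CsvStaging and that the natural-key columns are named EmployeeName and GroupName (all of these names are assumptions):

    -- Populate the junction table by resolving the staged natural keys to the surrogate keys.
    INSERT INTO dbo.EmployeeGroup (EmployeeNumber, GroupID)
    SELECT DISTINCT e.EmployeeNumber, g.GroupID
    FROM dbo.CsvStaging AS s
    JOIN dbo.Employee   AS e ON e.EmployeeName = s.EmployeeName
    JOIN dbo.[Group]    AS g ON g.GroupName    = s.GroupName;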

Google Cloud Dataflow consume external source

柔情痞子 submitted on 2019-12-08 09:33:35
Question: So I am having a bit of an issue with the concepts behind Dataflow, especially regarding the way the pipelines are supposed to be structured. I am trying to consume an external API that delivers an index XML file with links to separate XML files. Once I have the contents of all the XML files, I need to split those up into separate PCollections so additional PTransforms can be done. It is hard to wrap my head around the fact that the first XML file needs to be downloaded and read before the

Mapping FK into a table in Talend

十年热恋 submitted on 2019-12-08 09:12:18
Question: I have 2 entities in my schema. I mapped one already, and now for the second one I also need to have the PK of the first entity as a FK in the second entity when mapping using Talend. They are both in the same job, but how can I use the PK of the first entity in the mapping of the second entity? This is what I have so far: row1 is entity1, which has an auto-generated key inside the tMap; row2 creates a CSV from an XML file; row3 maps the CSV file generated from row2,
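The Talend job design aside, the underlying relational step is simply capturing the key generated for the first entity and reusing it as the FK of the second; a T-SQL sketch with hypothetical table and column names:

    -- Capture the auto-generated key of Entity1 and use it as the FK when inserting Entity2.
    DECLARE @NewEntity1Id int;

    INSERT INTO dbo.Entity1 (Name)
    VALUES ('example');

    SET @NewEntity1Id = SCOPE_IDENTITY();   -- key generated for the row just inserted

    INSERT INTO dbo.Entity2 (Entity1Id, Attribute)
    VALUES (@NewEntity1Id, 'example attribute');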

Using mapping parameter in an Informatica stored procedure call

北城余情 submitted on 2019-12-08 08:24:17
Question: I am using a stored procedure as a source in my Informatica mapping, and I have defined the SQL query in the Source Qualifier as exec dbo.GET_ATTRIBUTES($$fromDate, $$toDate), where $$fromDate and $$toDate are mapping parameters I have defined in a parameter file. I have tried a number of different ways of going about this and none seem to work, as the SQL query fails to validate. So, my question boils down to this: is there a way to call a stored procedure while passing in two mapping
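For reference, a sketch of a call that validates on the SQL Server side, which is what the Source Qualifier query must resolve to once the mapping parameters are substituted; the date literals are placeholders standing in for $$fromDate and $$toDate:

    -- Positional parameters passed as quoted date literals rather than bare tokens.
    EXEC dbo.GET_ATTRIBUTES '2019-01-01', '2019-12-31';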