ETL

Keep only the most recent row of data in data factory

孤街浪徒 submitted on 2019-12-12 10:14:41
Question: I am using Data Factory to create our staging area. The problem is that whenever the source data changes, we add a new row to the staging tables. For instance, assume we have the following data:

ID   Fields      created       edited
100  ----------  '2017-07-01'  '2017-07-05'

This will be stored in our staging tables like this:

ID   Fields      created       edited
100  ----------  '2017-07-01'  null
100  ----------  '2017-07-01'  '2017-07-05'

Selecting the most recent row is expensive, and we don't want that. How do you think we can avoid…
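One common workaround for picking the latest version of each key is a ROW_NUMBER window function over the staging table. The sketch below demonstrates the idea with SQLite; the table and column names simply mirror the question's example and are not Data Factory specifics:

```python
import sqlite3

# Hypothetical staging table mirroring the question's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, fields TEXT, created TEXT, edited TEXT)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?, ?)",
    [
        (100, "----------", "2017-07-01", None),
        (100, "----------", "2017-07-01", "2017-07-05"),
    ],
)

# Keep only the most recent row per id: rank rows by the edited date
# (falling back to created when edited is NULL) and pick rank 1.
rows = conn.execute("""
    SELECT id, fields, created, edited FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY COALESCE(edited, created) DESC
               ) AS rn
        FROM staging
    ) WHERE rn = 1
""").fetchall()
print(rows)  # [(100, '----------', '2017-07-01', '2017-07-05')]
```

Materializing this as a "current" view keeps full history in staging while making the latest-row lookup a single indexed read.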

Get Last non empty column and row index from excel using Interop

亡梦爱人 submitted on 2019-12-12 08:18:14
Question: I am trying to remove all extra blank rows and columns from an Excel file using the Interop library. I followed the question "Fastest method to remove Empty rows and Columns From Excel Files using Interop" and found it helpful. But I have Excel files that contain a small set of data and a lot of empty rows and columns (from the last non-empty row (or column) to the end of the worksheet). I tried looping over rows and columns, but the loop takes hours. I am trying to get the last non-empty row…
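As a side note, the commonly suggested Interop shortcut for this is `Cells.Find("*", SearchOrder: XlSearchOrder.xlByRows, SearchDirection: XlSearchDirection.xlPrevious)`, which jumps straight to the last used cell instead of iterating. The scan logic itself can be sketched language-independently; here is a minimal Python version, where a 2D list stands in for the worksheet's used range:

```python
def last_nonempty_indices(grid):
    """Return (last_row, last_col) as 1-based indices of the last
    non-empty row and column in a 2D grid; (0, 0) if the grid is
    entirely empty. A single pass over the cells, rather than a
    delete-per-row round trip, keeps this fast."""
    last_row = last_col = 0
    for r, row in enumerate(grid, start=1):
        for c, cell in enumerate(row, start=1):
            if cell not in (None, ""):
                last_row = r
                last_col = max(last_col, c)
    return last_row, last_col

grid = [
    ["a", "b", ""],
    ["",  "c", ""],
    ["",  "",  ""],   # trailing blank rows/columns to trim
]
print(last_nonempty_indices(grid))  # (2, 2)
```

Everything below `last_row` and right of `last_col` can then be deleted in one range operation instead of row by row.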

SSIS passing flat file paths as command line parameters

耗尽温柔 submitted on 2019-12-12 06:46:02
Question: I have an SSIS package that takes two flat files and a database table as connections. I want to run the SSIS package from the command line by passing these 3 connections as command-line parameters. How should I call it? After some Google searching I found how to pass the connection if we are using a DB:

DTExec.exe /F "<packagepath>" /Set \Package.Connections[MyDB].Properties[ServerName];SS2K8SV01_Prod

But I couldn't figure out how to pass connection parameters for flat files.

Answer 1: Assume I…
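For what it's worth, DTExec also has a documented `/Conn[ection]` switch that overrides a connection manager's connection string by name, and for a flat file connection manager the connection string is simply the file path. A sketch under those assumptions (the package path and connection manager names below are hypothetical):

```
DTExec.exe /File "C:\packages\MyPackage.dtsx" ^
  /Conn "FlatFileA";"C:\data\fileA.txt" ^
  /Conn "FlatFileB";"C:\data\fileB.txt" ^
  /Set \Package.Connections[MyDB].Properties[ServerName];SS2K8SV01_Prod
```

Equivalently, `/Set \Package.Connections[FlatFileA].Properties[ConnectionString];C:\data\fileA.txt` should work for a flat file, since the path is the connection string.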

ETL of Human Resource data from Taleo

主宰稳场 submitted on 2019-12-12 06:10:55
Question: My company needs to migrate data from a Taleo system to a new HR system. A little research suggests that traditional ETL may not work against Taleo's cloud-based system, but I don't know enough about the setup and am trying to learn. Does anyone have experience migrating HR data from Taleo to another system? If so, how did you do it, and was traditional ETL an option? Thanks.

Answer 1: How you access Taleo depends as much on your platform as on theirs. Example: I'm using Windows; not sure if…

Make a DB INSERT based on Text File Input metadata

此生再无相见时 submitted on 2019-12-12 04:54:28
Question: I'm developing an ETL process and must build some routines for monitoring it. At the beginning, I must do an INSERT on the DB to create a record containing the filename and the starting process datetime. This query will return the record's PK, which must be stored. When the ETL of that file finishes, I must update that record to indicate that the ETL finished successfully, together with its ending process datetime. I use Text File Input to look for files that match its regex, and add its "Additional output fields" to the stream. But…
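The start/finish bookkeeping the question describes is a small pattern that can be sketched outside any particular ETL tool. Below is a minimal Python/SQLite illustration; the table and column names are invented for the example, and `lastrowid` stands in for whatever PK-returning mechanism the target database offers (e.g. a sequence or an identity column):

```python
import sqlite3
from datetime import datetime

# Hypothetical monitoring table for the ETL log.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE etl_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        filename TEXT,
        started_at TEXT,
        finished_at TEXT,
        status TEXT
    )
""")

def log_start(filename):
    """Insert the start record and return its PK for the later update."""
    cur = conn.execute(
        "INSERT INTO etl_log (filename, started_at, status) VALUES (?, ?, 'RUNNING')",
        (filename, datetime.now().isoformat()),
    )
    return cur.lastrowid

def log_finish(pk):
    """Mark the run identified by pk as finished successfully."""
    conn.execute(
        "UPDATE etl_log SET finished_at = ?, status = 'SUCCESS' WHERE id = ?",
        (datetime.now().isoformat(), pk),
    )

pk = log_start("input_2019-12-12.csv")
# ... run the ETL for that file here ...
log_finish(pk)
status = conn.execute("SELECT status FROM etl_log WHERE id = ?", (pk,)).fetchone()[0]
print(status)  # SUCCESS
```

The key point is that the PK from the start INSERT must be carried through the job (as a variable or stream field) so the finishing UPDATE can target the same record.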

Library to move data between repositories

社会主义新天地 submitted on 2019-12-12 03:29:55
Question: Is there any open-source library (in any programming language) that helps load data from any data source (file, SQL DB, NoSQL DB, etc.) and store it in any other data repository? I've checked some ETL libraries like Talend or Octopus, but they only deal with SQL databases.

Answer 1: Try https://flywaydb.org/. Since NoSQL has a different nature than a relational structure, you should write your own converter. For example, how should this document be translated into an RDBMS?

{ "item_id" : 1, "tags" : ["a","b","c"] }

You can use…
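To make the answer's point concrete, here is one conventional way to flatten the example document into relational form, sketched with Python and SQLite (the table names are invented): the embedded array becomes a child table keyed back to the parent row.

```python
import json
import sqlite3

# The document from the answer's example.
doc = json.loads('{ "item_id" : 1, "tags" : ["a","b","c"] }')

conn = sqlite3.connect(":memory:")
# Parent table holds the scalar fields; the array moves to a child
# table with a foreign key back to the parent (one row per element).
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE item_tag (item_id INTEGER REFERENCES item(item_id), tag TEXT)")

conn.execute("INSERT INTO item (item_id) VALUES (?)", (doc["item_id"],))
conn.executemany(
    "INSERT INTO item_tag (item_id, tag) VALUES (?, ?)",
    [(doc["item_id"], t) for t in doc["tags"]],
)

tag_rows = conn.execute("SELECT item_id, tag FROM item_tag ORDER BY tag").fetchall()
print(tag_rows)  # [(1, 'a'), (1, 'b'), (1, 'c')]
```

This is exactly the hand-written converter the answer alludes to; the mapping (child table vs. a serialized string column) is a design choice, not something a generic tool can decide for you.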

Best Way to ETL Multiple Different Excel Sheets Into SQL Server 2008 Using SSIS

喜欢而已 submitted on 2019-12-12 03:16:10
Question: I've seen plenty of examples of how to enumerate through a collection of Excel workbooks or sheets using the Foreach Loop Container, with the assumption that the data structure of all of the source files is identical and the data is going to a single destination table. What would be the best way to handle the following scenario:

- A single Excel workbook with 10-20 sheets, OR 10-20 Excel workbooks with 1 sheet each
- Each workbook/sheet has a different schema
- There is a 1:1 matching…

PowerCenter REG_EXTRACT issue

≯℡__Kan透↙ submitted on 2019-12-12 03:14:36
Question: I'm having an issue converting REGEXP_SUBSTR from Oracle to REG_EXTRACT in PowerCenter (9.5.1). In Oracle I have the statement below:

select regexp_substr('AA 12345678 * 123', '[^' || CHR(9) || ']+', 1, 1) FIELD1,
       regexp_substr('AA 12345678 * 123', '[^' || CHR(9) || ']+', 1, 2) FIELD2,
       regexp_substr('AA 12345678 * 123', '[^' || CHR(9) || ']+', 1, 3) FIELD3,
       regexp_substr('AA 12345678 * 123', '[^' || CHR(9) || ']+', 1, 4) FIELD4
from DUAL;

Result: FIELD1=AA, FIELD2=12345678, FIELD3=*, FIELD4=123

In PWC I've…
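The Oracle pattern `'[^' || CHR(9) || ']+'` matches runs of non-tab characters, i.e. it tokenizes a tab-delimited string, and REGEXP_SUBSTR's fourth argument selects the nth *occurrence*. REG_EXTRACT's third argument, by contrast, selects a *capture group* within a single match, which is a frequent stumbling block in this conversion. A quick Python sketch of what the Oracle side computes (assuming the sample string is really tab-separated):

```python
import re

# Tab-separated equivalent of the sample string from the question.
s = "AA\t12345678\t*\t123"

# '[^\t]+' matches each run of non-tab characters, so findall yields
# the same tokens that REGEXP_SUBSTR returns one occurrence at a time.
tokens = re.findall(r"[^\t]+", s)
print(tokens)  # ['AA', '12345678', '*', '123']

# regexp_substr(s, pattern, 1, n) corresponds to the (n-1)th match:
field3 = tokens[2]
print(field3)  # *
```

To reproduce occurrence-based extraction with a group-based function, one option is a pattern with explicit groups per field, e.g. something along the lines of `([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)` with the group number selecting the field.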

informatica multi correlated subquery implementation

独自空忆成欢 submitted on 2019-12-12 01:20:20
Question: I am facing a task that, due to my lack of experience with Informatica components (in particular the SQL transformation), I have not implemented yet. What would be the best approach in PowerCenter to implement this kind of subquery?

SELECT A.ID, NVL2(A.SACHKONTO, B.KLAMMER, A.ID) AS KLAMMER
FROM Table1 A,
     (SELECT A.ID AS KLAMMER, B.ID
      FROM (SELECT ID, ID AS VON_ID,
                   LEAD(ID, 1) OVER (ORDER BY ID) - 1 AS BIS_ID
            FROM Table1
            WHERE SACHKONTO IS NULL) A,
           Table1 B
      WHERE B.ID BETWEEN A.VON_ID AND A.BIS_ID…
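The inner subquery's trick is range grouping: rows where SACHKONTO is NULL act as group headers, and `LEAD(ID) - 1` marks where each header's range ends. A runnable sketch of that core logic with SQLite (the sample data is invented, and the COALESCE that keeps the last, open-ended group is an addition the truncated original may handle differently):

```python
import sqlite3

# Invented sample data: IDs 1 and 4 are header rows (SACHKONTO IS NULL),
# and each following row should inherit the header's ID as KLAMMER.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, SACHKONTO TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [
    (1, None), (2, "4711"), (3, "4712"),
    (4, None), (5, "4713"),
])

rows = conn.execute("""
    SELECT B.ID, A.KLAMMER
    FROM (SELECT ID AS KLAMMER, ID AS VON_ID,
                 LEAD(ID, 1) OVER (ORDER BY ID) - 1 AS BIS_ID
          FROM Table1 WHERE SACHKONTO IS NULL) A
    JOIN Table1 B
      ON B.ID BETWEEN A.VON_ID AND COALESCE(A.BIS_ID, B.ID)
    ORDER BY B.ID
""").fetchall()
print(rows)  # [(1, 1), (2, 1), (3, 1), (4, 4), (5, 4)]
```

Seeing the expected input/output pairs like this makes it easier to validate whatever PowerCenter mapping (e.g. a sorter plus an expression carrying the last-seen header ID through the stream) is chosen to replace the SQL.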

Oracle datatype error

孤街浪徒 submitted on 2019-12-11 19:16:10
Question: I'm trying to insert a value into a DATE column by selecting from a source table whose column is also of the DATE datatype. I selected the column directly, without any conversion using TO_DATE, because both are the same type, but I'm getting the following error:

SQL Error: ORA-00932: inconsistent datatypes: expected DATE got NUMBER
00932. 00000 - "inconsistent datatypes: expected %s got %s"

I double-checked; the source column has no null values.

insert into Target(Targetdate)…