etl

Is Alteryx an ETL tool? How it differs from SSIS? [closed]

瘦欲@ posted on 2019-12-05 17:18:47
Closed as off-topic 3 years ago; not currently accepting answers. My client wants me to implement an ETL process using Alteryx, as they have a license for it. I am confused about whether Alteryx is an ETL tool or not. I believe that Alteryx is commonly used to prepare data for the Tableau data visualization tool. Please advise whether it's an ETL tool or not. How does it differ from SSIS? Thanks. Alteryx is a data preparation / advanced analytics application. People use it in many different

Talend - generating n multiple rows from 1 row

僤鯓⒐⒋嵵緔 posted on 2019-12-05 16:30:59
Background: I'm using Talend to do something that is (I guess) pretty common: generating multiple rows from one. For example:

ID | Name  | DateFrom   | DateTo
01 | Marco | 01/01/2014 | 04/01/2014

...could be split into:

new_ID | ID | Name  | DateFrom   | DateTo
01     | 01 | Marco | 01/01/2014 | 02/01/2014
02     | 01 | Marco | 02/01/2014 | 03/01/2014
03     | 01 | Marco | 03/01/2014 | 04/01/2014

The number of output rows is dynamic, depending on the date period in the original row. Question: how can I do this? Maybe using tSplitRow? I am going to check those periods with tJavaRow. Any suggestions? Expanding
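The expansion described in the question can be sketched outside Talend. The following is a minimal Python illustration, not a Talend component: it assumes the dates are day/month/year (which is what the three one-day output rows imply) and that each output row covers exactly one day. The function name `split_by_day` is hypothetical.

```python
from datetime import datetime, timedelta

DATE_FMT = "%d/%m/%Y"  # assumption: day/month/year, as the sample rows imply


def split_by_day(row):
    """Expand one row into one row per day between DateFrom and DateTo."""
    start = datetime.strptime(row["DateFrom"], DATE_FMT)
    end = datetime.strptime(row["DateTo"], DATE_FMT)
    out = []
    # One output row per whole day in the period
    for i in range((end - start).days):
        day_start = start + timedelta(days=i)
        day_end = day_start + timedelta(days=1)
        out.append({
            "new_ID": f"{i + 1:02d}",  # zero-padded running number
            "ID": row["ID"],
            "Name": row["Name"],
            "DateFrom": day_start.strftime(DATE_FMT),
            "DateTo": day_end.strftime(DATE_FMT),
        })
    return out


rows = split_by_day(
    {"ID": "01", "Name": "Marco", "DateFrom": "01/01/2014", "DateTo": "04/01/2014"}
)
for r in rows:
    print(r)
```

The same loop could be ported to Java inside a Talend job, since the number of output rows depends only on the day difference between the two dates.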

How do I fix 'Invalid character value for cast specification' on a date column in flat file?

╄→尐↘猪︶ㄣ posted on 2019-12-05 15:35:36
Question: I have a CSV file with a {LF} delimiting each row and a date column with the date formatted as "12/20/2010" (including the quotation marks). My destination is a SQL Server 2008 database table column of type date (not datetime). In my Flat File Connection Manager, I have configured the date column to be data type date [DT_DATE] with TextQualified set to true and the column delimiter as {LF} (it is the last column on each row). I have the text qualifier set to ". When I try to load this into an OLE
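Independently of SSIS, it can help to confirm that the raw field parses once the text qualifier is stripped. This is a minimal Python sketch, assuming the month/day/year format shown in the question; the function name `parse_quoted_date` is hypothetical, and it does not reproduce how SSIS itself performs the cast.

```python
from datetime import date, datetime


def parse_quoted_date(field: str) -> date:
    """Strip surrounding whitespace and double quotes, then parse MM/DD/YYYY."""
    unquoted = field.strip().strip('"')
    return datetime.strptime(unquoted, "%m/%d/%Y").date()


print(parse_quoted_date('"12/20/2010"').isoformat())
```

If a value fails here (for example because a stray carriage return or an unexpected format is present), a cast error on the same value in the data flow would be unsurprising.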

SQL Server stored procedure conversion to SSIS Package

瘦欲@ posted on 2019-12-05 08:42:52
Problem: we currently have numerous stored procedures (very long, up to 10,000 lines) that were written by various developers for various requirements over the last 10 years. It has become hard to manage those long, complex stored procedures (which have no proper documentation). We plan to move those stored procedures into SSIS ETL packages. Has anybody done this in the past? If so, what approach should one take? I would appreciate any advice on how to convert stored procedures into SSIS ETL packages. Thanks. I've done this before, and what worked well for my team was to refactor

Is it possible to remove white spaces from the CSV files header name in NiFi?

一曲冷凌霜 posted on 2019-12-05 08:26:21
I have a CSV file in which some column names contain white space and others do not. I want to remove the white space from every header name that contains it. Please help. Thank you! Attaching a screenshot for reference. Example: for 'First Name' I want 'FirstName'. I am using the ReplaceText processor, in which I have passed \s as the Search Value (intending to match only the white space in the header row) and an empty string as the Replacement Value. My Evaluation Mode is 'Line-by-Line'. The output file now shows FirstName,LastNameshraddha
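The merged output (the first data value glued onto the last header name) is consistent with `\s` also matching the line break at the end of the header row. As an illustration of the intended transformation, here is a minimal Python sketch rather than a NiFi configuration: it removes white space from the first line only and leaves the line ending and the data rows untouched. The function name `strip_header_whitespace` is hypothetical.

```python
import re


def strip_header_whitespace(csv_text: str) -> str:
    """Remove white space from the header line only, preserving its line ending."""
    header, newline, body = csv_text.partition("\n")
    # Keep a trailing carriage return (CRLF files) out of the substitution
    cr = "\r" if header.endswith("\r") else ""
    cleaned = re.sub(r"\s+", "", header)
    return cleaned + cr + newline + body


sample = "First Name,Last Name\nshraddha,some value\n"
print(strip_header_whitespace(sample))
```

Because the substitution is applied to the header text without its line terminator, the first data row stays on its own line.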

How to use OrientDB ETL to create edges only

半世苍凉 posted on 2019-12-05 07:52:28
I have two CSV files. The first contains ~500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person

The second contains ~1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235

I used the OrientDB ETL tool to create vertices from the first CSV file. Now I just need to create edges to establish the friendship connections between them. I have tried multiple configurations of the ETL JSON file so far, the latest being this one: { "config": {"parallel": true}, "source": { "file": { "path": "path_to_file" } }, "extractor": { "csv": {} },

SSIS vs Pentaho

半腔热情 posted on 2019-12-05 06:42:18
Has anyone used both of these and can provide a good comparison? I am doing a school project, so the cost of SSIS isn't an issue, as we already have a license for it. Some background on what's going on: I will be downloading about 10 years of patent information into flat files. The result will be 2,080 delimited files. I want a way to load them into MS SQL Server all at once. Then I want to be able to append additional files to the DB as they are released. The speed of the software doesn't bother me much, as I can just let it run overnight. I am just looking for something with some flexibility, and more

How to move data from Glue to Dynamodb

戏子无情 posted on 2019-12-05 00:28:53
Question: We are designing a big data solution for one of our dashboard applications and are seriously considering Glue for our initial ETL. Currently Glue supports JDBC and S3 as targets, but our downstream services and components will work better with DynamoDB. We are wondering what the best approach is to eventually move the records from Glue to DynamoDB. Should we write to S3 first and then run Lambdas to insert the data into DynamoDB? Is that the best practice? Or should we use a third party JDBC

SSIS vs. Oracle Data Integrator

徘徊边缘 posted on 2019-12-04 23:51:53
Currently I am a Data Engineer who works mainly with SSIS. While reading about the ETL tools available on the market, I found that Oracle has its own ETL tool called ODI (Oracle Data Integrator). I searched for an unbiased comparison between Oracle Data Integrator and SSIS, but I didn't find any article on that. There are some biased articles, such as: "ETL Tools Comparison of Oracle ODI & Microsoft SSIS Tool - Dec 2014"; "Competitive Comparison of SQL Server 2008 Integration Services". Based on Stack Overflow questions, there are about 16,000 questions about SSIS, while ODI has about 200 questions.

ErrorColumn value does not exist as Lineage ID

强颜欢笑 posted on 2019-12-04 22:04:31
Question: During the insert into a destination table, any error that occurs is redirected to an Errors table, where we can see the ErrorCode and ErrorColumn. The problem is that we got a value in ErrorColumn that does not exist anywhere within the package. Namely, there is not a single column whose LineageID equals ErrorColumn. Later, by allowing NULLs in every single column, one by one, I found which column caused the problem. When I analyzed the column inside of a Data Flow task