etl

AWS: Automating queries in Redshift

风流意气都作罢 submitted on 2019-12-08 07:53:16
Question: I want to automate a Redshift insert query to run every day. We use the AWS environment. I was told that using Lambda is not the right approach. What is the best ETL process for automating a query in Redshift?

Answer 1: For automating SQL on Redshift you have (at least) 3 options.
Simple - cron: Use an EC2 instance and set up a cron job on it to run your SQL code:
psql -U youruser -p 5439 -h hostname_of_redshift -f your_sql_file
Feature rich - Airflow (recommended): If you have a complex schedule
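A minimal sketch of the cron option, assuming the SQL lives in a file on the EC2 instance and the password comes from the PGPASSWORD environment variable or a ~/.pgpass file (psql reads both). It builds the same psql call as the answer, adds `-v ON_ERROR_STOP=1` so a failing statement makes psql exit with a non-zero code, and turns that call into a daily crontab entry. The host, user, file path, and the 02:00 schedule are placeholders, not values from the question.

```python
import shlex
import subprocess
from typing import List

# Placeholder settings (assumptions): substitute your own cluster and file.
REDSHIFT_HOST = "hostname_of_redshift"
REDSHIFT_PORT = 5439
REDSHIFT_USER = "youruser"
SQL_FILE = "/home/ec2-user/etl/daily_insert.sql"


def build_psql_command(user: str, host: str, port: int, sql_file: str) -> List[str]:
    """argv for the psql call from the answer, plus ON_ERROR_STOP so SQL errors give a non-zero exit."""
    return ["psql", "-U", user, "-p", str(port), "-h", host,
            "-v", "ON_ERROR_STOP=1", "-f", sql_file]


def build_crontab_line(command: List[str], hour: int = 2, minute: int = 0) -> str:
    """Crontab entry that runs `command` once a day at hour:minute (server time)."""
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return f"{minute} {hour} * * * {shlex.join(command)}"


def run_once(command: List[str]) -> int:
    """Run the query immediately (e.g. to test it before scheduling); returns psql's exit code."""
    return subprocess.run(command, check=False).returncode
```

For example, `build_crontab_line(build_psql_command(REDSHIFT_USER, REDSHIFT_HOST, REDSHIFT_PORT, SQL_FILE))` yields a line you can paste into `crontab -e`. Cron runs jobs with a minimal PATH, so you may need to replace `psql` with its full path.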

Build table from JSON in Python

无人久伴 submitted on 2019-12-08 04:52:24
Question: I am trying to transform JSON text into a standard data table using Python. However, I have little experience with this, and I am having difficulty implementing any of the solutions I find online. I was trying to use ast.literal_eval but kept getting an error that I have been unable to solve:
raise ValueError('malformed node or string: ' + repr(node))
JSON: { "duration": 202.0, "session_info": { "activation_uuid": "ab90d941-df9d-42c5-af81-069eb4f71515", "launch_uuid": "11101c41-2d79
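A minimal standard-library sketch. It swaps `ast.literal_eval` for `json.loads`: `ast.literal_eval` only accepts Python literals, so JSON's `true`, `false`, and `null` are one likely cause of the "malformed node or string" error. Nested objects such as `session_info` are flattened into dotted column names (e.g. `session_info.activation_uuid`), and the rows are written as CSV. The function names are illustrative, and since the JSON in the question is truncated, any sample input you test with is hypothetical.

```python
import csv
import io
import json
from typing import Any, Dict, List


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted column names, e.g. session_info.launch_uuid."""
    row: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def json_to_csv(text: str) -> str:
    """Parse JSON (one object or a list of objects) and return it as a CSV table."""
    data = json.loads(text)  # accepts true/false/null, unlike ast.literal_eval
    records = data if isinstance(data, list) else [data]
    rows = [flatten(r) for r in records]
    # Union of all column names, kept in first-seen order.
    columns: List[str] = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)  # missing keys become empty cells
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Lists inside the JSON are written as their Python representation. If you already use pandas, `pandas.json_normalize` performs a similar flattening into a DataFrame.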

SSIS Excel File issue - Failure creating file

偶尔善良 submitted on 2019-12-08 03:51:14
Question: I have an SSIS package that picks up an Excel file and loads it into a SQL table. I get the following error when I run it. I have tried setting the "run in 64-bit" option to false; that did not work. I have also installed the 64-bit Access database engine driver; that did not help either.
Error at Data Flow Task [Excel Source [2]]: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager "Excel Connection Manager" failed with error code 0xC0202009. There may be

OrientDB ETL Edge transformer 2 joinFieldName(s)

南笙酒味 submitted on 2019-12-08 00:08:42
Question: With one joinFieldName and lookup, the Edge transformer works perfectly. However, two keys are now required, i.e. a compound index in the lookup. How can two joinFieldNames be specified? This is the scripted (post-processing) version:
Create edge Expands from (select from MC where sample=1 and mkey=6) to (select from Event where sample=1 and mcl=6)
This works, but it is not suitable for production. Can anyone help?

Answer 1: You can simply add 2 joinFieldName(s) like { "edge": { "class": "Conn",
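The answer's configuration snippet is cut off, so the exact OrientDB ETL syntax is not reproduced here. As an illustration only (plain Python, not OrientDB's API), the sketch below shows what a two-field lookup has to do: index Event records by the compound key (sample, mcl) and link every MC record whose (sample, mkey) matches. This generalizes the scripted `Create edge Expands` query from one pair of values to all of them. The record layout and function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, int]


def build_compound_index(events: List[Record]) -> Dict[Tuple[int, int], List[Record]]:
    """Index Event records by the two-field key (sample, mcl)."""
    index: Dict[Tuple[int, int], List[Record]] = defaultdict(list)
    for event in events:
        index[(event["sample"], event["mcl"])].append(event)
    return index


def link_edges(mcs: List[Record], events: List[Record]) -> List[Tuple[Record, Record]]:
    """Pair each MC record with every Event whose (sample, mcl) equals its (sample, mkey)."""
    index = build_compound_index(events)
    edges: List[Tuple[Record, Record]] = []
    for mc in mcs:
        # .get avoids inserting empty lists into the defaultdict on a miss
        for event in index.get((mc["sample"], mc["mkey"]), []):
            edges.append((mc, event))  # one "Expands" edge per pair
    return edges
```

A single-field lookup is the special case where the key tuple has one element; adding the second field to the key is what turns it into a compound lookup.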

SSIS ForEach loop - change connection inside a for loop

白昼怎懂夜的黑 submitted on 2019-12-07 20:01:50
Question: Task: there are 7 SQL servers, each of which has the same database. Consider a table Table_1 in that database. I want to take the data from Table_1 on all 7 servers and put it into Table_1 on the main server (called DataWarehouse in the photo below). I created a data flow task to move data from one of these servers to the main server. Now I want to put this data flow task inside a for loop and run the data flow from each of the 7 servers to the main server. How do I do it? Please see the attached
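The pattern being asked for is to keep one data flow and change only the source connection on each pass of the loop. The Python sketch below illustrates that loop, with in-memory SQLite databases standing in for the 7 SQL Server instances and the DataWarehouse server; it is not SSIS, and the Table_1 columns (`id`, `value`) are hypothetical. In SSIS the comparable approach is to drive the source connection manager's connection string from a loop variable; check the SSIS documentation for the exact setup.

```python
import sqlite3
from typing import Iterable, List, Tuple


def make_demo_db(rows: List[Tuple[int, str]]) -> sqlite3.Connection:
    """Demo only: an in-memory database holding a Table_1(id, value) stand-in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table_1 (id INTEGER, value TEXT)")
    conn.executemany("INSERT INTO Table_1 (id, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def copy_table_1(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """One 'data flow': read Table_1 from a source and append it to the target."""
    rows = source.execute("SELECT id, value FROM Table_1").fetchall()
    target.executemany("INSERT INTO Table_1 (id, value) VALUES (?, ?)", rows)
    return len(rows)


def load_all(sources: Iterable[sqlite3.Connection], target: sqlite3.Connection) -> int:
    """The loop: the same data flow runs once per source connection."""
    total = 0
    for source in sources:
        total += copy_table_1(source, target)
    target.commit()  # commit once, after every source has been copied
    return total
```

Committing once after the loop means that if any server fails, nothing from this run is committed; committing inside the loop would instead keep the rows from the servers already copied.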

ETL - Extract, Transform, Load

。_饼干妹妹 submitted on 2019-12-07 19:01:28
ETL is short for extract, transform, load: three database functions that are combined into one tool to pull data out of one database and place it into another database. Extract is the process of reading data from a database. In this stage, the data is collected, often from multiple sources of different types. Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation is done by applying rules or lookup tables, or by combining the data with other data. Load is the process of
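To make the three stages concrete, here is a small Python sketch. It is a toy built on stated assumptions: the source is CSV text rather than a database, the transform applies one rule (tidy the name) and one hypothetical lookup table (country code to country name), and an in-memory SQLite database plays the target.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical lookup table used by the transform step.
COUNTRY_LOOKUP: Dict[str, str] = {"US": "United States", "DE": "Germany"}


def extract(csv_text: str) -> List[Dict[str, str]]:
    """Extract: read raw records from a source (CSV text here)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(records: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Transform: apply a rule (tidy names) and a lookup table (country codes)."""
    return [
        (r["name"].strip().title(), COUNTRY_LOOKUP.get(r["country"].strip(), "Unknown"))
        for r in records
    ]


def load(rows: List[Tuple[str, str]], target: sqlite3.Connection) -> None:
    """Load: write the transformed rows into the target database."""
    target.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, country TEXT)")
    target.executemany("INSERT INTO customers (name, country) VALUES (?, ?)", rows)
    target.commit()
```

Usage: `load(transform(extract(text)), sqlite3.connect(":memory:"))`. Keeping each stage a separate function makes it easy to swap the source or target without touching the transformation rules.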

Is there any ETL tool for any Smalltalk dialect?

我是研究僧i submitted on 2019-12-07 09:32:12
Question: ...like Talend for Java, for instance, but one that allows implementing processes programmatically. Multiple data sources, orchestration, calculated fields, and pivot tables are some of the features I would like to have.

Answer 1: We've built on top of Moose for an ERP data conversion project. It works well with smaller amounts of data (that fit in a 32-bit image). For ETL with multiple sources, just use an image for each input stream/step and connect them together through files or sockets. The visualization was

SSIS: Truncate Excel Destination

余生长醉 submitted on 2019-12-07 09:23:27
Question: I am creating an SSIS package that imports data from a SQL Server source to an Excel destination. How can one truncate the spreadsheet before the run? I tried the following way (using an Execute SQL Task) with no success.

Answer 1: The Jet provider supports neither the TRUNCATE nor the DELETE command. You have 3 workarounds: keep an empty Excel template that you clone before running the data flow; use an Execute SQL Task to create a new workbook/tab before running the data flow; or drop the worksheet using Drop
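A sketch of the first workaround (cloning an empty template), written in Python for illustration. It assumes a pre-made empty workbook that already contains the destination sheet and its header row; the function name and file names are hypothetical. Inside SSIS the same copy step would usually be a File System Task or a Script Task placed before the data flow.

```python
import shutil
from pathlib import Path


def reset_excel_destination(template: Path, destination: Path) -> Path:
    """Replace the destination workbook with a fresh copy of the empty template.

    Run this before the data flow so each load starts from an empty sheet,
    which emulates the truncate that the Jet provider cannot perform.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copyfile overwrites the destination if it already exists
    shutil.copyfile(template, destination)
    return destination
```

Because the whole file is replaced, any formatting or extra sheets you need in the output must already be in the template.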

Why is an implicit table lock being released before the end of the transaction in Redshift?

一个人想着一个人 submitted on 2019-12-07 07:13:00
Question: I have an ETL process that builds dimension tables incrementally in Redshift. It performs these actions in the following order:
1. Begin a transaction
2. Create a table staging_foo like foo
3. Copy data from an external source into staging_foo
4. Perform a mass insert/update/delete on foo so that it matches staging_foo
5. Drop staging_foo
6. Commit the transaction
Individually this process works, but in order to achieve continuous streaming refreshes to foo and redundancy in the event of failure, I have several
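To make the six steps concrete, here is a sketch that runs the same sequence inside one transaction, using Python's sqlite3 module as a stand-in for Redshift. Assumptions: SQLite has neither `CREATE TABLE ... (LIKE ...)` nor `COPY`, so an empty `CREATE TABLE ... AS SELECT` and an `executemany` insert take their place, and foo is assumed to have the columns `id` and `name`. It illustrates the staging-table pattern only; it says nothing about Redshift's locking behaviour, which is what the question asks about.

```python
import sqlite3
from typing import Iterable, Tuple


def refresh_foo(conn: sqlite3.Connection, incoming: Iterable[Tuple[int, str]]) -> None:
    """Make foo(id, name) match `incoming` via a staging table, all in one transaction."""
    previous = conn.isolation_level
    conn.isolation_level = None  # manage BEGIN/COMMIT explicitly
    cur = conn.cursor()
    cur.execute("BEGIN")  # 1. begin transaction
    try:
        # 2. staging table with foo's columns (stand-in for CREATE TABLE staging_foo (LIKE foo))
        cur.execute("CREATE TABLE staging_foo AS SELECT * FROM foo WHERE 0")
        # 3. load the external data (stand-in for COPY)
        cur.executemany("INSERT INTO staging_foo (id, name) VALUES (?, ?)", incoming)
        # 4. make foo match staging_foo: delete, update, insert
        cur.execute("DELETE FROM foo WHERE id NOT IN (SELECT id FROM staging_foo)")
        cur.execute(
            "UPDATE foo SET name = (SELECT s.name FROM staging_foo s WHERE s.id = foo.id) "
            "WHERE id IN (SELECT id FROM staging_foo)"
        )
        cur.execute(
            "INSERT INTO foo (id, name) SELECT id, name FROM staging_foo "
            "WHERE id NOT IN (SELECT id FROM foo)"
        )
        cur.execute("DROP TABLE staging_foo")  # 5. drop staging table
        cur.execute("COMMIT")  # 6. commit
    except Exception:
        cur.execute("ROLLBACK")  # undo every step, including the staging table
        raise
    finally:
        conn.isolation_level = previous
```

Because all six steps share one transaction, a failure at any point rolls back both the changes to foo and the creation of staging_foo.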

Is it possible to remove whitespace from CSV header names in NiFi?

拥有回忆 submitted on 2019-12-07 04:56:19
Question: I have a CSV file in which some column names contain whitespace and others do not. I want to remove the whitespace from every header name that contains it. Please help. Thank you! Attaching a screenshot for reference. Example: for 'First Name' I want 'FirstName'. I am using the ReplaceText processor, in which I have set the Search Value to \s, intending to match only the whitespace in the header row, and the Replacement Value to an empty string. Also my
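A sketch of the header-only substitution in Python. The key point is that the replacement must be limited to the first line: a bare `\s` pattern matches whitespace (including line breaks) wherever it is applied, so it can also alter data rows. This illustrates the logic only and is not NiFi configuration; the function name is hypothetical.

```python
import re


def strip_header_whitespace(csv_text: str) -> str:
    """Remove whitespace inside column names on the header line; leave data rows untouched."""
    lines = csv_text.splitlines(keepends=True)
    if not lines:
        return csv_text
    header = lines[0].rstrip("\r\n")
    line_ending = lines[0][len(header):]  # keep the original \n or \r\n
    # Only the header is rewritten, so values such as "John Smith" keep their spaces.
    cleaned = re.sub(r"\s+", "", header)
    return cleaned + line_ending + "".join(lines[1:])
```

Stripping the line ending before the substitution matters: otherwise `\s+` would also delete the header's newline and merge it with the first data row.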