etl

Do I need an ETL?

自古美人都是妖i submitted on 2019-12-19 09:03:01
Question: We currently use DataStage ETL to export a CSV/text file with data from 15 tables (3 different schemas) on a daily basis. I am wondering if there is a simpler way to accomplish this without using an ETL. I tried Scriptella. It looks simple and fast, but again it is an ETL. Please suggest.
Answer 1: We use Python. Every programming language -- every single one ever invented -- is an alternative to an ETL. You never need an ETL. The questions are these: Which is cheaper to build? Custom software or
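As a rough illustration of the "We use Python" answer, the sketch below exports several tables into one CSV file using only the standard library. An in-memory SQLite database stands in for the real source; the table names (`customers`, `orders`), the sample rows, and the helper `export_tables` are hypothetical and do not come from the original post.

```python
import csv
import io
import sqlite3


def export_tables(conn, tables, out):
    """Write every row of each listed table to `out` as CSV, one table after another.

    Each data row is prefixed with its table name so the combined file stays readable.
    Table names must come from a trusted, hard-coded list (they are interpolated into SQL).
    """
    writer = csv.writer(out)
    rows_written = 0
    for table in tables:
        cursor = conn.execute(f"SELECT * FROM {table}")
        header = [col[0] for col in cursor.description]
        writer.writerow(["table"] + header)
        for row in cursor:
            writer.writerow([table, *row])
            rows_written += 1
    return rows_written


# Stand-in source database (assumption: a real job would connect to the actual schemas).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1)])

buffer = io.StringIO()
count = export_tables(conn, ["customers", "orders"], buffer)
csv_text = buffer.getvalue()
```

Scheduling such a script daily (for example with cron) would cover the "daily basis" part; whether that is cheaper than an ETL tool is exactly the trade-off the answer raises.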

Adding logs to Airflow Logs

风流意气都作罢 submitted on 2019-12-18 19:06:57
Question: How can I add my own logs to the Apache Airflow logs that are generated automatically? Print statements don't get logged there, so I was wondering how I can add my logs so that they show up in the UI as well.
Answer 1: I think you can work around this by using the logging module and leaving the configuration to Airflow. Something like:
    import ...
    dag = ...
    def print_params_fn(**kwargs):
        import logging
        logging.info(kwargs)
        return None
    print_params = PythonOperator(task_id="print_params",
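The approach in the answer can be sketched without Airflow itself: the callable emits records through the standard `logging` module, and whichever handlers are configured decide where they end up. The `ListHandler` class below is a hypothetical stand-in for a task log handler so that the sketch runs on its own; that Airflow routes such records to the task log shown in the UI is the answer's claim, not something this sketch demonstrates.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a task log handler: keeps formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def print_params_fn(**kwargs):
    # Use the logging module instead of print() so configured handlers receive the record.
    logging.getLogger(__name__).info("params: %s", kwargs)
    return None


handler = ListHandler()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(handler)

print_params_fn(run_id="example", retries=2)
captured = handler.messages
```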

SSIS process for saving .xlsx file as .csv file

坚强是说给别人听的谎言 submitted on 2019-12-18 09:31:50
Question: I am trying to download an .xlsx Excel file from an FTP server and save it in .csv format. I was able to download the file from the server using the FTP task in SSIS and save it in a local folder. Now I want to save that file in CSV format for the import process. I could not find a conversion method or task from .xlsx to .csv. I tried a Script task, but it didn't work. Can someone please help?
Answer 1: You can add a Script task to achieve this, and inside the script you can use the Interop library: Converting

SSIS failing to save packages and reboots Visual Studio

China☆狼群 submitted on 2019-12-18 09:09:11
Question: This is my first experience with SSIS, so bear with me... I am using SSIS to migrate tables from Oracle to SSMS; some of the tables I am trying to transfer are very large (50 million+ rows). SSIS now freezes completely and restarts VS when I merely try to save the package (not even run it). It keeps returning insufficient-memory errors; however, I am working on a remote server that has well over the RAM needed to run this package. Error Message when trying to save The only

How to join two CSVs with Apache Nifi

爷,独闯天下 submitted on 2019-12-18 05:06:25
Question: I'm looking into ETL tools (like Talend) and investigating whether Apache Nifi could be used. Could Nifi be used to perform the following:
- Pick up two CSV files that are placed on local disk
- Join the CSVs on a common column
- Write the joined CSV to disk
I've tried setting up a job in Nifi, but couldn't see how to perform the join of two separate CSV files. Is this task possible in Apache Nifi? It looks like the QueryDNS processor could be used to perform enrichment of one CSV file using the
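For reference, the three steps in the question (pick up two CSVs, join them on a common column, write the result) amount to an inner join, which can be sketched in plain Python as follows. This is not a NiFi flow. The column name `id`, the sample data, and the helper `join_csv` are hypothetical, and in-memory strings stand in for the files on disk.

```python
import csv
import io


def join_csv(left_text, right_text, key):
    """Inner-join two CSV documents on `key` and return the joined CSV as text."""
    left_rows = list(csv.DictReader(io.StringIO(left_text)))
    right_rows = list(csv.DictReader(io.StringIO(right_text)))

    # Index the right-hand rows by key so each left row can look up its matches.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    left_fields = list(left_rows[0].keys()) if left_rows else [key]
    right_fields = [f for f in (right_rows[0].keys() if right_rows else []) if f != key]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=left_fields + right_fields, lineterminator="\n")
    writer.writeheader()
    for row in left_rows:
        for match in index.get(row[key], []):
            merged = dict(row)
            # Non-key columns sharing a name would be overwritten by the right-hand value.
            merged.update({f: match[f] for f in right_fields})
            writer.writerow(merged)
    return out.getvalue()


left = "id,name\n1,Ann\n2,Bob\n3,Cid\n"
right = "id,city\n1,Oslo\n3,Rome\n"
joined = join_csv(left, right, "id")
```

Rows whose key appears in only one file (here `id` 2) are dropped, which is the usual inner-join behavior; the question does not say whether that is the desired outcome.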

System.ArgumentException: Object is not an ADODB.RecordSet or an ADODB.Record

烂漫一生 submitted on 2019-12-18 04:23:09
Question: I used the code below to fill a data table:
    OleDbDataAdapter oleDA = new OleDbDataAdapter();
    DataTable dt = new DataTable();
    oleDA.Fill(dt, Dts.Variables["My_Result_Set"].Value);
I get the error:
    Error: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentException: Object is not an ADODB.RecordSet or an ADODB.Record. Parameter name: adodb at System.Data.OleDb.OleDbDataAdapter.FillFromADODB(Object data, Object adodb,

Spark incremental loading overwrite old record

穿精又带淫゛_ submitted on 2019-12-18 03:48:47
Question: I have a requirement to do incremental loading into a table using Spark (PySpark). Here's the example:
Day 1
id | value
-----------
1  | abc
2  | def
Day 2
id | value
-----------
2  | cde
3  | xyz
Expected result
id | value
-----------
1  | abc
2  | cde
3  | xyz
This can be done easily in a relational database. I am wondering whether it can be done in Spark or another transformation tool, e.g. Presto?
Answer 1: Here you go! First Dataframe:
    >>> list1 = [(1, 'abc'),(2,'def')]
    >>> olddf = spark
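The expected result in the question is an upsert keyed on `id`: a Day 2 row replaces the Day 1 row with the same `id`, and unmatched rows from either day are kept. The plain-Python sketch below shows only that merge rule on the question's sample data. It is not the Spark code from the truncated answer, and the helper name `upsert` is hypothetical.

```python
def upsert(old_rows, new_rows):
    """Merge (id, value) rows; on an id collision the row from `new_rows` wins."""
    merged = dict(old_rows)   # start from the existing table
    merged.update(new_rows)   # newer rows overwrite or extend it
    return sorted(merged.items())


day1 = [(1, "abc"), (2, "def")]
day2 = [(2, "cde"), (3, "xyz")]
result = upsert(day1, day2)
```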

Oracle provider for Oledb missing in VS 2015 Shell

独自空忆成欢 submitted on 2019-12-17 21:11:27
Question: I am migrating to SSIS 2016. I am trying to use the Oracle Provider for OLE DB in connections, but this option does not show. I have installed the Oracle client 12.2, and when testing with a UDL file I can see the Oracle provider and test the connection. But when I try in VS 2015, the option is not shown. The issue is described here - https://jorgklein.com/2011/06/02/ssis-connect-to-oracle-on-a-64-bit-machine-updated-for-ssis-2008-r2/ and based on this I have

SSIS How to get part of a string by separator

。_饼干妹妹 submitted on 2019-12-17 21:02:23
Question: I need an SSIS expression to get the left part of a string before the separator, and then put the new string in a new column. I checked the Derived Column transformation, and there seems to be no such expression. SUBSTRING can only return a part of the string with a fixed length. For example, with the separator string - :
Art-Reading should return Art
Art-Writing should return Art
Science-chemistry should return Science
P.S. I know this could be done in MySQL with SUBSTRING_INDEX() , but I'm looking for an equivalent in SSIS, or at

CAST vs ssis data flow implicit conversion difference

浪尽此生 submitted on 2019-12-17 19:47:05
Question: I have an SSIS package which transfers some data from Oracle to SQL Server. In Oracle, dates are stored as floats, e.g. 42824 == '2017-04-01' (the application that uses the database is written in Delphi). While select CAST(42824 as datetime) in Management Studio results in '2017-04-01 00:00:00.000' , the same value (42824) inserted by the package into a datetime column in a SQL Server table shows 2017-03-30 00:00:00.000 . Note: The source data type for this number is DT_R8 ; changing the type to DT_UI4 in Data
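The two-day gap in the question matches a difference in epoch. SQL Server's `datetime` counts days from 1900-01-01, while the OLE Automation date convention counts from 1899-12-30. The arithmetic below reproduces both values from the question; that the SSIS data flow conversion follows the OLE Automation convention is an assumption here, not something the excerpt confirms.

```python
from datetime import datetime, timedelta

SERIAL = 42824  # the stored number from the question

# SQL Server's CAST(<number> AS datetime) treats the number as days since 1900-01-01.
sql_server_cast = datetime(1900, 1, 1) + timedelta(days=SERIAL)

# OLE Automation dates count days since 1899-12-30.
# Assumption: the SSIS data flow conversion uses this convention.
ole_automation = datetime(1899, 12, 30) + timedelta(days=SERIAL)

difference_days = (sql_server_cast - ole_automation).days
```

Here `sql_server_cast` is 2017-04-01 and `ole_automation` is 2017-03-30, the same pair of dates the question reports.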