etl

SSIS - fill unmapped columns in table in OLE DB Destination

久未见 submitted on 2019-12-07 04:23:21
Question: As you can see in the image below, I have a table in SQL Server that I am filling from a flat file source. There are two columns in the destination table that I want to populate based on the logic below: SessionID: all rows from the first CSV import get a value of 1, rows from the second import get 2, and so on. TimeCreated: the datetime value of when the CSV import happened. I don't need help writing the T-SQL to get this done. Instead, I would like someone to …
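The logic being asked about, independent of SSIS, is a per-import batch number plus a load timestamp. A minimal sketch of that pattern, using Python's built-in sqlite3 purely as a stand-in for the SQL Server destination (the table and column names other than SessionID and TimeCreated are assumptions):

    import sqlite3
    from datetime import datetime

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Dest (Col1 TEXT, SessionID INTEGER, TimeCreated TEXT)")

    def load_csv_rows(rows):
        # Next SessionID = highest existing value + 1 (so the first import gets 1).
        cur = conn.execute("SELECT COALESCE(MAX(SessionID), 0) + 1 FROM Dest")
        session_id = cur.fetchone()[0]
        now = datetime.utcnow().isoformat()
        conn.executemany(
            "INSERT INTO Dest (Col1, SessionID, TimeCreated) VALUES (?, ?, ?)",
            [(r, session_id, now) for r in rows],
        )
        conn.commit()

    load_csv_rows(["a", "b"])   # all rows get SessionID = 1
    load_csv_rows(["c"])        # this row gets SessionID = 2

In SSIS the same effect is usually achieved by computing the next SessionID once per package run and stamping it onto every row, rather than per row.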

SQL Server stored procedure conversion to SSIS Package

浪尽此生 submitted on 2019-12-07 02:56:44
Question: We currently have numerous stored procedures (very long, up to 10,000 lines) that were written by various developers for various requirements over the last 10 years. It has become hard to manage those complex, long stored procedures (which have no proper documentation). We plan to move those stored procedures into SSIS ETL packages. Has anybody done this in the past? If so, what approach should one take? I would appreciate any advice on an approach to convert a stored procedure into SSIS …

ETL Pentaho Code Study Notes

一世执手 submitted on 2019-12-07 02:36:37
1. Setting the KETTLE_HOME environment variable lets the .kettle directory live somewhere other than user.home.
2. The default kettle home is user.home; to customize it, set the KETTLE_HOME environment variable.
3. The following files and directories can be placed under the .kettle directory:
   kettle.properties: environment variables used internally at runtime
   .languageChoice: sets the runtime language, which makes translating the UI easier; contents: LocaleDefault=en_US, LocaleFailover=en_US
   other: a Plugins directory can also be placed here to add your own extension plugins
4. Plugin types (type / directory under plugins / XML configuration file loaded):
   Step / steps / kettle-steps.xml
   Partitioner / steps / …
   JobEntry / jobentries / …
   Repository / repositories / …
   Database / databases / …
   Lifecycle / repositories / …
   Rules / rules / …
5. Plugin loading
   a. Scanned directories: <kettle_home>/plugins, <run directory>/plugins, <kettle_home>/plugins/<type-specific directory from point 4>, <run directory>/plugins/<type-specific directory from point 4>. Note: the first two scanned locations …
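A tiny illustration of the home-directory resolution described in points 1 and 2. This is a sketch of the fallback order stated in the notes, not Kettle's actual implementation:

    import os

    def kettle_home():
        # Prefer KETTLE_HOME if set; otherwise fall back to the user's home,
        # mirroring Kettle's default of user.home.
        base = os.environ.get("KETTLE_HOME") or os.path.expanduser("~")
        return os.path.join(base, ".kettle")

    print(kettle_home())  # e.g. /home/alice/.kettle, or $KETTLE_HOME/.kettle if set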

Build table from JSON in Python

徘徊边缘 submitted on 2019-12-07 01:02:27
I am trying to transform a JSON text into a standard data table using Python, but I have little experience with this, and as I search for solutions online I find I am having difficulty implementing any of them. I was trying to use ast.literal_eval but kept getting an error that I have been unable to solve: raise ValueError('malformed node or string: ' + repr(node)). JSON: { "duration": 202.0, "session_info": { "activation_uuid": "ab90d941-df9d-42c5-af81-069eb4f71515", "launch_uuid": "11101c41-2d79-42cc-bf6d-37be46802fc8" }, "timestamp": "2019-01-18T11:11:26.135Z", "source_page_view_reference": { …
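ast.literal_eval parses Python literals, not JSON, so JSON tokens like true, false, and null (which the full document likely contains) raise exactly this ValueError. The usual fix is json.loads for parsing, and pandas.json_normalize for flattening nested objects into table columns. A sketch using the fragment from the question, with the truncated fields omitted:

    import json
    import pandas as pd

    text = '''{
      "duration": 202.0,
      "session_info": {
        "activation_uuid": "ab90d941-df9d-42c5-af81-069eb4f71515",
        "launch_uuid": "11101c41-2d79-42cc-bf6d-37be46802fc8"
      },
      "timestamp": "2019-01-18T11:11:26.135Z"
    }'''

    record = json.loads(text)       # parses real JSON, including true/false/null
    df = pd.json_normalize(record)  # nested keys flatten to session_info.activation_uuid, ...
    print(df.columns.tolist())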

PowerShell script to extract .xls file from specific Outlook folder

☆樱花仙子☆ submitted on 2019-12-06 10:32:17
I want to extract and save an .xls file from an email I receive daily. I have a rule set up that saves the email in an Outlook mailbox, within a specific subfolder of the Inbox. The Outlook folder structure looks like this: -> Inbox --> Data (subfolder of "Inbox") ---> ToExtract (subfolder of "Data"). I need to extract the .xls file from the "ToExtract" folder. I found a script that does most of the work for me, but it requires the user to supervise the script and manually select which Outlook folder to search. I need to change the script so it just points to the "ToExtract" subfolder. The …
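The usual fix is to walk the folder hierarchy by name instead of calling the interactive folder picker. The question is about PowerShell, but the same Outlook COM object model is reachable from Python via pywin32; a sketch (folder names come from the question, the save path is a made-up placeholder):

    import os
    import win32com.client  # pip install pywin32

    outlook = win32com.client.Dispatch("Outlook.Application")
    namespace = outlook.GetNamespace("MAPI")

    # 6 = olFolderInbox; navigate Inbox -> Data -> ToExtract by name.
    inbox = namespace.GetDefaultFolder(6)
    to_extract = inbox.Folders("Data").Folders("ToExtract")

    save_dir = r"C:\Extract"  # hypothetical destination directory
    for message in to_extract.Items:
        for attachment in message.Attachments:
            if attachment.FileName.lower().endswith(".xls"):
                attachment.SaveAsFile(os.path.join(save_dir, attachment.FileName))

The same GetDefaultFolder(6).Folders("Data").Folders("ToExtract") chain works verbatim in PowerShell against the Outlook COM objects.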

Convert SQLite3 Database to JSON iOS

☆樱花仙子☆ submitted on 2019-12-06 10:31:31
Question: I have scoured Google for a tutorial to help with this but haven't been able to find anything comprehensive. I want to one-way sync an SQLite3 database with a web service by sending the data contained in the database in JSON format, but I am having trouble finding information about how to convert the database into JSON. If anyone can either point me in the direction of a tutorial covering this or show a brief example of how to go about converting a simple SQLite table, that would …
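The question targets iOS (Objective-C), but the conversion itself is the same on any platform: read each row as a key/value mapping and serialize the list. A minimal sketch of the shape of it using Python's standard library, with a made-up people table standing in for the real schema; on iOS the same pattern applies with the C sqlite3 API or a wrapper like FMDB plus NSJSONSerialization:

    import json
    import sqlite3

    conn = sqlite3.connect("app.db")  # database path is hypothetical
    conn.row_factory = sqlite3.Row    # rows become dict-like, keyed by column name

    rows = conn.execute("SELECT * FROM people").fetchall()  # table name assumed
    payload = json.dumps([dict(row) for row in rows])

    print(payload)  # e.g. [{"id": 1, "name": "Ada"}, ...] ready to send to the web service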

How to determine if a record in every source represents the same person

流过昼夜 submitted on 2019-12-06 10:29:28
Question: I have several sources of tables with personal data, like this:
SOURCE 1: ID, FIRST_NAME, LAST_NAME, FIELD1, ... (e.g. 1, jhon, gates ...)
SOURCE 2: ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ... (e.g. 1, jon, gate ...)
SOURCE 3: ID, FIRST_NAME, LAST_NAME, ANOTHER_FIELD1, ... (e.g. 2, jhon, ballmer ...)
So, assuming that the records with ID 1 from sources 1 and 2 are the same person, my problem is how to determine whether a record in each source represents the same person. Additionally, it is certain that not every record exists in …
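This is the classic record-linkage problem: when IDs are not shared across sources, compare fuzzy similarity of the name fields (and any other overlapping attributes) and accept pairs above a threshold. A small sketch with the standard library's difflib, using the sample rows from the question; the 0.8 threshold is an arbitrary assumption to be tuned against real data:

    from difflib import SequenceMatcher

    def similarity(a, b):
        # Ratio in [0, 1]; 1.0 means identical strings.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def same_person(rec1, rec2, threshold=0.8):
        first = similarity(rec1["first_name"], rec2["first_name"])
        last = similarity(rec1["last_name"], rec2["last_name"])
        return (first + last) / 2 >= threshold

    source1 = {"id": 1, "first_name": "jhon", "last_name": "gates"}
    source2 = {"id": 1, "first_name": "jon", "last_name": "gate"}
    print(same_person(source1, source2))  # True: "jhon"/"jon" and "gates"/"gate" score high

Dedicated libraries (e.g. recordlinkage in Python) add blocking and better string metrics, but the threshold-on-similarity core is the same.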

Row level atomic MERGE REPLACE in BigQuery

左心房为你撑大大i submitted on 2019-12-06 09:30:21
Question: For my use case I'm working with data identifiable by a unique key at the source, exploded into n (non-deterministic) target entries loaded into BigQuery tables for analytic purposes. Building this ETL on top of Mongo's recent Change Streams feature, I would like to drop all existing entries in BigQuery and then load the new entries atomically. Exploring BigQuery DML I see that a MERGE operation is supported, but only WHEN MATCHED THEN DELETE or WHEN MATCHED THEN UPDATE is possible. I'm interested in a …
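One pattern that gives an atomic delete-and-replace is a MERGE with a constant-false join condition: every staged row falls into WHEN NOT MATCHED (insert) and every existing row for the key falls into WHEN NOT MATCHED BY SOURCE (delete), committed as a single statement. A hedged sketch issued through the google-cloud-bigquery Python client; the project, dataset, table, and column names are assumptions, and it presumes the new entries have first been loaded into a staging table:

    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()

    # ON FALSE forces every row into one of the NOT MATCHED branches, so the
    # delete of old rows and insert of new rows commit atomically together.
    sql = """
    MERGE `project.dataset.target` T
    USING `project.dataset.staging` S
    ON FALSE
    WHEN NOT MATCHED BY SOURCE AND T.source_key = @key THEN DELETE
    WHEN NOT MATCHED AND S.source_key = @key THEN INSERT ROW
    """

    job = client.query(
        sql,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("key", "STRING", "abc-123")]
        ),
    )
    job.result()  # wait for the MERGE to commit

INSERT ROW copies all source columns, so it assumes the staging and target schemas match; list the columns explicitly otherwise.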

Parametrized transformation from Pentaho DI server console

隐身守侯 submitted on 2019-12-06 08:32:52
I can execute an independent scheduled transformation from the Pentaho DI server console, but I have an issue running a parametrized scheduled transformation from the Pentaho DI server console. How can I pass a parameter value at run time? In the Pentaho BI server, to execute a parametrized report we used to pass the variable value in the URL. I tried the same in the Pentaho DI server as below, but it didn't work: http:// * * /pentaho-di/kettle/transStatus?name=UI_parameter&Values=Testvalue
Source: https://stackoverflow.com/questions/21878574/parametrized-transformation-from-pentaho-di-server-console
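Note that transStatus only reports the status of a transformation. On the Carte/DI-server REST API, the endpoint that actually runs one is, to the best of my knowledge, executeTrans, which accepts named parameters as extra query-string arguments; treat the endpoint and its parameter handling as assumptions to verify against your Pentaho version. A sketch with Python's requests, where the host, credentials, and repository path are all placeholders:

    import requests

    # Host, port, credentials, and repository path are placeholders.
    base = "http://di-server:9080/pentaho-di/kettle/executeTrans/"
    params = {
        "rep": "my_repo",                # repository name (assumption)
        "trans": "/home/admin/my_transformation",
        "level": "Basic",
        "UI_parameter": "Testvalue",     # named parameter passed at run time
    }

    resp = requests.get(base, params=params, auth=("admin", "password"))
    print(resp.status_code, resp.text[:200])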

ECS Airflow 1.10.2 performance issues. Operators and tasks take 10x longer

北慕城南 submitted on 2019-12-06 08:25:34
Question: We moved to puckel/Airflow-1.10.2 to try to resolve the poor performance we've had in multiple environments. We are running Airflow 1.10.2 on AWS ECS. Interestingly, CPU/memory never jumps above 80%, and the Airflow metadata DB stays very underutilized as well. Below I've listed the configuration we're using, the DagBag parsing time, plus the detailed execution times from the cProfile output of just running DagBag() in pure Python. A few of our DAGs import a function from create_subdag_functions …
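Reproducing the measurement the question describes is straightforward: constructing a DagBag parses every file in the DAGs folder, which is exactly the work the scheduler repeats, so profiling that call in plain Python isolates parse cost from scheduler behavior. A sketch (the DAGs folder path is a placeholder):

    import cProfile
    import pstats
    from airflow.models import DagBag

    # Parsing the DAGs folder surfaces slow imports, e.g. heavy
    # module-level work such as create_subdag_functions doing real I/O.
    cProfile.run("DagBag(dag_folder='/usr/local/airflow/dags')", "dagbag.prof")

    stats = pstats.Stats("dagbag.prof")
    stats.sort_stats("cumulative").print_stats(20)  # top 20 offenders by cumulative time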