ETL

I need to run two tWaitForFile components in the same subjob in Talend

Submitted by 笑着哭i on 2019-12-11 17:30:11
Question: I need to run two file watchers in the same subjob using Talend. Right now, if I link them together, only one file watcher runs when I execute the job. Below is what I have [screenshot in the original post]. Is there a way to execute them together? The reason for this is that I'm trying to get the FK from one table into the tMap from another table, so any other suggestion on how to do this is also appreciated.

Answer 1: In one subjob, you can only have one 'beginning' component (the one with the
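Since each Talend subjob has a single start component, two watchers effectively need two subjobs (or a parallel-enabled design). For illustration only, here is a minimal Python sketch, not Talend code, of two polling file watchers running in parallel, which is what two independent subjob starts would give you (the file paths are made up):

import os
import threading
import time

def wait_for_file(path, poll_seconds=2, timeout=300):
    # Poll until the file appears, roughly what tWaitForFile does.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            print("found:", path)
            return True
        time.sleep(poll_seconds)
    print("timed out waiting for:", path)
    return False

# Two watchers in parallel; in Talend this maps to two subjobs,
# each with its own start component, rather than one chained subjob.
paths = ["/data/in/orders.csv", "/data/in/customers.csv"]  # hypothetical paths
threads = [threading.Thread(target=wait_for_file, args=(p,)) for p in paths]
for t in threads:
    t.start()
for t in threads:
    t.join()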

Is it possible to sequentially chain Google DataPrep flows?

Submitted by 与世无争的帅哥 on 2019-12-11 15:33:59
Question: I have quite a long set of transforms, which I'd like to break into modules (each in its own flow). I can't see a way of chaining these, other than scheduling consecutive timeslots. Has anyone managed this, or do I need to build one massive flow?

Answer 1: On your flow page, click the three dots (...) near your recipe. In the menu you have Create Reference Dataset. Once created, you will see a new logo under your recipe; you can then click on the menu and choose Add to flow.

Source: https:/

Talend Open Studio: scripting languages versus Microsoft SSIS

Submitted by 不打扰是莪最后的温柔 on 2019-12-11 14:48:36
Question: I have been trying to find out whether Talend Open Studio has a scripting language; I was hoping it might be Perl or Python. I have been using the Microsoft SSIS ETL tool, which has a Script Component to handle more complex ETL tasks. The SSIS Script Component uses C# and VB.NET as its scripting languages. Does Talend Open Studio have an equivalent to the MS-SSIS Script Component? I could not find much on the web about this. The amount of material available for Talend Open Studio is

How to add a new line after a closing tag in DataStage XML output?

Submitted by 流过昼夜 on 2019-12-11 14:45:43
Question: I've been producing XML output whose layout/format/styling looks like this (everything running together on one line):

<HAI> <TIME_SK> <INSERT_DATE> 20191021 </INSERT_DATE> <SRC_STM_ID> 1 </SRC_STM_ID> <HAI> <TIME_SK> <INSERT_DATE> 20191021 </INSERT_DATE> <SRC_STM_ID> 1 </SRC_STM_ID>

but I wanted to make the output look like this:

<HAI>
<TIME_SK>
<INSERT_DATE>20191021</INSERT_DATE>
<SRC_STM_ID>1</SRC_STM_ID>

Does anyone have any ideas? Thank you.

Source: https://stackoverflow.com/questions/58705214/how-to-add-new-line-after-closing-tg-in
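The reformatting itself is mechanical: trim the padding around each value and break the line after every tag. A rough Python post-processing sketch, not DataStage itself, assuming the generated output is available as a string:

import re

flat = ("<HAI> <TIME_SK> <INSERT_DATE> 20191021 </INSERT_DATE>"
        " <SRC_STM_ID> 1 </SRC_STM_ID>")

# Trim the spaces that pad each text value inside its tags.
tidy = re.sub(r">\s+([^<>]*?)\s+<", r">\1<", flat)
# Start a new line after every closing tag, dropping inter-tag spaces.
tidy = re.sub(r"(</[^>]+>)\s*", r"\1\n", tidy)
# Also break after opening tags immediately followed by another tag.
tidy = re.sub(r"(<[^/][^>]*>)\s*(?=<)", r"\1\n", tidy)
print(tidy)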

How to schedule an SSAS cube refresh only for new facts or updated dimensions?

Submitted by 天大地大妈咪最大 on 2019-12-11 14:40:20
Question: Having built a few test data cubes using VS2017, my team are now ready to start working with them in a more production-like manner. As such, there are a few basic tasks that we need to implement, but we are struggling to find useful resources for them:

How can we do a monthly refresh of the cube without regenerating all of our dimensions and fact tables?

Does VS2017 recognise/honour Slowly Changing Dimensions if we implement them in our dimension design?

To have a guess at this: In our ETL

How to pivot row data using Informatica?

Submitted by 痞子三分冷 on 2019-12-11 11:03:50
Question: How can I pivot row data using Informatica PowerCenter Designer? Say I have a source file called address.txt:

+---------+--------------+-----------------+
| ADDR_ID | NAME         | ADDRESS         |
+---------+--------------+-----------------+
| 1       | John Smith   | JohnsAddress1   |
| 1       | John Smith   | JohnsAddress2   |
| 2       | Adrian Smith | AdriansAddress1 |
| 2       | Adrian Smith | AdriansAddress2 |
+---------+--------------+-----------------+

I would like to pivot this data like this:

+---------+--------------+-----
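In PowerCenter this kind of row-to-column pivot is typically built from transformations (for example, an Expression with variable ports feeding an Aggregator) rather than code. Purely to make the reshaping concrete, here is the same pivot in pandas (illustrative only, not Informatica):

import pandas as pd

df = pd.DataFrame({
    "ADDR_ID": [1, 1, 2, 2],
    "NAME": ["John Smith", "John Smith", "Adrian Smith", "Adrian Smith"],
    "ADDRESS": ["JohnsAddress1", "JohnsAddress2",
                "AdriansAddress1", "AdriansAddress2"],
})

# Number each address within its ADDR_ID group, then pivot the
# numbered rows out into ADDRESS1, ADDRESS2, ... columns.
df["SEQ"] = df.groupby("ADDR_ID").cumcount() + 1
wide = df.pivot(index=["ADDR_ID", "NAME"], columns="SEQ", values="ADDRESS")
wide.columns = ["ADDRESS" + str(i) for i in wide.columns]
print(wide.reset_index())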

icCube - ETL - how to execute a file dump

Submitted by 烈酒焚心 on 2019-12-11 10:32:52
Question: In the icCube ETL, there is a data manipulation (data view) called "File dump". I have set up a couple of them in the ETL process, but none are executed when the data is loaded into icCube. This is a simple version of what I do:

data source 1 > data view: a > used in FACTS
data source 1 > data view: a > data view: file dump

The file dump is not executed, as I do not see a file on the server. How can I ensure that a file dump is always produced during the load?

Answer 1: You should ensure the view

Treating a tab-delimited column as a bulk insert in SSIS

Submitted by 删除回忆录丶 on 2019-12-11 08:08:23
Question: I am importing a flat file with the following format:

H(tab)OrderNumber(tab)CustomerNumber(tab)ERPMessage
D(tab)OrderNumber(tab)ItemNumber(tab)ItemDescription(tab)ItemPrice(tab)Qty
D(tab)OrderNumber(tab)ItemNumber(tab)ItemDescription(tab)ItemPrice(tab)Qty
...

I am BULK LOADing the file, using a format file, into a staging table that looks like this:

RecordType varchar(1)
RecordDetail varchar(MAX)

So when it hits my staging table, it looks like this:

RecordType | RecordDetail
-------------------
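In SSIS the shredding of RecordDetail would usually be done with a Conditional Split plus Derived Columns (or a Script Component), but the core logic is just: pick a column list based on RecordType, then split on tabs. A small Python sketch of that logic (the sample rows are invented):

# Hypothetical staged rows in (RecordType, RecordDetail) form.
staged = [
    ("H", "1001\tC42\tOK"),
    ("D", "1001\tSKU-1\tWidget\t9.99\t3"),
    ("D", "1001\tSKU-2\tGadget\t4.50\t1"),
]

header_cols = ["OrderNumber", "CustomerNumber", "ERPMessage"]
detail_cols = ["OrderNumber", "ItemNumber", "ItemDescription",
               "ItemPrice", "Qty"]

for record_type, record_detail in staged:
    # Choose the layout from the record type, then split the
    # tab-delimited payload into named fields.
    cols = header_cols if record_type == "H" else detail_cols
    row = dict(zip(cols, record_detail.split("\t")))
    print(record_type, row)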

Importing a Large Zipped JSON File from Amazon S3 into AWS RDS-PostgreSQL Using Python

Submitted by 三世轮回 on 2019-12-11 07:57:27
Question: I'm trying to import a large zipped JSON file from Amazon S3 into AWS RDS-PostgreSQL using Python, but I get this error:

Traceback (most recent call last):
  File "my_code.py", line 64, in <module>
    file_content = f.read().decode('utf-8').splitlines(True)
  File "/usr/lib64/python3.6/zipfile.py", line 835, in read
    buf += self._read1(self.MAX_N)
  File "/usr/lib64/python3.6/zipfile.py", line 925, in _read1
    data = self._decompressor.decompress(data, n)
MemoryError

# my_code.py
import sys
import
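The MemoryError comes from f.read(), which decompresses the entire archive member into memory at once. A hedged sketch of the streaming alternative (the file and member names are placeholders, and it assumes the JSON is newline-delimited, one object per line):

import io
import json
import zipfile

with zipfile.ZipFile("dump.json.zip") as zf:    # placeholder file name
    member = zf.namelist()[0]                   # first file in the archive
    with zf.open(member) as raw:
        # TextIOWrapper decodes and yields one line at a time, so the
        # decompressed file never has to fit in memory all at once.
        for line in io.TextIOWrapper(raw, encoding="utf-8"):
            record = json.loads(line)   # assumes newline-delimited JSON
            # ... insert `record` into RDS-PostgreSQL here ...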

In SQL Server CDC with SSIS, which data should be stored for windowing (LSN or Date)?

Submitted by 时光毁灭记忆、已成空白 on 2019-12-11 07:49:37
Question: I have implemented delta detection while loading a data warehouse from transaction systems, using an identity column or a date-time column in the source transaction tables. When data needs to be extracted the next time, the maximum date-time value extracted last time is used in the filter of the extraction query to identify new or changed records. This was good enough, except when there were multiple transactions in the same millisecond. But now we have Change Data Capture (CDC) with SQL Server 2008, and it
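With CDC, the usual advice is to persist the LSN rather than a date: LSNs are what the CDC table-valued functions window on, and they avoid the same-millisecond problem entirely. A rough pyodbc sketch of LSN-based windowing; the capture instance dbo_Orders, the etl.cdc_state table, and the DSN are hypothetical, while sys.fn_cdc_increment_lsn, sys.fn_cdc_get_max_lsn, and cdc.fn_cdc_get_all_changes_<capture_instance> are SQL Server's documented CDC functions:

import pyodbc

conn = pyodbc.connect("DSN=warehouse")   # hypothetical connection
cur = conn.cursor()

# Read the LSN recorded by the previous load from a state table we
# maintain ourselves (etl.cdc_state here is hypothetical).
last_lsn = cur.execute(
    "SELECT last_lsn FROM etl.cdc_state WHERE table_name = 'dbo_Orders'"
).fetchone()[0]

# Window on LSNs, not timestamps: everything after the last processed
# LSN up to the current maximum is exactly the new delta.
rows = cur.execute("""
    DECLARE @from binary(10) = sys.fn_cdc_increment_lsn(?);
    DECLARE @to   binary(10) = sys.fn_cdc_get_max_lsn();
    SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, 'all');
""", last_lsn).fetchall()

# After loading `rows`, persist the upper-bound LSN back into
# etl.cdc_state so the next run starts where this one ended.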