ETL

I need to run two tWaitForFile components in the same subjob in Talend

Submitted by 笑着哭i on 2019-12-11 17:30:11
Question: I need to run two file watchers in the same subjob using Talend. Right now, if I link them together, only one file watcher runs when I execute the job. Below is what I have [screenshot in the original post]. Is there a way to execute them together? The reason for this is that I'm trying to get the FK from one table into the tMap from another table, so any other suggestion on how to do this is also appreciated.

Answer 1: In one subjob, you can only have one 'beginning' component (the one with the
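Since each Talend subjob has a single start component, two watchers effectively need two subjobs (or a parallel-enabled design). For illustration only, here is a minimal Python sketch, not Talend code, of two polling file watchers running in parallel, which is what two independent subjob starts would give you (the file paths are made up):

import os
import threading
import time

def wait_for_file(path, poll_seconds=2, timeout=300):
    # Poll until the file appears, roughly what tWaitForFile does.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            print("found:", path)
            return True
        time.sleep(poll_seconds)
    print("timed out waiting for:", path)
    return False

# Two watchers in parallel; in Talend this maps to two subjobs,
# each with its own start component, rather than one chained subjob.
paths = ["/data/in/orders.csv", "/data/in/customers.csv"]  # hypothetical paths
threads = [threading.Thread(target=wait_for_file, args=(p,)) for p in paths]
for t in threads:
    t.start()
for t in threads:
    t.join()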

Is it possible to sequentially chain Google DataPrep flows?

Submitted by 与世无争的帅哥 on 2019-12-11 15:33:59
Question: I have quite a long set of transforms, which I'd like to break into modules (each in its own flow). I can't see a way of chaining these, other than scheduling consecutive timeslots. Has anyone managed this, or do I need to build one massive flow?

Answer 1: On your flow page, click the three dots (...) near your recipe. In the menu you have Create Reference Dataset. Once created, you will see a new logo under your recipe; you can then click on the menu and choose Add to flow.

Source: https:/

Talend Open Studio: scripting languages versus Microsoft SSIS

Submitted by 不打扰是莪最后的温柔 on 2019-12-11 14:48:36
Question: I have been trying to find out whether Talend Open Studio has a scripting language; I was hoping it might be Perl or Python. I have been using the Microsoft SSIS ETL tool, which has a Script Component to handle more complex ETL tasks. The SSIS Script Component uses C# and VB.NET as its scripting languages. Does Talend Open Studio have an equivalent to the MS-SSIS Script Component? I could not find much on the web about this. The amount of material available for Talend Open Studio is

How to add a new line after a closing tag in DataStage XML output?

Submitted by 流过昼夜 on 2019-12-11 14:45:43
Question: I've been producing XML output whose layout/format/styling looks like this (everything running together on one line):

<HAI> <TIME_SK> <INSERT_DATE> 20191021 </INSERT_DATE> <SRC_STM_ID> 1 </SRC_STM_ID> <HAI> <TIME_SK> <INSERT_DATE> 20191021 </INSERT_DATE> <SRC_STM_ID> 1 </SRC_STM_ID>

but I wanted to make the output look like this:

<HAI>
<TIME_SK>
<INSERT_DATE>20191021</INSERT_DATE>
<SRC_STM_ID>1</SRC_STM_ID>

Does anyone have any ideas? Thank you.

Source: https://stackoverflow.com/questions/58705214/how-to-add-new-line-after-closing-tg-in
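The reformatting itself is mechanical: trim the padding around each value and break the line after every tag. A rough Python post-processing sketch, not DataStage itself, assuming the generated output is available as a string:

import re

flat = ("<HAI> <TIME_SK> <INSERT_DATE> 20191021 </INSERT_DATE>"
        " <SRC_STM_ID> 1 </SRC_STM_ID>")

# Trim the spaces that pad each text value inside its tags.
tidy = re.sub(r">\s+([^<>]*?)\s+<", r">\1<", flat)
# Start a new line after every closing tag, dropping inter-tag spaces.
tidy = re.sub(r"(</[^>]+>)\s*", r"\1\n", tidy)
# Also break after opening tags immediately followed by another tag.
tidy = re.sub(r"(<[^/][^>]*>)\s*(?=<)", r"\1\n", tidy)
print(tidy)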

How to schedule an SSAS cube refresh only for new facts or updated dimensions?

Submitted by 天大地大妈咪最大 on 2019-12-11 14:40:20
Question: Having built a few test data cubes using VS2017, my team are now ready to start working with them in a more production-like manner. As such, there are a few basic tasks that we need to implement, but we are struggling to find useful resources for them:

How can we do a monthly refresh of the cube without regenerating all of our dimensions and fact tables?

Does VS2017 recognise/honour Slowly Changing Dimensions if we implement them in our dimension design?

To have a guess at this: In our ETL

How to pivot row data using Informatica?

Submitted by 痞子三分冷 on 2019-12-11 11:03:50
Question: How can I pivot row data using Informatica PowerCenter Designer? Say I have a source file called address.txt:

+---------+--------------+-----------------+
| ADDR_ID | NAME         | ADDRESS         |
+---------+--------------+-----------------+
| 1       | John Smith   | JohnsAddress1   |
| 1       | John Smith   | JohnsAddress2   |
| 2       | Adrian Smith | AdriansAddress1 |
| 2       | Adrian Smith | AdriansAddress2 |
+---------+--------------+-----------------+

I would like to pivot this data like this:

+---------+--------------+-----
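In PowerCenter this kind of row-to-column pivot is typically built from transformations (for example, an Expression with variable ports feeding an Aggregator) rather than code. Purely to make the reshaping concrete, here is the same pivot in pandas (illustrative only, not Informatica):

import pandas as pd

df = pd.DataFrame({
    "ADDR_ID": [1, 1, 2, 2],
    "NAME": ["John Smith", "John Smith", "Adrian Smith", "Adrian Smith"],
    "ADDRESS": ["JohnsAddress1", "JohnsAddress2",
                "AdriansAddress1", "AdriansAddress2"],
})

# Number each address within its ADDR_ID group, then pivot the
# numbered rows out into ADDRESS1, ADDRESS2, ... columns.
df["SEQ"] = df.groupby("ADDR_ID").cumcount() + 1
wide = df.pivot(index=["ADDR_ID", "NAME"], columns="SEQ", values="ADDRESS")
wide.columns = ["ADDRESS" + str(i) for i in wide.columns]
print(wide.reset_index())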

icCube - ETL - how to execute a file dump

Submitted by 烈酒焚心 on 2019-12-11 10:32:52
Question: In the icCube ETL, there is a data manipulation (data view) called "File dump". I have set up a couple of them in the ETL process, but none are executed when the data is loaded into icCube. This is a simple version of what I do:

data source 1 > data view: a > used in FACTS
data source 1 > data view: a > data view: file dump

The file dump is not executed, as I do not see a file on the server. How can I ensure that a file dump is always produced during the load?

Answer 1: You should ensure the view

Treating a tab-delimited column as a bulk insert in SSIS

Submitted by 删除回忆录丶 on 2019-12-11 08:08:23
Question: I am importing a flat file with the following format:

H(tab)OrderNumber(tab)CustomerNumber(tab)ERPMessage
D(tab)OrderNumber(tab)ItemNumber(tab)ItemDescription(tab)ItemPrice(tab)Qty
D(tab)OrderNumber(tab)ItemNumber(tab)ItemDescription(tab)ItemPrice(tab)Qty
...

I am BULK LOADing the file, using a format file, into a staging table that looks like this:

RecordType varchar(1)
RecordDetail varchar(MAX)

So when it hits my staging table, it looks like this:

RecordType | RecordDetail
-------------------
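In SSIS the shredding of RecordDetail would usually be done with a Conditional Split plus Derived Columns (or a Script Component), but the core logic is just: pick a column list based on RecordType, then split on tabs. A small Python sketch of that logic (the sample rows are invented):

# Hypothetical staged rows in (RecordType, RecordDetail) form.
staged = [
    ("H", "1001\tC42\tOK"),
    ("D", "1001\tSKU-1\tWidget\t9.99\t3"),
    ("D", "1001\tSKU-2\tGadget\t4.50\t1"),
]

header_cols = ["OrderNumber", "CustomerNumber", "ERPMessage"]
detail_cols = ["OrderNumber", "ItemNumber", "ItemDescription",
               "ItemPrice", "Qty"]

for record_type, record_detail in staged:
    # Choose the layout from the record type, then split the
    # tab-delimited payload into named fields.
    cols = header_cols if record_type == "H" else detail_cols
    row = dict(zip(cols, record_detail.split("\t")))
    print(record_type, row)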

Importing a Large Zipped JSON File from Amazon S3 into AWS RDS-PostgreSQL Using Python

Submitted by 三世轮回 on 2019-12-11 07:57:27
Question: I'm trying to import a large zipped JSON file from Amazon S3 into AWS RDS-PostgreSQL using Python, but I get this error:

Traceback (most recent call last):
  File "my_code.py", line 64, in <module>
    file_content = f.read().decode('utf-8').splitlines(True)
  File "/usr/lib64/python3.6/zipfile.py", line 835, in read
    buf += self._read1(self.MAX_N)
  File "/usr/lib64/python3.6/zipfile.py", line 925, in _read1
    data = self._decompressor.decompress(data, n)
MemoryError

# my_code.py
import sys
import
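The MemoryError comes from f.read(), which decompresses the entire archive member into memory at once. A hedged sketch of the streaming alternative (the file and member names are placeholders, and it assumes the JSON is newline-delimited, one object per line):

import io
import json
import zipfile

with zipfile.ZipFile("dump.json.zip") as zf:    # placeholder file name
    member = zf.namelist()[0]                   # first file in the archive
    with zf.open(member) as raw:
        # TextIOWrapper decodes and yields one line at a time, so the
        # decompressed file never has to fit in memory all at once.
        for line in io.TextIOWrapper(raw, encoding="utf-8"):
            record = json.loads(line)   # assumes newline-delimited JSON
            # ... insert `record` into RDS-PostgreSQL here ...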

In SQL Server CDC with SSIS, which data should be stored for windowing (LSN or Date)?

Submitted by 时光毁灭记忆、已成空白 on 2019-12-11 07:49:37
Question: I have implemented delta detection while loading a data warehouse from transaction systems, using an identity column or a date-time column in the source transaction tables. When data needs to be extracted the next time, the maximum date-time value extracted last time is used in the filter of the extraction query to identify new or changed records. This was good enough, except when there were multiple transactions in the same millisecond. But now we have Change Data Capture (CDC) with SQL Server 2008, and it
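With CDC, the usual advice is to persist the LSN rather than a date: LSNs are what the CDC table-valued functions window on, and they avoid the same-millisecond problem entirely. A rough pyodbc sketch of LSN-based windowing; the capture instance dbo_Orders, the etl.cdc_state table, and the DSN are hypothetical, while sys.fn_cdc_increment_lsn, sys.fn_cdc_get_max_lsn, and cdc.fn_cdc_get_all_changes_<capture_instance> are SQL Server's documented CDC functions:

import pyodbc

conn = pyodbc.connect("DSN=warehouse")   # hypothetical connection
cur = conn.cursor()

# Read the LSN recorded by the previous load from a state table we
# maintain ourselves (etl.cdc_state here is hypothetical).
last_lsn = cur.execute(
    "SELECT last_lsn FROM etl.cdc_state WHERE table_name = 'dbo_Orders'"
).fetchone()[0]

# Window on LSNs, not timestamps: everything after the last processed
# LSN up to the current maximum is exactly the new delta.
rows = cur.execute("""
    DECLARE @from binary(10) = sys.fn_cdc_increment_lsn(?);
    DECLARE @to   binary(10) = sys.fn_cdc_get_max_lsn();
    SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from, @to, 'all');
""", last_lsn).fetchall()

# After loading `rows`, persist the upper-bound LSN back into
# etl.cdc_state so the next run starts where this one ended.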