data-integration

Pentaho Data Integration: Import large dataset from DB

余生颓废 submitted on 2021-02-10 20:30:26
Problem: I'm trying to import a large set of data from one DB to another (MSSQL to MySQL). The transformation does this: it gets a subset of data, checks whether each row is an update or an insert by comparing hashes, maps the data, and inserts it into the MySQL DB with an API call. The subset part is strictly manual for the moment; is there a way to set Pentaho up to do it for me, as a kind of iteration? The query I'm using to get the subset is select t1.* from ( select *, ROW_NUMBER() over (order by id) as RowNum from mytable ) t1
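
For reference, the iteration being asked about amounts to looping over ROW_NUMBER() windows until no rows come back. Below is a minimal sketch outside of Pentaho, assuming the pyodbc package, the table name mytable from the question, and a hypothetical DSN; it only illustrates the pagination idea, not the PDI job itself:

    import pyodbc

    PAGE_SIZE = 10000
    conn = pyodbc.connect("DSN=mssql_source")  # hypothetical DSN
    cursor = conn.cursor()

    offset = 0
    while True:
        # Fetch one window of rows, using ROW_NUMBER() ordered by id for pagination.
        cursor.execute(
            """
            select t1.* from (
                select *, ROW_NUMBER() over (order by id) as RowNum
                from mytable
            ) t1
            where t1.RowNum > ? and t1.RowNum <= ?
            """,
            offset, offset + PAGE_SIZE,
        )
        rows = cursor.fetchall()
        if not rows:
            break
        # ... hash-check, map and push each row to the MySQL-backed API here ...
        offset += PAGE_SIZE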

Parent-Child relationship in Talend

大兔子大兔子 submitted on 2020-06-29 07:19:14
Problem: I'm facing a problem and am out of ideas on how to implement a parent-child relationship in Talend. Problem statement: I have a feed file with data in the following format

    MemberCode|LastName|FirstName
    A|SHINE|MICHAEL
    B|SHINE|MICHELLE
    C|SHINE|ERIN
    A|RODRIGUEZ|DAMIAN
    A|PAVELSKY|STEPHEN
    B|PAVELSKY|TERESA

(there are many more columns and many more rows - just a few rows for reference purposes). LastName and FirstName are self-explanatory. MemberCode denotes the relationship. A will be the parent, B or C
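
The excerpt is cut off before the expected output, so the following is only a rough sketch of the grouping logic the sample data implies: rows are read in order, each "A" row opens a new family, and the following "B"/"C" rows attach to it as children. It assumes a hypothetical file feed.txt with the pipe-delimited layout above; it is not Talend, just an illustration of the relationship:

    # Rough sketch of the implied grouping: an 'A' row opens a family,
    # subsequent 'B'/'C' rows belong to the most recent 'A' row.
    families = []
    with open("feed.txt") as feed:           # hypothetical file name
        next(feed)                            # skip the header line
        for line in feed:
            member_code, last_name, first_name = line.strip().split("|")
            record = {"code": member_code, "last": last_name, "first": first_name}
            if member_code == "A":
                families.append({"parent": record, "children": []})
            else:
                families[-1]["children"].append(record)

    for fam in families:
        print(fam["parent"]["last"], "->", [c["first"] for c in fam["children"]])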

How to make requests to third-party APIs and load the results periodically into Google BigQuery? Which Google services should I use?

断了今生、忘了曾经 submitted on 2020-02-23 04:58:59
Problem: I need to get data from a third-party API and ingest it into Google BigQuery, and I probably need to automate this process through Google services so it runs periodically. I am trying to use Cloud Functions, but it needs a trigger. I have also read about App Engine, but I believe it is not suitable for just one function that makes pull requests. Another doubt: do I need to load the data into Cloud Storage first, or can I load it straight into BigQuery? Should I use Dataflow and make any configuration? def
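
One common pattern for this (not something the truncated question confirms) is Cloud Scheduler publishing to a Pub/Sub topic on a cron schedule, which gives the Cloud Function its trigger; the function then calls the API and streams rows straight into BigQuery, with no Cloud Storage step. A minimal sketch, assuming the requests and google-cloud-bigquery packages plus a hypothetical API URL and table id:

    import requests
    from google.cloud import bigquery

    API_URL = "https://api.example.com/data"         # hypothetical third-party endpoint
    TABLE_ID = "my-project.my_dataset.my_table"      # hypothetical destination table

    def main(event, context):
        """Pub/Sub-triggered Cloud Function: fetch from the API, stream into BigQuery."""
        rows = requests.get(API_URL, timeout=60).json()   # assumes the API returns a JSON list of rows
        client = bigquery.Client()
        errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert, no Cloud Storage needed
        if errors:
            raise RuntimeError(f"BigQuery insert errors: {errors}")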

Expose Talend ETL Job as a Web Service

匆匆过客 submitted on 2019-12-30 09:35:15
Problem: I am currently evaluating Talend ETL (Talend Open Studio for Data Integration). I would like to know how / whether I can expose an ETL job as a web service. I know I can export jobs as web services and invoke them through a specific URL; however, my goal is to be able to expose a specific WSDL with IN / OUT parameters. A sample use case would be: 1) invoke the WS in Talend ETL and pass XML with the data, 2) Talend ETL extracts the data from the XML and inserts it as variable(s) in the query to be executed
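
For reference, the caller side of that use case boils down to posting an XML payload to whatever endpoint the exported job exposes. A minimal sketch with the requests package, where the URL and the payload structure are hypothetical placeholders rather than anything Talend actually generates:

    import requests

    # Hypothetical endpoint of the exported Talend job; the real URL and
    # envelope depend on how the job / WSDL is actually exposed.
    ENDPOINT = "http://etl-host:8080/services/MyJob"

    payload = """<?xml version="1.0" encoding="UTF-8"?>
    <request>
      <customerId>42</customerId>
    </request>"""

    response = requests.post(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        timeout=30,
    )
    print(response.status_code, response.text)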

Blob fields in SAS get truncated

坚强是说给别人听的谎言 submitted on 2019-12-25 07:35:03
Problem: I have been working on a SAS job that extracts a table from SQL Server and then loads that table into an Oracle table. One of the fields in SQL Server is a BLOB, and the values can be as big as 1 GB. I am getting length warnings when I run this; the BLOBs in the Oracle table seem to be truncated and, as a result, the files there are corrupt. I have seen SAS documentation stating that a character variable can be at most 32K, but SAS also states it can access BLOBs of up to 2 GB. How can we achieve that? proc sql; create view work.W2K3NU8

Pentaho Kettle - Get the file names dynamically

感情迁移 submitted on 2019-12-25 01:47:23
Problem: I hope this message finds everyone well! I'm stuck on a situation in the Pentaho PDI tool and I'm looking for an answer (or at least a light at the end of the tunnel) to solve it! Every month I have to import a bunch of xls files from different clients. Every file has a different name (which is assigned randomly), and the files sit in a folder named after the client. However, I use the same process for all clients and situations. Is there a way to pass the name of the directory as a
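
The question is cut off, but the behaviour being asked for (pick up whatever xls files happen to be in a per-client folder that is passed in as a parameter) is essentially a directory listing. A small sketch outside of PDI, with a hypothetical base path and client name, just to make the desired iteration concrete:

    import glob
    import os

    BASE_DIR = "/data/incoming"   # hypothetical base folder
    client = "acme"               # hypothetical client name, would arrive as a job parameter

    # Collect every xls file currently sitting in that client's folder,
    # regardless of the (random) file names.
    for path in glob.glob(os.path.join(BASE_DIR, client, "*.xls")):
        print("would process:", path)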

Connecting a set of points to get a non-self-intersecting non-convex polygon

岁酱吖の submitted on 2019-12-25 01:27:56
Problem: I have an unordered set of 2D points which represent the corners of a building. I need to connect them to get the outline of the building. The points were obtained by combining different polygons collected by different individuals. My idea is to use these polygons to get the points in order (e.g. taking the region between the biggest and smallest polygons and connecting the points so that the outline stays within this region). I tried using the minimum-distance criterion and also to connect the points
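
One simple baseline for ordering such points (not necessarily what the truncated question ends up asking about) is to sort them by polar angle around their centroid: this yields a non-self-intersecting polygon for star-shaped outlines, though it can still fail on strongly concave buildings. A minimal sketch with numpy and hypothetical sample coordinates:

    import numpy as np

    # Unordered 2D corner points (hypothetical sample data).
    points = np.array([[0, 0], [4, 0], [4, 3], [2, 2], [0, 3]], dtype=float)

    # Sort the points by the angle they make around the centroid.
    centroid = points.mean(axis=0)
    angles = np.arctan2(points[:, 1] - centroid[1], points[:, 0] - centroid[0])
    ordered = points[np.argsort(angles)]

    # Visiting 'ordered' in sequence (and closing back to the start) gives a
    # simple polygon for star-shaped outlines; concave cases may need more work.
    print(ordered)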

How to integrate tabular data into GraphDB automatically?

↘锁芯ラ submitted on 2019-12-23 03:36:26
Problem: I want to import tabular (xls) data automatically into GraphDB. OntoRefine suits my case very well with the power of OpenRefine and SPARQL. Now I am thinking about the following approach: 1) new tabular data is available as an XLS file; 2) OntoRefine updates a project or creates a new project automatically; 3) SPARQL queries against the RDF bridge create the new triples; 4) a SPARQL INSERT adds these triples. Is there an alternative approach to automate it? If this is the best one, how can I update or create a new
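
The last step of that pipeline, at least, is easy to script: a GraphDB repository accepts SPARQL 1.1 Update over HTTP on its /statements endpoint. A minimal sketch with the requests package, where the GraphDB URL, repository name and the inserted triple are hypothetical placeholders:

    import requests

    # Hypothetical GraphDB instance and repository name.
    UPDATE_ENDPOINT = "http://localhost:7200/repositories/my-repo/statements"

    insert_query = """
    PREFIX ex: <http://example.org/>
    INSERT DATA { ex:row1 ex:hasValue "example" . }
    """

    resp = requests.post(
        UPDATE_ENDPOINT,
        data=insert_query,
        headers={"Content-Type": "application/sparql-update"},
        timeout=30,
    )
    resp.raise_for_status()   # GraphDB returns 204 No Content on success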