data-integration

Pentaho Data Integration: Import large dataset from DB

余生颓废 submitted on 2021-02-10 20:30:26
Problem: I'm trying to import a large set of data from one DB to another (MSSQL to MySQL). The transformation does this: it gets a subset of data, checks whether each row is an update or an insert by comparing hashes, maps the data, and inserts it into the MySQL DB with an API call. The subset part is strictly manual for the moment; is there a way to set Pentaho up to do it for me, as a kind of iteration? The query I'm using to get the subset is select t1.* from ( select *, ROW_NUMBER() over (order by id) as RowNum from mytable ) t1
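
For reference, the iteration being asked about amounts to looping over ROW_NUMBER() windows until no rows come back. Below is a minimal sketch outside of Pentaho, assuming the pyodbc package, the table name mytable from the question, and a hypothetical DSN; it only illustrates the pagination idea, not the PDI job itself:

    import pyodbc

    PAGE_SIZE = 10000
    conn = pyodbc.connect("DSN=mssql_source")  # hypothetical DSN
    cursor = conn.cursor()

    offset = 0
    while True:
        # Fetch one window of rows, using ROW_NUMBER() ordered by id for pagination.
        cursor.execute(
            """
            select t1.* from (
                select *, ROW_NUMBER() over (order by id) as RowNum
                from mytable
            ) t1
            where t1.RowNum > ? and t1.RowNum <= ?
            """,
            offset, offset + PAGE_SIZE,
        )
        rows = cursor.fetchall()
        if not rows:
            break
        # ... hash-check, map and push each row to the MySQL-backed API here ...
        offset += PAGE_SIZE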

Parent-Child relationship in Talend

大兔子大兔子 submitted on 2020-06-29 07:19:14
Problem: I'm facing a problem and am out of ideas on how to implement a parent-child relationship in Talend. Problem statement: I have a feed file with data in the following format

    MemberCode|LastName|FirstName
    A|SHINE|MICHAEL
    B|SHINE|MICHELLE
    C|SHINE|ERIN
    A|RODRIGUEZ|DAMIAN
    A|PAVELSKY|STEPHEN
    B|PAVELSKY|TERESA

(there are many more columns and many more rows - just a few rows for reference purposes). LastName and FirstName are self-explanatory. MemberCode denotes the relationship. A will be the parent, B or C
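
The excerpt is cut off before the expected output, so the following is only a rough sketch of the grouping logic the sample data implies: rows are read in order, each "A" row opens a new family, and the following "B"/"C" rows attach to it as children. It assumes a hypothetical file feed.txt with the pipe-delimited layout above; it is not Talend, just an illustration of the relationship:

    # Rough sketch of the implied grouping: an 'A' row opens a family,
    # subsequent 'B'/'C' rows belong to the most recent 'A' row.
    families = []
    with open("feed.txt") as feed:           # hypothetical file name
        next(feed)                            # skip the header line
        for line in feed:
            member_code, last_name, first_name = line.strip().split("|")
            record = {"code": member_code, "last": last_name, "first": first_name}
            if member_code == "A":
                families.append({"parent": record, "children": []})
            else:
                families[-1]["children"].append(record)

    for fam in families:
        print(fam["parent"]["last"], "->", [c["first"] for c in fam["children"]])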

How to make requests to third-party APIs and load the results periodically into Google BigQuery? Which Google services should I use?

断了今生、忘了曾经 submitted on 2020-02-23 04:58:59
Problem: I need to get data from a third-party API and ingest it into Google BigQuery, and I probably need to automate this process through Google services so it runs periodically. I am trying to use Cloud Functions, but it needs a trigger. I have also read about App Engine, but I believe it is not suitable for just one function that makes pull requests. Another doubt: do I need to load the data into Cloud Storage first, or can I load it straight into BigQuery? Should I use Dataflow and make any configuration? def
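
One common pattern for this (not something the truncated question confirms) is Cloud Scheduler publishing to a Pub/Sub topic on a cron schedule, which gives the Cloud Function its trigger; the function then calls the API and streams rows straight into BigQuery, with no Cloud Storage step. A minimal sketch, assuming the requests and google-cloud-bigquery packages plus a hypothetical API URL and table id:

    import requests
    from google.cloud import bigquery

    API_URL = "https://api.example.com/data"         # hypothetical third-party endpoint
    TABLE_ID = "my-project.my_dataset.my_table"      # hypothetical destination table

    def main(event, context):
        """Pub/Sub-triggered Cloud Function: fetch from the API, stream into BigQuery."""
        rows = requests.get(API_URL, timeout=60).json()   # assumes the API returns a JSON list of rows
        client = bigquery.Client()
        errors = client.insert_rows_json(TABLE_ID, rows)  # streaming insert, no Cloud Storage needed
        if errors:
            raise RuntimeError(f"BigQuery insert errors: {errors}")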

Expose Talend ETL Job as a Web Service

匆匆过客 submitted on 2019-12-30 09:35:15
Problem: I am currently evaluating Talend ETL (Talend Open Studio for Data Integration). I would like to know how / whether I can expose an ETL job as a web service. I know I can export jobs as web services and invoke them through a specific URL; however, my goal is to be able to expose a specific WSDL with IN / OUT parameters. A sample use case would be: 1) invoke the WS in Talend ETL and pass XML with the data, 2) Talend ETL extracts the data from the XML and inserts it as variable(s) in the query to be executed
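
For reference, the caller side of that use case boils down to posting an XML payload to whatever endpoint the exported job exposes. A minimal sketch with the requests package, where the URL and the payload structure are hypothetical placeholders rather than anything Talend actually generates:

    import requests

    # Hypothetical endpoint of the exported Talend job; the real URL and
    # envelope depend on how the job / WSDL is actually exposed.
    ENDPOINT = "http://etl-host:8080/services/MyJob"

    payload = """<?xml version="1.0" encoding="UTF-8"?>
    <request>
      <customerId>42</customerId>
    </request>"""

    response = requests.post(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        timeout=30,
    )
    print(response.status_code, response.text)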

Blob fields in SAS get truncated

坚强是说给别人听的谎言 submitted on 2019-12-25 07:35:03
Problem: I have been working on a SAS job that extracts a table from SQL Server and then loads that table into an Oracle table. One of the fields in SQL Server is a BLOB, and the values can be as big as 1 GB. I am getting length warnings when I run this; the BLOBs in the Oracle table seem to be truncated and, as a result, the files there are corrupt. I have seen SAS documentation stating that a character variable can be at most 32K, but SAS also states it can access BLOBs of up to 2 GB. How can we achieve that? proc sql; create view work.W2K3NU8

Pentaho Kettle - Get the file names dynamically

感情迁移 submitted on 2019-12-25 01:47:23
Problem: I hope this message finds everyone well! I'm stuck on a situation in the Pentaho PDI tool and I'm looking for an answer (or at least a light at the end of the tunnel) to solve it! Every month I have to import a bunch of xls files from different clients. Every file has a different name (which is assigned randomly), and the files sit in a folder named after the client. However, I use the same process for all clients and situations. Is there a way to pass the name of the directory as a
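
The question is cut off, but the behaviour being asked for (pick up whatever xls files happen to be in a per-client folder that is passed in as a parameter) is essentially a directory listing. A small sketch outside of PDI, with a hypothetical base path and client name, just to make the desired iteration concrete:

    import glob
    import os

    BASE_DIR = "/data/incoming"   # hypothetical base folder
    client = "acme"               # hypothetical client name, would arrive as a job parameter

    # Collect every xls file currently sitting in that client's folder,
    # regardless of the (random) file names.
    for path in glob.glob(os.path.join(BASE_DIR, client, "*.xls")):
        print("would process:", path)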

Connecting a set of points to get a non-self-intersecting non-convex polygon

岁酱吖の submitted on 2019-12-25 01:27:56
Problem: I have an unordered set of 2D points which represent the corners of a building. I need to connect them to get the outline of the building. The points were obtained by combining different polygons collected by different individuals. My idea is to use these polygons to get the points in order (e.g. taking the region between the biggest and smallest polygons and connecting the points so that the outline stays within this region). I tried using the minimum-distance criterion and also to connect the points
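
One simple baseline for ordering such points (not necessarily what the truncated question ends up asking about) is to sort them by polar angle around their centroid: this yields a non-self-intersecting polygon for star-shaped outlines, though it can still fail on strongly concave buildings. A minimal sketch with numpy and hypothetical sample coordinates:

    import numpy as np

    # Unordered 2D corner points (hypothetical sample data).
    points = np.array([[0, 0], [4, 0], [4, 3], [2, 2], [0, 3]], dtype=float)

    # Sort the points by the angle they make around the centroid.
    centroid = points.mean(axis=0)
    angles = np.arctan2(points[:, 1] - centroid[1], points[:, 0] - centroid[0])
    ordered = points[np.argsort(angles)]

    # Visiting 'ordered' in sequence (and closing back to the start) gives a
    # simple polygon for star-shaped outlines; concave cases may need more work.
    print(ordered)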

How to integrate tabular data into GraphDB automatically?

↘锁芯ラ submitted on 2019-12-23 03:36:26
Problem: I want to import tabular (xls) data automatically into GraphDB. OntoRefine suits my case very well with the power of OpenRefine and SPARQL. Now I am thinking about the following approach: 1) new tabular data is available as an XLS file; 2) OntoRefine updates a project or creates a new project automatically; 3) SPARQL queries against the RDF bridge create the new triples; 4) a SPARQL INSERT adds these triples. Is there an alternative approach to automate it? If this is the best one, how can I update or create a new
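
The last step of that pipeline, at least, is easy to script: a GraphDB repository accepts SPARQL 1.1 Update over HTTP on its /statements endpoint. A minimal sketch with the requests package, where the GraphDB URL, repository name and the inserted triple are hypothetical placeholders:

    import requests

    # Hypothetical GraphDB instance and repository name.
    UPDATE_ENDPOINT = "http://localhost:7200/repositories/my-repo/statements"

    insert_query = """
    PREFIX ex: <http://example.org/>
    INSERT DATA { ex:row1 ex:hasValue "example" . }
    """

    resp = requests.post(
        UPDATE_ENDPOINT,
        data=insert_query,
        headers={"Content-Type": "application/sparql-update"},
        timeout=30,
    )
    resp.raise_for_status()   # GraphDB returns 204 No Content on success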