kettle

How to run different SQL to get data according to the previous input data in Pentaho Kettle

吖頭↗ · Submitted on 2019-12-11 17:08:34
Question: I use Pentaho Kettle 8.2 on Windows 10 with an Oracle database. I have a requirement and don't know how to implement it. The requirement is: step 1: get data 1 from the DB; step 2: get data 2 from a different table (SQL) depending on a field of step 1's data 1; step 3: update another DB with data 2 from step 2. Step 1 is easy, just reading from one DB. In step 2, I try to get data based on step 1's output: I use Switch/case to branch on step 1's result and then run a different SQL script,
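The branching logic the question describes (pick a different follow-up query per field value) can be sketched outside Kettle as a plain dispatch. This is a minimal illustration only; the "ORDER"/"INVOICE" values and table names are made-up examples, not from the original question:

```shell
# Hypothetical sketch: choose the follow-up SQL based on a field value
# coming out of step 1. The case labels and table names are invented
# for illustration; in Kettle this maps to Switch/case -> SQL steps.
pick_sql() {
  case "$1" in
    ORDER)   echo "SELECT * FROM order_detail WHERE order_id = ?" ;;
    INVOICE) echo "SELECT * FROM invoice_detail WHERE invoice_id = ?" ;;
    *)       echo "" ;;  # unknown type: no query
  esac
}

pick_sql ORDER
```

In a transformation, the equivalent is a Switch/case step routing rows to per-type branches, each with its own Table input or Execute SQL step parameterized by the incoming field.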

A comparison of the data integration tools Kettle, Sqoop, and DataX

耗尽温柔 · Submitted on 2019-12-11 16:26:43
There are many data integration tools; below are several of the most widely used open-source ones.
1. Alibaba open source: DataX. DataX is an offline synchronization tool for heterogeneous data sources, aiming at stable and efficient data synchronization between relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, FTP, and other heterogeneous sources.
2. Apache open source: Sqoop. Sqoop (pronounced "skup") is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into Hadoop's HDFS, and it can also export data from HDFS back into a relational database. (From Baidu Baike.)
3. Kettle. Kettle is an open-source ETL tool written in pure Java. It runs on Windows, Linux, and Unix, requires no installation, and performs data extraction efficiently and reliably. The name reflects lead programmer Matt's wish to put all kinds of data into one kettle and pour it out in a specified format. Kettle lets you manage data from different databases through a graphical environment in which you describe what you want to do rather than how to do it. Kettle has two kinds of script files, transformations and jobs; a transformation performs the basic conversions on the data

Dynamic naming of Excel sheets using Pentaho Kettle

余生颓废 · Submitted on 2019-12-11 13:38:09
Question: I have a transformation with sequential steps that write data from a Table input step to Excel sheets using the Excel writer step. The sheet names are provided in the "Sheet name" box on the Content tab, and they appear in the spreadsheet. Instead of pre-defining the sheet name, is there any way the sheet names can be taken dynamically from a column value of the table? For example, say there is a table "section" with columns section_name and stud_name, and I need to show section names as Excel

Run PDI Jobs using Web Services

旧时模样 · Submitted on 2019-12-11 12:53:19
Question: I have a job created using Spoon and imported into the DI repository. Without scheduling it via the PDI job scheduler, how can I run a PDI job on a Data Integration Server using REST web services, so that I can call it whenever I want? Answer 1: Before beginning these steps, please make sure that your Carte server (or the Carte server embedded in the DI server) is configured to connect to the repository for REST calls. The process and description can be found on the wiki page. Note that the repositories.xml
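Once Carte is configured against the repository, the call itself is a plain HTTP GET against Carte's executeJob endpoint. A hedged sketch, assuming default Carte credentials and placeholder host, repository, and job names (none of these values come from the answer):

```shell
# Hedged sketch: build the Carte executeJob URL. Host, port, repository
# name, and job path are placeholders for illustration.
CARTE_HOST="localhost"
CARTE_PORT="8080"
REPO="my_repo"
JOB_PATH="/home/admin/my_job"
URL="http://${CARTE_HOST}:${CARTE_PORT}/kettle/executeJob/?rep=${REPO}&job=${JOB_PATH}&level=Basic"
echo "$URL"

# Actual call (commented out; requires a running Carte/DI server and
# valid credentials, e.g. the default cluster/cluster):
# curl -u cluster:cluster "$URL"
```

The response is XML containing the job's Carte object ID, which you can feed to the jobStatus endpoint to poll for completion.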

Clone and Build Pentaho Kettle

隐身守侯 · Submitted on 2019-12-11 12:36:30
Question: Sorry for the basic question, but I have been trying for a while and cannot get anywhere with this. Does anyone have experience cloning the pentaho-kettle project and importing it into Eclipse? I followed the instructions at https://github.com/pentaho/pentaho-kettle. I did the following: cd pentaho-kettle; ant clean-all resolve create-dot-classpath. Then I went into Eclipse and imported an existing project into the workspace. Note that I am importing from the root folder. Should I include the option to scan for nested

How to uncompress and import a .tar.gz file in kettle?

无人久伴 · Submitted on 2019-12-11 12:03:24
Question: I am trying to figure out how to create a job/transformation that uncompresses and loads a .tar.gz file. Does anyone have any advice for getting this to work? Answer 1: You want to read a text file that is compressed? Just specify the file in the Text file input step of the transformation and specify the compression (GZip); Kettle can read directly from compressed files. If you do need the file uncompressed, use a job step. I'm not sure whether there is a native uncompress entry, but if not, just use a shell
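The shell-script fallback the answer alludes to is a one-liner around `tar`. A self-contained sketch that builds a sample archive and then extracts it the way a Kettle "Shell" job entry could (all paths are illustrative):

```shell
# Hedged sketch: create a sample .tar.gz, then uncompress it as a
# Kettle Shell job entry might. The file names are invented examples.
WORKDIR=$(mktemp -d)
echo "id,name" > "$WORKDIR/data.csv"
tar -czf "$WORKDIR/data.tar.gz" -C "$WORKDIR" data.csv
rm "$WORKDIR/data.csv"

# The actual uncompress step: -x extract, -z gunzip, -f archive file,
# -C target directory.
tar -xzf "$WORKDIR/data.tar.gz" -C "$WORKDIR"
cat "$WORKDIR/data.csv"
```

For a plain .gz (no tar wrapper), the Text file input step's built-in GZip compression option is simpler, as the answer notes.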

Pentaho Frame size (17727647) larger than max length (16384000)!

为君一笑 · Submitted on 2019-12-11 11:34:50
Question: In Pentaho, when I run a Cassandra Input step that fetches around 50,000 rows, I get the exception below. Is there a way to control the query result size in Pentaho, or a way to stream the query result instead of fetching it all in bulk? 2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Unexpected error 2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.pentaho.di
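The "Frame size larger than max length" message comes from the Thrift transport layer rather than from Kettle itself: the result set (17727647 bytes, roughly 17 MB) exceeds the default 16384000-byte (roughly 16 MB) frame limit. A hedged sketch of the server-side knob, assuming an older Cassandra release that still exposes the Thrift settings in cassandra.yaml (option names and defaults may differ by version; raising limits only papers over an oversized single read):

```yaml
# cassandra.yaml (Thrift-era versions) - hedged example values.
# Raise the framed-transport limit above the failing frame size.
thrift_framed_transport_size_in_mb: 32
# Must be at least as large as the framed transport size.
thrift_max_message_length_in_mb: 34
```

The alternative the questioner asks about, paging the result instead of bulk-fetching, avoids touching server limits at all where the driver or step supports it.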

Kettle ETL tool deployment

无人久伴 · Submitted on 2019-12-11 09:48:22
Official download: http://kettle.pentaho.org
Java environment (Windows environment variables):
Variable name: JAVA_HOME, value: C:\Program Files\Java\jdk1.8.0_231
Variable name: CLASSPATH, value: .;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar;
Variable name: Path, value: %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;
Configure the Kettle environment variable:
Variable name: KETTLE_HOME, value: D:\data-integration
Executable: D:\data-integration\Spoon.bat (double-click the Windows batch file)
Source: oschina. Link: https://my.oschina.net/attacker/blog/3141597

Pentaho: create an archive folder named with MM-YYYY

前提是你 · Submitted on 2019-12-11 08:18:37
Question: I would like to archive every file in a folder by putting it in another archive folder with a name like "Archive/myfolder-06-2014". My problem is how to retrieve the current month and year and then create a folder (if it does not already exist) from them. Answer 1: This solution may be a little awkward (due to the required fuss) but it seems to work. The idea is to precompute the target filename in a separate transformation and store it in a system variable (TARGET_ZIP_FILENAME
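The month-and-year part of the problem is a one-line date format. A minimal shell sketch of the same idea, useful as a reference even if the job is ultimately built with Kettle steps ("Archive" and "myfolder" follow the question's naming; everything else is illustrative):

```shell
# Hedged sketch: compute the MM-YYYY suffix, then create the archive
# folder if it does not already exist (mkdir -p is idempotent).
STAMP=$(date +%m-%Y)
TARGET="Archive/myfolder-$STAMP"
mkdir -p "$TARGET"
echo "$TARGET"

# Moving the files in would then be e.g.:
# mv myfolder/* "$TARGET"/
```

Inside Kettle, the equivalent of `date +%m-%Y` is a Get System Info or formula step producing the stamp, stored in a variable as the answer describes, with a "Create a folder" job entry consuming it.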

org.pentaho.di.core.exception.KettleMissingPluginsException in step JmsOutput: why?

一笑奈何 · Submitted on 2019-12-11 07:57:55
Question: I made a transformation to send JMS messages to ActiveMQ, but when executing the transformation from my Java client application (which includes the PDI jars), I get this error: SEVERE: null org.pentaho.di.core.exception.KettleMissingPluginsException: Missing plugins found while loading a transformation Step : JmsOutput at org.pentaho.di.trans.TransMeta.loadXML(TransMeta.java:2840) at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2676) at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2628) at org