kettle

Pentaho DI - JSON Nested File Output

痴心易碎 submitted on 2019-12-19 10:23:05
Question: I have a requirement where I need to fetch records from multiple tables. The primary table has a one-to-many relationship with the other tables. My data source is an Oracle DB, which contains the tables in question: one called Student, the other Subjects. For example, the Student table has "Student_Id" as its primary key plus other columns like firstname, lastName, etc. Each student has registered for multiple subjects, so student_id is a foreign key in the Subjects table.
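For reference, the nested shape such a one-to-many join should produce can be sketched outside PDI. The snippet below is a minimal Python illustration, not a Kettle step; the sample rows and column values are invented:

```python
import json
from collections import defaultdict

# Hypothetical sample rows standing in for the Oracle Student and
# Subjects tables described above; the values are made up.
students = [
    {"student_id": 1, "firstname": "Ann", "lastname": "Lee"},
    {"student_id": 2, "firstname": "Bob", "lastname": "Roy"},
]
subjects = [
    {"student_id": 1, "subject": "Math"},
    {"student_id": 1, "subject": "Physics"},
    {"student_id": 2, "subject": "History"},
]

# Group the child rows by their foreign key, then nest each group
# under its parent row -- the structure a nested JSON output emits.
by_student = defaultdict(list)
for row in subjects:
    by_student[row["student_id"]].append(row["subject"])

nested = [dict(s, subjects=by_student[s["student_id"]]) for s in students]
print(json.dumps(nested, indent=2))
```

Each parent row appears once, with its child rows collected into an array keyed off the foreign key.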

kettle 8.2

我的未来我决定 submitted on 2019-12-18 04:39:38
Introduction to the installation environment.mp4
Installing the node-1 virtual machine OS.mp4
Connecting to the OS with SecureCRT.mp4
Installing the dependency packages required for the CDH environment.mp4
Uninstalling OpenJDK.mp4
Disabling the firewall and security protection.mp4
Installing lrzsz.mp4
Installing the JDK.mp4
Installing, configuring, and starting the NTP service.mp4
Editing the hosts file.mp4
Cloning virtual machine node-2.mp4
Cloning virtual machine node-3.mp4
Adjusting node-1's memory and connecting to node-2 and node-3 with SecureCRT.mp4
Configuring passwordless SSH login.mp4
Installing MySQL.mp4
Allowing remote access to MySQL.mp4
Creating the Hive and amon databases.mp4
Installing Cloudera Manager on node-1, part 1.mp4
Installing Cloudera Manager on node-1, part 2.mp4
Starting cm-server and the agents.mp4
Installing CDH.vep.mp4
Preparing the Hadoop environment.mp4
Configuring the Hadoop environment in Kettle.mp4
The Hadoop file input step.mp4
The Hadoop file output step.mp4
Initializing Hive data.mp4
Configuring the Hive environment in Kettle.mp4
Reading data from Hive.mp4
Writing data to Hive.mp4
Loading data into the Hive database with the Hadoop copy files job entry.mp4
Executing Hive's HiveSQL statements.mp4

Deploying and running a Kettle Carte cluster on Windows

流过昼夜 submitted on 2019-12-15 09:11:46
This article mainly covers using Kettle's UI, Spoon, to run a cluster-based experiment that sorts table data from a database. It also covers the configuration-file settings for the Carte services that must be started during the experiment, and the relevant Carte commands under the Windows cmd shell. The article has six parts: 1. Introducing Carte; 2. Setting up Carte's configuration files; 3. Commands for starting the Carte service; 4. Configuring the cluster in Kettle's graphical interface; 5. Sorting data with Kettle's cluster mode; 6. Java source code for invoking the cluster's slave servers. 1. Introducing Carte: Carte is the web server program provided by Kettle, also called a slave server. When Kettle uses a cluster to distribute and process tasks, multiple Carte processes can be started to dispatch ETL tasks (master) and to receive, run, and submit ETL tasks (slave). As Pentaho Kettle Solutions defines it: "Carte a lightweight server process allows for remote monitoring and enables the transformation clustering capabilities". That is, Carte is a lightweight server process
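For illustration, a minimal slave-server configuration file of the kind passed to Carte might look like the fragment below; the name, hostname, and port are placeholders to adapt to your own environment:

```xml
<slave_config>
  <slaveserver>
    <name>slave1-8081</name>
    <hostname>localhost</hostname>
    <port>8081</port>
    <master>N</master>
  </slaveserver>
</slave_config>
```

On Windows the service can then be started from cmd either with a config file, `Carte.bat C:\path\to\slave1.xml`, or directly as `Carte.bat localhost 8081`.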

How to retrieve OUT parameter from MYSQL stored procedure to stream in Pentaho Data Integration (Kettle)?

一个人想着一个人 submitted on 2019-12-14 00:24:14
Question: I am unable to get the OUT parameter of a MySQL procedure call into the output stream with the procedure call step of Pentaho Kettle. I'm having big trouble retrieving the OUT parameter from a MySQL stored procedure to the stream. I think it may be a bug, because it only occurs with an Integer OUT parameter; it works with a String OUT parameter. The exception I get is: Invalid value for getLong() - ' I think the parameters are correctly set, as you can see in the ktr. You can replicate the bug in this

Pentaho Kettle 8 Kafka Consumer

情到浓时终转凉″ submitted on 2019-12-13 17:23:28
Question: I'm having some issues when I use the new Kafka consumer connector. I use it as the documentation says: I have the connector alone in a transformation, and a transformation following it in a job with a get records from stream step. The problem is that the first transformation, with the Kafka consumer, never finishes; it is always running without receiving anything. Answer 1: Follow me, like the screenshot below: Source: https://stackoverflow.com/questions/47828680/pentaho-kettle-8-kafka-consumer

The Most Complete Guide to Building a Data Warehouse

北慕城南 submitted on 2019-12-13 15:19:32
Before we begin, let's review the definition of a data warehouse. A data warehouse is a subject-oriented, integrated, relatively stable collection of data that reflects historical change, used to support management decision-making. The concept was first proposed by Bill Inmon, the father of the data warehouse, in his 1990 book Building the Data Warehouse, yet in recent years it has been cited and applied more and more widely, as the chart below shows. What has made a concept from the 1990s grow ever hotter in recent years? With that question in mind, let's look at how the industry has actually changed. Figures from the National Bureau of Statistics show that the digital economy has accounted for a growing share of GDP in recent years, reaching nearly 35% by 2018, while its growth rate has pulled steadily ahead of GDP growth over the same period. In 2014 the term "new normal" was first proposed, based on the characteristics of the current stage of China's economic development: adapt to the new normal and keep a calm strategic mindset. Under the economic new normal, GDP shifts from high-speed to medium-high-speed growth; the old development model that consumed resources, the environment, and future generations is giving way to scientific, sustainable, and inclusive development centered on transformation and upgrading, productivity gains, and innovation, moving from factor-driven and investment-driven growth toward services and innovation-driven growth. Under the new normal, the informatization behind the digital economy is driving data to deliver enormous value, and it will continue to do so. Against this backdrop, industry keywords such as "data", "data analytics", "artificial intelligence", and "IoT" have climbed steadily in Baidu Index search trends. And as the transformation deepens

Does anybody know the list of Pentaho Data Integration (Kettle) connectors?

旧巷老猫 submitted on 2019-12-13 08:32:47
Question: I am comparing three open source ETL tools: Talend, Kettle, and CloverETL. I had no problem finding Talend's and CloverETL's connector lists, but I cannot find the one for Kettle. Does someone know them, or where I can find them? Thanks a lot. Answer 1: I assume by "connector" you mean input/output nodes and not intermediate transformations. Just looking through the Kettle GUI, I see: Inputs: Access, CSV, De-serialize from file [GH: not sure what kind of file/serialization this means]

How to read all folders and subfolders from Pentaho Kettle Get files with SFTP step

一笑奈何 submitted on 2019-12-13 04:59:04
Question: The "Get files with SFTP" step is able to fetch all the files from the specified source path (over SFTP), but it is not able to read any of the folders that exist at the source path. I tried regular-expression wildcards like .* or * or . etc., but to no avail. In my use case, the source files will always arrive in one or more folders (like monthly transaction files in month-specific folders, or year-wise and month-wise folders in a multi-level folder hierarchy, etc.). If all these folders moved to my
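The usual workaround for a step that cannot descend into subfolders is to walk the tree yourself and feed the resulting file list into the step. A minimal Python sketch of that recursive listing (run against a throwaway local directory standing in for the SFTP source path; the `*.csv` pattern is an assumption):

```python
import fnmatch
import os
import tempfile

def list_files_recursive(root, pattern="*.csv"):
    """Walk root and every subfolder, returning matching file paths.

    This is what a single "Get files with SFTP" invocation cannot do
    on its own: descend into month-wise / year-wise subfolders.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# Demo tree mimicking year/month folders on the remote side.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "2019", "01"))
open(os.path.join(base, "2019", "01", "txn.csv"), "w").close()
open(os.path.join(base, "readme.txt"), "w").close()

print(list_files_recursive(base))  # only the nested .csv is returned
```

In PDI terms the same idea is typically built with a looping job that enumerates folders first, then fetches each one.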

Running pan.bat from command line

耗尽温柔 submitted on 2019-12-13 04:57:18
Question: I'm trying to run pan.bat through cmd on my Windows system. I have set the environment variable PENTAHO_JAVA_HOME; seeking help for the same. Thanking in advance. I tried this command to run the .ktr: C:\pdi-ce-5.2.0.0-209\data-integration>pan.bat /file:E:\Practise_TRANSFORMATION OUTPUT\dynamic pivot\trying_pivot_with_2_billingid.ktr /level:Basic and this is the error I'm getting: WARNING: Using java from path DEBUG: _PENTAHO_JAVA_HOME= DEBUG: _PENTAHO_JAVA=java.exe C:\pdi-ce-5.2.0.0-209
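One likely culprit in the command above is the unquoted path containing spaces ("Practise_TRANSFORMATION OUTPUT", "dynamic pivot"): on Windows the whole /file: option must be double-quoted, or the shell cuts it off at the first space. A small Python sketch just to make the quoting explicit (whether quoting alone fixes this particular setup is an assumption; the path and options are the ones from the question):

```python
import subprocess

# Wrap the .ktr path in double quotes inside the /file: option so
# cmd passes it to pan.bat as a single argument.
ktr = r"E:\Practise_TRANSFORMATION OUTPUT\dynamic pivot\trying_pivot_with_2_billingid.ktr"
cmd = ["pan.bat", f'/file:"{ktr}"', "/level:Basic"]
print(" ".join(cmd))

# Uncomment to actually launch Pan (from the data-integration folder):
# subprocess.run(" ".join(cmd), shell=True)
```

The printed command line is what should be typed at the cmd prompt.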

Pentaho Data Integration setVariable not working

╄→尐↘猪︶ㄣ submitted on 2019-12-13 02:42:09
Question: I am on PDI 7.0 and have a "Modified Java Script Value" step inside a transformation as below: var numberOfDays = 100; Alert(numberOfDays); setVariable("NUMBER_OF_DAYS", numberOfDays, "r"); Alert(getVariable("NUMBER_OF_DAYS", "")); However, when I run the transformation, the first Alert correctly shows 100, but the next Alert is blank (meaning the variable is not set). What is wrong here? Answer 1: As a rule of thumb, you should never set a variable and read it within the same transformation.