kettle

Running Flows in Mule Parallel

Submitted by 浪尽此生 on 2019-12-11 07:03:13
Question: I have two flows in Mule that I want to run in parallel. The first flow should transfer a file from a remote machine to a local directory using SFTP (it does this non-stop, as long as the file is constantly updated in the remote directory). The second flow must take the data in the file and update/insert it into the database by invoking a Pentaho Kettle transformation/job (also a continuous process, as long as the files keep coming in). However, when I run my flow, it somehow bypasses the…

Is there a way to check for existence of a folder in pentaho?

Submitted by 耗尽温柔 on 2019-12-11 04:28:26
Question: I know that there is a "Check if a folder is empty" entry, but it does not check for the existence of the folder. Answer 1: But to use it in Pentaho is more complicated. When creating a Job rather than a transformation, straight Java is not directly available (that I know of). The good news is that PDI's JavaScript interpreter is Rhino, which means all of Java's objects and classes are available to JavaScript. As such, the check is pretty easy. Add a variable or parameter in your job and call it something like dirpath and…
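The answer is cut off, but the check it describes only needs `java.io.File`, which Rhino can reach directly. A minimal sketch in plain Java (the `dirpath` variable name is the one the answer suggests; the Rhino equivalent inside a Job JavaScript entry would be `new java.io.File(dirpath).isDirectory()`):

```java
import java.io.File;

// Sketch of the folder-existence check: exists() alone would also
// return true for a plain file, so test isDirectory() as well.
public class FolderCheck {
    public static boolean isExistingDirectory(String path) {
        File f = new File(path);
        return f.exists() && f.isDirectory();
    }
}
```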

Changing date format in Pentaho using JavaScript

Submitted by 廉价感情. on 2019-12-11 03:07:56
Question: I have an input Excel sheet which has a field "fail_date". I want to change the format to dd.MM.yyyy HH:mm:ss. I am doing this in JavaScript as shown below. var temp = fail_date.getDate(); str2date(temp,"dd.MM.yyyy HH:mm:ss"); But I get the error below when I run: 2015/05/07 17:48:01 - Modified Java Script Value 2 2 2.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Could not apply the given format dd.MM.yyyy on the string for Thu Jan 01 11:05:50 IST 1970 :…
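The error comes from the direction of the conversion: `str2date` parses a *string into a Date*, but `fail_date` is already a Date, so the step ends up trying to apply a date pattern to a string rendering of it. Formatting a Date into a string is the reverse call (in the Modified Java Script Value step that is `date2str(fail_date, "dd.MM.yyyy HH:mm:ss")`). The underlying Java, as a sketch:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Date -> String: format with SimpleDateFormat; parsing the other way
// (String -> Date) is what str2date / SimpleDateFormat.parse() are for.
public class DateFmt {
    public static String format(Date d, TimeZone tz) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd.MM.yyyy HH:mm:ss");
        fmt.setTimeZone(tz);
        return fmt.format(d);
    }
}
```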

Kettle: two ways to synchronize multiple joined tables into one table

Submitted by 北战南征 on 2019-12-11 01:05:48
All of the following was developed on version 5.0.1; other versions can be checked against it accordingly. This situation comes up often in everyday work: you read several views or tables on the source side and write them into a single table in the target database, which means synchronizing multiple tables. Multi-table synchronization can be implemented in the following two ways:

Method 1: join the tables in one query, then write to one table
1. Given the two source tables, a student table and a class table, write into the target table: the student-class table.
2. Choose Table Input, double-click it, write the join SQL in the SQL field, click Preview to check that the expected data comes back, then click OK.
3. From the core objects, choose Table Output; hold Shift and drag with the mouse to connect Table Input to Table Output; double-click Table Output, select the database connection, the target table and the commit size, then click OK.
4. Click Run; the data is written successfully.
5. Verify that the data has been written to the target table.

Method 2: as shown in the figure, create Table Input 1 (students), Table Input 2 (classes), a record-set join step (joining the student and class tables), and Table Output (writing to the target table).
1. Table Input 1 fetches the student fields to be written.
2. Table Input 2 fetches the class fields to be written.
3. Hold Shift to connect both Table Input 1 and Table Input 2 to the record-set join step, then open it, enter join field 1 and join field 2, and choose inner as the join type (basic database knowledge).
4. Finally connect Table Output, select the database fields (they must match the structure of the target table), then click OK.
5. Double-click Run; when execution finishes, verify the result in the database.

Link: https://pan.baidu.com/s
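The record-set join step in Method 2 does in memory what the SQL of Method 1 does in the database: match rows from two inputs on a key and keep only the matching pairs (inner join). A minimal in-memory sketch, with hypothetical student/class data:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Inner join of two row streams on a key field: a student row only
// survives if a class row with the same class id exists.
public class InnerJoinSketch {
    public static List<String> join(Map<String, String> studentsByClass,  // student -> class id
                                    Map<String, String> classNames) {     // class id -> class name
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> s : studentsByClass.entrySet()) {
            String cls = classNames.get(s.getValue());
            if (cls != null) {               // inner join: unmatched students are dropped
                out.add(s.getKey() + "," + cls);
            }
        }
        return out;
    }
}
```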

Pentaho DI Send Mail. Read timed out

Submitted by 心已入冬 on 2019-12-11 00:59:13
Question: I am trying to send an email from my Gmail account. Below are the SMTP details that I provided. Server: smtp.gmail.com Port: 465 (also tried 587) Use Authentication: Yes Authentication User: my full email id Authentication password: my password Use Secure Authentication: Yes Secure Connection Type: SSL This is the error that I am getting: 2016/03/16 17:35:45 - [ftp-poc].Mail - ERROR (version 5.2.0.0, build 1 from 2014-09-30_19-48-28 by buildguy) : Problem while sending message: javax.mail…
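A "Read timed out" from Gmail SMTP usually means the connection type and port disagree: port 465 expects implicit SSL from the first byte, while port 587 expects a plain connection upgraded via STARTTLS. Kettle's Mail entry builds JavaMail session properties from the dialog fields; a sketch of the two consistent combinations (standard `mail.smtp.*` property names; the timeout values are arbitrary examples):

```java
import java.util.Properties;

// Two self-consistent Gmail SMTP configurations. Mixing them
// (e.g. STARTTLS settings against port 465) is what typically hangs.
public class GmailSmtpProps {
    public static Properties forSsl465() {
        Properties p = new Properties();
        p.put("mail.smtp.host", "smtp.gmail.com");
        p.put("mail.smtp.port", "465");
        p.put("mail.smtp.auth", "true");
        p.put("mail.smtp.ssl.enable", "true");      // implicit SSL on 465
        p.put("mail.smtp.timeout", "10000");        // fail fast instead of hanging
        return p;
    }
    public static Properties forStartTls587() {
        Properties p = new Properties();
        p.put("mail.smtp.host", "smtp.gmail.com");
        p.put("mail.smtp.port", "587");
        p.put("mail.smtp.auth", "true");
        p.put("mail.smtp.starttls.enable", "true"); // STARTTLS upgrade on 587
        p.put("mail.smtp.timeout", "10000");
        return p;
    }
}
```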

Assigning an integer value to an output row, pentaho

Submitted by 两盒软妹~` on 2019-12-10 21:04:13
Question: I'm using Kettle in a very basic way. What I want to do is read from a CSV file, do some kind of transformation in a User Defined Java Class step, and write the output to a text file. a picture http://imageshack.com/a/img34/1669/vo18.png When I run this, I essentially get this error: value Integer<binary-string> : There was a data type error: the data type of java.lang.Long object [100] does not correspond to value meta [Integer<binary-string>] This is the line in the UDJC step that seems to make the…
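Two things are in play in this error. First, Kettle's row-level Integer type is backed by `java.lang.Long`, not `java.lang.Integer`. Second, the `<binary-string>` suffix means the CSV Input step ran with lazy conversion enabled, so downstream steps expect raw bytes rather than a `Long`; unchecking "Lazy conversion" on the CSV Input step is the usual fix. On the Java side of a UDJC step, a sketch of the type expectation (the helper name is hypothetical, mirroring what a `setValue(...)` on an output row field expects):

```java
// Kettle "Integer" fields carry java.lang.Long values. Handing the row a
// boxed Integer (or, with lazy conversion on, anything but bytes) fails.
public class UdjcTypeSketch {
    public static Object toKettleInteger(int v) {
        return Long.valueOf(v);   // widen to Long before putting it on the row
    }
}
```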

Kettle Internal.Job.Filename.Directory

Submitted by 痴心易碎 on 2019-12-10 15:33:30
Question: I am new to Pentaho Kettle and I am wondering what Internal.Job.Filename.Directory is. Is it my Spoon.bat folder, or the job/xfrm folder I created? Is there a way I can change it to point to a particular folder? I am running spoon.bat on Windows XP. Answer 1: Internal.Job.Filename.Directory is only set when you don't use a repository, and it is set automatically. You cannot set it manually. How do you not use a repository? When you start Spoon, you get a dialog which asks for a repository. Just…

Stop running Kettle Job/Transformation using Java

Submitted by ☆樱花仙子☆ on 2019-12-10 13:37:35
Question: I'm developing a web-app-based ETL tool (with the Kettle engine) using Java. I'm running into issues while trying to stop a running Job. I'm not sure if using CarteSingleton.java is correct. I'm using a custom singleton map. My code is as below: Job job = new Job(null, jobMeta); job.setLogLevel(LogLevel.DETAILED); job.setGatheringMetrics(true); job.start(); Once job.start() is invoked, I'm trying to store that Job object in a custom singleton map and retrieve the exact Job object that was…
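The custom-singleton-map idea can be sketched without the Kettle classes: register each running job under a key at start time, then look up the *same instance* later and ask it to stop. Below, a plain `Thread` stands in for `org.pentaho.di.job.Job` (with the real API, the stop call would be `job.stopAll()`); the class and method names are illustrative, not Kettle's:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal registry sketch: the map must be shared (static, thread-safe)
// so the web request that stops the job sees the instance that started it.
public class JobRegistry {
    private static final Map<String, Thread> RUNNING = new ConcurrentHashMap<>();

    public static void register(String id, Thread job) { RUNNING.put(id, job); }

    public static boolean stop(String id) {
        Thread job = RUNNING.remove(id);
        if (job == null) return false;   // unknown id, or already stopped
        job.interrupt();                 // with Kettle: job.stopAll()
        return true;
    }
}
```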

Python crawler data processing

Submitted by ≯℡__Kan透↙ on 2019-12-10 02:34:06
1. First, understand the following functions: setting a variable, the length() function, char_length(), the replace() function, and the max() function.
1.1 Setting a variable: set @variable_name = value
set @address='中国-山东省-聊城市-莘县'; select @address
1.2 The difference between length() and char_length():
select length('a') ,char_length('a') ,length('中') ,char_length('中')
1.3 Combining the replace() and length() functions:
set @address='中国-山东省-聊城市-莘县'; select @address ,replace(@address,'-','') as address_1 ,length(@address) as len_add1 ,length(replace(@address,'-','')) as len_add2 ,length(@address)-length(replace(@address,'-','')) as _count
When cleaning a field in ETL that has an obvious delimiter, this is how to decide how many split columns the new data table needs: compute the maximum number of '-' characters in com_industry; that maximum plus 1 is the number of fields the value can be split into. For this table the maximum is 3, so it can be split into 4 industry fields.
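The same delimiter-counting trick, sketched in plain Java. One caveat when porting: Java's `String.length()` counts UTF-16 characters, like MySQL's char_length(), not bytes like MySQL's length(); for a single-byte delimiter such as '-' the count comes out the same either way.

```java
// Fields after splitting = occurrences of the delimiter + 1.
public class DelimCount {
    public static int countDelims(String s, String delim) {
        return (s.length() - s.replace(delim, "").length()) / delim.length();
    }
    public static int splitFieldCount(String s, String delim) {
        return countDelims(s, delim) + 1;
    }
}
```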

Kettle: Multiple putRows() in processRow() correctly?

Submitted by 末鹿安然 on 2019-12-09 21:15:02
Question: I'm processing an /etc/group file from a system. I load it with a CSV Input step with the delimiter :. It has four fields: group, pwfield, gid, members. The members field is a comma-separated list of account names, of unspecified count from 0 upwards. I would like to produce a list of records with three fields: group, gid, account. In the first step I use User Defined Java Class; in the second I use Select values. Example input:
root:x:0:
first:x:100:joe,jane,zorro
second:x:101…
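The fan-out the question describes can be sketched outside Kettle: one /etc/group line becomes one (group, gid, account) row per member. Inside a real UDJC `processRow()`, each element of the returned list would be emitted with its own `putRow()` call; the class below is a standalone illustration, not the step itself.

```java
import java.util.ArrayList;
import java.util.List;

// Splits "group:pwfield:gid:members" and emits one row per member.
public class GroupFanOut {
    public static List<String[]> explode(String groupLine) {
        String[] f = groupLine.split(":", -1);           // -1 keeps trailing empty fields
        List<String[]> rows = new ArrayList<>();
        if (f.length < 4 || f[3].isEmpty()) return rows; // no members -> no output rows
        for (String account : f[3].split(",")) {
            rows.add(new String[] { f[0], f[2], account });
        }
        return rows;
    }
}
```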