pentaho-spoon

pentaho spoon/pdi: how to move files to folders with a different name every time?

穿精又带淫゛_ submitted on 2019-12-20 05:28:23
Question: I get new text files every month, from which I extract data and do some transformations. At the end of every month, I need to move these files to a folder with the current date in its name, which means the destination folder's name is different every time. I added a step before Move Files that creates a folder named with the current date (e.g. 2019-06-01, 2019-07-01), but in the Move Files step I don't know how to specify the destination folder. I guess "wildcard" is only used for the source...
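No answer survives in this scrape, but a common pattern is to put the dated path into a Kettle variable in a small transformation that runs before the Move Files entry, then use ${TARGET_DIR} as the destination folder. A minimal sketch; the variable name and base path are illustrative, not from the original post:

```javascript
// Modified Java Script Value step, in a transformation executed
// before the Move Files entry in the same job.
var d = new Date();
var month = ("0" + (d.getMonth() + 1)).slice(-2);   // zero-pad, e.g. "06"
var folder = d.getFullYear() + "-" + month + "-01"; // e.g. "2019-06-01"
// Scope "r" makes the variable visible to the parent job, so the
// Move Files entry can reference ${TARGET_DIR} as its destination.
setVariable("TARGET_DIR", "/data/archive/" + folder, "r");
```

Note this only works across job entries; as a later question on this page points out, a variable cannot be set and read inside the same transformation.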

How to configure Database connection for production environment in Pentaho data integration Kettle transformation

喜夏-厌秋 submitted on 2019-12-19 10:52:07
Question: I designed a .ktr file for a transformation. I need to configure the database connection details for the production environment. How can I do this? Any suggestions?

Answer 1: I use environment variables: KETTLE_HOME, KETTLE_JNDI_ROOT, and PATH=$PATH:$KETTLE_HOME. KETTLE_HOME is just a link to a directory. By default I have a directory specially devoted to the data-integration suite; it contains several versions of Kettle, for example /opt/kettle/data-integration-4.4.0 (a few old jobs made several years ago), /opt/kettle
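A common alternative, not part of the truncated answer above: define the connection details as Kettle variables in kettle.properties on each machine, and reference them as ${DB_HOST}, ${DB_PORT}, and so on in the database connection dialog. A sketch with illustrative names and values:

```properties
# $KETTLE_HOME/.kettle/kettle.properties on the production host
# (variable names and values are examples, not from the answer)
DB_HOST=prod-db.example.com
DB_PORT=1521
DB_NAME=PRODDB
DB_USER=etl_user
DB_PASSWORD=change_me
```

The same .ktr then runs unmodified in development and production, picking up whichever kettle.properties is present on that machine.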

Pentaho DI - JSON Nested File Output

痴心易碎 submitted on 2019-12-19 10:23:05
Question: I have a requirement where I need to fetch records from multiple tables. The primary table has a one-to-many relationship with the other tables. My data source is Oracle, and the database contains the tables in question: one called Student, the other Subjects. As a sample, I have a Student table where Student_Id is the primary key, plus other columns like firstname, lastname, etc. Each student has registered for multiple subjects, so student_id is a foreign key in the Subjects table.
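The post is cut off before any answer. One common way to produce nested JSON from such a join in PDI (not taken from this thread) is to join the two tables, concatenate the subjects per student with a Group By step (aggregation "Concatenate strings separated by ,"), and then assemble the JSON in a scripting step. A sketch, assuming the field names from the question and that the embedded Rhino engine provides JSON.stringify (true on recent PDI releases):

```javascript
// Modified Java Script Value step, after a Group By on student_id
// that concatenated the subject names into a field subject_list.
var json = JSON.stringify({
    studentId: student_id,
    firstName: firstname,
    lastName:  lastname,
    subjects:  subject_list.split(",")   // back to an array per student
});
// Declare "json" as an output field and write it with Text File Output.
```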

Fetch Data From Remote Database Every Hour

老子叫甜甜 submitted on 2019-12-14 04:21:24
Question: Yesterday I downloaded Pentaho BI Server, Data Integration, and Report Designer. I then connected Report Designer to the remote database, fetched a table, and drew a chart of that data successfully. My question is: I want to run that file (which I created in Report Designer) every hour, fetching the new data from the remote database. Can you please guide me step by step how to do it, because I am new to all of this?

Answer 1: I will answer my own question. In order to schedule a job in Data Integration you
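The self-answer is cut off above. A common way to get an hourly run, independent of the BI Server's own scheduler, is to call the PDI job from cron via kitchen.sh; all paths below are illustrative, not from the answer:

```sh
# crontab entry: run the refresh job at the top of every hour
0 * * * * /opt/pentaho/data-integration/kitchen.sh \
    -file=/opt/etl/jobs/refresh_report_data.kjb \
    >> /var/log/etl/refresh_report_data.log 2>&1
```

On Windows, the equivalent is Kitchen.bat plus a Task Scheduler entry.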

Break string into columns using Regular Expression

陌路散爱 submitted on 2019-12-13 23:53:12
Question: I am new to regex. I want to break the given string into 6 parts using a regular expression. I am using the Pentaho Data Integration tool (ETL tool). Given string: 1x 3.5 mL SST. 1x 4.0 mL gray cap cryovial. Note: there are many more strings with the same format. I want output as: [expected-output table not included in the scrape] Thanks in advance!

Answer 1: The single string datum you've given looks like it should match the regex pattern: (\d*)x\s(\d*\.\d*)\smL\s(.*)\.\s(\d*)x\s(\d*\.\d*)\smL\s(.*)\. You can use it with the Regex Evaluation step: [screenshot not included in the scrape]

Answer 2: Use
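For illustration, here is how that pattern splits the sample string into six capture groups; the same groups become the six output fields when the pattern is pasted into a Regex Evaluation step. A sketch in Modified Java Script Value form, with an assumed input field name specimen:

```javascript
// Splits "1x 3.5 mL SST. 1x 4.0 mL gray cap cryovial." into 6 parts.
var re = /(\d*)x\s(\d*\.\d*)\smL\s(.*)\.\s(\d*)x\s(\d*\.\d*)\smL\s(.*)\./;
var m = specimen.match(re);
var qty1  = m[1];   // "1"
var vol1  = m[2];   // "3.5"
var desc1 = m[3];   // "SST"
var qty2  = m[4];   // "1"
var vol2  = m[5];   // "4.0"
var desc2 = m[6];   // "gray cap cryovial"
```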

Pentaho Data Integration setVariable not working

╄→尐↘猪︶ㄣ submitted on 2019-12-13 02:42:09
Question: I am on PDI 7.0 and have a "Modified Java Script Value" step inside a transformation, as below:

    var numberOfDays = 100;
    Alert(numberOfDays);
    setVariable("NUMBER_OF_DAYS", numberOfDays, "r");
    Alert(getVariable("NUMBER_OF_DAYS", ""));

However, when I run the transformation, the first Alert correctly shows 100, but the next Alert is blank (meaning the variable is not set). What is wrong here?

Answer 1: As a rule of thumb, you should never set a variable and read it within the same transformation.
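The reason is that all steps in a transformation initialize and run in parallel, so a variable set mid-stream is not reliably visible within that same transformation. A sketch of the usual fix, splitting the work across two transformations inside one job (file names are illustrative):

```javascript
// set_days.ktr - Modified Java Script Value step:
var numberOfDays = 100;
// Scope "r" (root job) makes the value visible to later job entries.
setVariable("NUMBER_OF_DAYS", numberOfDays, "r");

// use_days.ktr - runs as the NEXT entry in the same job:
var days = getVariable("NUMBER_OF_DAYS", "0");  // now returns "100"
```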

Merge Rows (diff) is comparing row by row, not one row against all rows of the other table

感情迁移 submitted on 2019-12-12 20:07:02
Question: I am comparing two sheets using Merge Rows (diff). [screenshots of the 1st Excel sheet, the 2nd Excel sheet, and my Pentaho transformation are not included in the scrape] The data preview shows that id 2.0 in the 2nd row is flagged as a new row, and in the 4th row the same data is flagged as deleted, when it is supposed to be identical. How can that be achieved?

Answer 1: Merge Rows (diff) requires both input streams to be sorted by the merge keys (there's a warning about it when you edit the step's properties). Put a Sort Rows step in each stream ahead of the Merge Rows (diff) step.

Lookup values in Mongodb Pentaho Spoon

我是研究僧i submitted on 2019-12-12 10:11:41
Question: How can I look up values in MongoDB? I use Stream Lookup, but I think it will have a performance issue when looking up against a collection with a high volume of data.

Answer 1: Solution 1: found "mongodblookup" on the Marketplace. There is only one problem with the plugin: it doesn't return a record if the lookup match fails. Solution 2: UDJC with 2 fields from the input stream, artist_id and translation (this is the identifier for the lookup); jsonColl is a field in the UDJC; it will return null if no

Make a DB INSERT based on Text File Input metadata

此生再无相见时 submitted on 2019-12-12 04:54:28
Question: I'm developing an ETL and must build some routines for monitoring it. At the beginning, I must do an INSERT in the DB to create a record holding the filename and the process start datetime. That query will return the record's PK, and it must be stored. When the ETL for that file finishes, I must update the record, marking that the ETL finished successfully along with its process end datetime. I use Text File Input to look for files that match its regex, and add its "Additional output fields" to the stream. But

Fetching the max value from ROWS in pentaho

一笑奈何 submitted on 2019-12-12 03:23:49
Question: I have a table structure:

    ID  Col_1  col_2  col_3  col_4
    1   34     23     45     32
    2   20     19     67     18
    3   40     10     76     86

I want the max value from col_1, col_2, col_3, col_4, so my output looks like:

    ID  Col_1  col_2  col_3  col_4  max
    1   34     23     45     32     45
    2   20     19     67     18     67
    3   40     10     76     86     86

Any help would be much appreciated.

Answer 1: Use a Modified Java Script Value step with the following code:

    var max = Math.max(col_1, col_2, col_3, col_4);

Answer 2: You can use the Memory Group By or Group By steps in Pentaho. Use the aggregation method as
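For the script in Answer 1 to have any effect downstream, max must also be declared as an output field of the step; a slightly fuller version:

```javascript
// Modified Java Script Value step.
// Add "max" in the step's Fields grid (type Number) so that
// subsequent steps receive it as a new column.
var max = Math.max(col_1, col_2, col_3, col_4);
```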