apache-nifi

Apache NiFi Replace Text processor to use control character as delimiter

南楼画角 submitted on 2019-12-25 01:43:42
Question: Using the ReplaceText processor to convert a fixed-width file to a delimited one works with ordinary characters such as ';', '|', or ',' as the delimiter. However, using \u0001 (^A) as the delimiter does not work as expected.

Answer 1: To use special characters you can combine the literal and unescapeXml NiFi Expression Language functions: ${literal(''):unescapeXml()}

Source: https://stackoverflow.com/questions/56218104/apache-nifi-replace-text-processor-to-use-control-character-as-delimiter
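The argument inside literal() appears to have been lost when the original post was rendered. A plausible reconstruction (an assumption, not the poster's verbatim expression) embeds the XML numeric character reference for U+0001, which unescapeXml() then converts into the actual control character:

```
${literal('&#01;'):unescapeXml()}
```

The same pattern should work for other unprintable delimiters, by substituting the appropriate numeric character reference.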

ETL Tools that function well with ArangoDB - What are they?

折月煮酒 submitted on 2019-12-25 01:33:50
Question: There are many ETL tools out there, but not many are free, and the free choices do not appear to have any knowledge of, or support for, ArangoDB. If anyone has migrated their data to ArangoDB and automated the process, I would love to hear how you accomplished it. Below I have listed several of the ETL tool choices we have; I took these from the 2016 Spark Europe presentation by Bas Geerdink.

* IBM InfoSphere DataStage
* Oracle

how to improve nifi performance when sync data in mysql

核能气质少年 submitted on 2019-12-24 23:24:17
Question: I use NiFi (a single instance) with CaptureChangeMySQL (binlog) + EvaluateJsonPath + JoltTransformJSON + PutDatabaseRecord to sync data from one table to another; both tables are in different databases but on the same MySQL instance. I use insert into table_a select * from table_b limit 5000; to batch-insert 5000 rows, and NiFi takes about 7 minutes to sync all 5000 rows. Is that normal or slow for NiFi? If it is slow, what should I do to improve performance?

JVM settings:
java.arg.2=-Xms4g
java.arg.3=-Xmx8g

Processor
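One aside on the heap settings shown above: NiFi's bootstrap.conf guidance commonly suggests pinning the initial and maximum heap to the same value, so the JVM never pauses to resize the heap mid-flow. A sketch (the 8g figure is illustrative, not a sizing recommendation for this workload):

```
# conf/bootstrap.conf -- illustrative only
java.arg.2=-Xms8g
java.arg.3=-Xmx8g
```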

Replace a value with a variable from flowfile using apache-nifi

戏子无情 submitted on 2019-12-24 23:19:22
Question: I am trying to replace a value with a variable assigned in the flowfile. In my flowfile, I have assigned flowID to the flow_id variable. In the UpdateRecord processor, I try to update a column named /flow, which contains INFLOW and OUTFLOW. I have the following: ${field.value:replaceAll('INFLOW',$flow_id)}

Flowfile before UpdateRecord:
id,flow,flow_id
1,INFLOW,IN
2,OUTFLOW,OUT
3,INFLOW,IN

After the conversion the flowfile should be:
id,flow,flow_id
1,IN,IN
2,OUT,OUT
3,IN,IN

But it fails with an error unexpected
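For reference, within UpdateRecord itself, pulling a value from another field of the same record is typically done with the "Record Path Value" replacement strategy rather than an Expression Language variable like $flow_id. Outside NiFi, the intended per-record transformation can be sketched in plain Python (column names taken from the example above):

```python
import csv
import io

src = """id,flow,flow_id
1,INFLOW,IN
2,OUTFLOW,OUT
3,INFLOW,IN
"""

rows = []
for row in csv.DictReader(io.StringIO(src)):
    # Replace the flow value with the flow_id value from the same record
    row["flow"] = row["flow_id"]
    rows.append(row)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "flow", "flow_id"])
writer.writeheader()
writer.writerows(rows)
result = out.getvalue()
print(result)
```

This reproduces the "after" flowfile shown above; in NiFi the equivalent would be a /flow property on UpdateRecord pointing at the /flow_id record path.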

Apache-Nifi : Delete MongoDB collections

偶尔善良 submitted on 2019-12-24 22:00:45
Question: I want to delete some collections (db['mycollection'].remove({})) from my MongoDB database. I found that there is a DeleteMongo processor, but I don't know how to use it since I can't find any examples. Does the DeleteMongo processor allow this? If it does, can you show me an example, please? Thanks in advance!

Answer 1: DeleteMongo doesn't actually delete collections; rather, it deletes the documents in the provided collection. Take a look at the processor's documentation here. It expects
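As the (truncated) answer says, DeleteMongo operates on documents matching a query. A sketch of the idea: a flowfile whose content is the empty JSON query below, routed to a DeleteMongo configured to delete many, would match, and therefore delete, every document in the configured collection -- emptying it without dropping the collection itself. (The delete-many mode is an assumption about the processor's configuration; check the processor documentation for your NiFi version.)

```
{}
```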

Missing flowfile exception on Nifi processing cause loss of information

自闭症网瘾萝莉.ら submitted on 2019-12-24 21:59:55
Question: During an ETL process, we hit a random exception that causes the loss of flowfiles. NiFi is deployed on a 3-node Kubernetes cluster with its repositories on a shared file system (GlusterFS). We ran a stress test, and of 2000 CSV files being processed, almost 10% were lost with the reported exception. We also tried scaling down to one node and setting the number of parallel threads to 1 in order to minimize parallelism problems on the incriminated processors (validatecsv and validatejsonpath). It seems

MiNiFi - NiFi Connection Failure: Unknown Host Exception : Able to telnet host from the machine where MiNiFi is running

試著忘記壹切 submitted on 2019-12-24 21:00:26
Question: I am running MiNiFi on a Linux box (a gateway server) which is behind my company's firewall. My NiFi is running on an AWS EC2 cluster (in standalone mode). I am trying to send data from the gateway to the NiFi running on AWS EC2. From the gateway, I am able to telnet to the EC2 node using the public DNS and the remote port that I have configured in the nifi.properties file.

nifi.properties:
# Site to Site properties
nifi.remote.input.host=ec2-xxx.us-east-2.compute.amazonaws.com
nifi.remote.input

How to extract and route only specified columns from a CSV files and drop all other columns [duplicate]

ⅰ亾dé卋堺 submitted on 2019-12-24 11:44:11
Question: This question already has answers here: How to extract a subset from a CSV file using NiFi (2 answers). Closed last year.

I want to extract a few fields, along with their values, from a CSV file and drop/delete all other fields in the file. Please help. I think we can use the RouteText processor; please tell me how to write the regular expression for routing only the specified fields and dropping everything else. Thanks. Example: from the snapshot attached, I only want to route 'Firstname,Lastname and
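The equivalent column subset outside NiFi can be sketched with the stdlib csv module (the column names and sample rows below are hypothetical, based on the fields named in the question); the linked duplicate points toward record-based processors rather than RouteText regexes for this inside NiFi:

```python
import csv
import io

src = """Firstname,Lastname,Age,City
Ada,Lovelace,36,London
Alan,Turing,41,Wilmslow
"""

keep = ["Firstname", "Lastname"]  # columns to retain; everything else is dropped

reader = csv.DictReader(io.StringIO(src))
out = io.StringIO()
# extrasaction="ignore" silently discards any columns not listed in `keep`
writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
writer.writeheader()
writer.writerows(reader)
result = out.getvalue()
print(result)
```

The output keeps only the Firstname and Lastname columns, which is the "route some columns, drop the rest" behavior the question asks for.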

How to pass flow files to the Execute Python script and use attributes & Nifi variables to store that file?

醉酒当歌 submitted on 2019-12-24 11:27:51
Question: I am a rookie at both NiFi and Python, and I need your help to pass the flowfile attribute value to the script. The script converts a nested JSON file into CSV. When I run the script locally, it works. How can I pass the flowfile name to src_json and tgt_csv? Thanks, Rosa

import pandas as pd
import json
from pandas.io.json import json_normalize

src_json = "C:/Users/name/Documents/Filename.json"
tgt_csv = "C:/Users/name/Documents/Filename.csv"
jfile = open(src_json)
jdata = json.load(jfile) ..
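One common pattern (a sketch, assuming the script is invoked externally, e.g. via ExecuteStreamCommand, rather than hard-coding the paths) is to take the source and target paths as parameters instead of constants. The conversion below uses only the stdlib and a flat record list for illustration; the poster's actual script uses pandas and json_normalize for nested JSON:

```python
import csv
import json
import os
import tempfile

def json_to_csv(src_json, tgt_csv):
    """Load a JSON file containing a list of flat records and write it as CSV."""
    with open(src_json) as jfile:
        records = json.load(jfile)
    with open(tgt_csv, "w", newline="") as cfile:
        writer = csv.DictWriter(cfile, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)

# In NiFi the two paths would arrive from the caller, e.g.:
#   src_json, tgt_csv = sys.argv[1], sys.argv[2]
# For a self-contained demonstration, use temporary files instead.
tmpdir = tempfile.mkdtemp()
src_json = os.path.join(tmpdir, "in.json")
tgt_csv = os.path.join(tmpdir, "out.csv")
with open(src_json, "w") as f:
    json.dump([{"name": "Rosa", "score": 1}], f)

json_to_csv(src_json, tgt_csv)
result = open(tgt_csv).read()
print(result)
```

Because the function takes the paths as arguments, NiFi can supply the flowfile's filename attribute (or a full path built from it) at invocation time rather than the script hard-coding C:/Users/... locations.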

Jolt Transformation - Match Values in Separate Branches - JSON

隐身守侯 submitted on 2019-12-24 10:59:18
Question: I want to achieve the following JSON transformation using the Jolt processor in NiFi.

Input JSON:

{
  "topLevel": {
    "secondLevelA": {
      "thirdLevelA": [
        { "norsemen": "huntinSouth", "value": "AAA" },
        { "norsemen": "huntinNorth", "value": "BBB" }
      ]
    },
    "secondLevelB": {
      "thirdLevelB": [
        { "norsemen": "huntinNorth", "oddCode": "AAA301" },
        { "norsemen": "huntinNorth", "oddCode": "BBB701" },
        { "norsemen": "huntinWest", "oddCode": "AAA701" }
      ]
    }
  }
}

Output JSON:

{
  "NAME": [
    { "norsemen": "huntinSouth", "value