apache-nifi

NiFi - how to reference a flowFile in ExecuteStreamCommand?

Submitted by a 夏天 on 2019-12-19 07:05:30
Question: I need to execute something like: sed '1d' simple.tsv > noHeader.tsv, which removes the first line from my big flow file (> 1 GB). The thing is, I need to run it on the flow file itself, so it would be: sed '1d' myFlowFile > myFlowFile. The question is: how should I configure the ExecuteStreamCommand processor so that it runs the command on my flow file content and writes the result back to the flow file? If sed is not the best option, I can consider doing this another way (e.g. tail). Thanks, Michal. Edit 2 (Solution): Below

Does NiFi support batch processing?

Submitted by 强颜欢笑 on 2019-12-18 09:23:28
Question: I just need to know whether it is possible to run a series of processors until completion: "the execution of a series of processors in one process group waits for another process group's execution to complete". For example: I have 3 processors in the NiFi UI, P1 --> P2 --> P3 (P --> processor). Now I need to run P1; once it completes, run P2; and so on, so they run in sequence, each waiting for the previous one to finish. EDIT-1: Just as an example, I have data at a web URL. I can download that data using

Apache NIFi MergeContent processor - set demarcator as new line

Submitted by 白昼怎懂夜的黑 on 2019-12-18 05:08:33
Question: I want to use the MergeContent processor to merge tweets for bulk insert into an Elasticsearch index. For this, the command and each tweet need to be separated by \n. This is how it should look: { action: { metadata }}\n { request body }\n, which would be { "index" } { tweet1 } { tweet2 }. When I put \n as the separator, the processor adds \n as a literal string instead of a newline. Is it possible to make it an actual newline? Also, is it possible to leave the footer empty? Thanks in advance.
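The Elasticsearch bulk API expects newline-delimited JSON: one action line, then one source line, each terminated by a real newline, including a trailing newline at the end of the body. (In NiFi's property editor, a literal newline can typically be entered with Shift+Enter rather than typing the two characters \n.) A sketch of the target payload shape, with `to_bulk` and its arguments being hypothetical names:

```python
import json

def to_bulk(index_name, docs):
    """Build an Elasticsearch _bulk payload: for each document, an
    action line ({"index": ...}) followed by the document source,
    joined by real newlines. The body must end with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

With MergeContent doing the joining, the demarcator plays the role of the "\n".join above, which is why it must be an actual newline character and not the escaped string.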

How to join two CSVs with Apache Nifi

Submitted by 爷，独闯天下 on 2019-12-18 05:06:25
Question: I'm looking into ETL tools (like Talend) and investigating whether Apache NiFi could be used. Could NiFi be used to perform the following: pick up two CSV files placed on local disk, join the CSVs on a common column, and write the joined CSV to disk? I've tried setting up a job in NiFi, but couldn't see how to perform the join of two separate CSV files. Is this task possible in Apache NiFi? It looks like the QueryDNS processor could be used to perform enrichment of one CSV file using the
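For reference, the join itself is a simple keyed lookup: index one file by the common column, then stream the other file against that index. A minimal sketch (illustrative helper, not a NiFi component):

```python
import csv
import io

def join_csvs(left_text, right_text, key):
    """Inner-join two CSVs (given as text) on a shared column name.
    The right CSV is indexed by the key column; each left row that
    finds a match is merged with the matching right row."""
    right_index = {
        row[key]: row
        for row in csv.DictReader(io.StringIO(right_text))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(left_text)):
        match = right_index.get(row[key])
        if match is not None:
            merged = dict(row)
            merged.update(match)
            joined.append(merged)
    return joined
```

In a flow-based tool the hard part is not the merge logic but correlating two independent flow files at the same point in the graph, which is why a plain processor chain does not express this directly.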

Import Modules in Nifi ExecuteScript

Submitted by 雨燕双飞 on 2019-12-17 16:46:19
Question: I am new to NiFi and Python, and I want to execute my Python script. So I used ExecuteScript and tried to import certain modules, like this: import json, sftp, paramiko. Though I have sftp installed, when I import it in ExecuteScript it says "Failed to process session. No module named sftp at line number 1". Running `which -a sftp` returns /usr/bin/sftp. When importing paramiko I got the same error. Answer 1: The "python" engine used by ExecuteScript and InvokeScriptedProcessor is actually Jython, not
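Because the engine is Jython rather than CPython, only pure-Python modules on the configured module path can be imported; modules that rely on C extensions (as paramiko's crypto dependencies do) fail with exactly this kind of ImportError. A small sketch for checking what the current runtime can load (hypothetical helper names):

```python
import platform

def runtime_is_jython():
    """ExecuteScript's "python" engine is Jython; C-extension
    modules are not importable there."""
    return platform.python_implementation().lower() == "jython"

def module_available(name):
    """Try an import and report whether the module can be loaded
    in this runtime, instead of failing mid-script."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False
```

Note also that `which -a sftp` finding /usr/bin/sftp is unrelated: that is the sftp command-line binary, not a Python module, so its presence says nothing about whether `import sftp` can succeed.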

Kafka Avro Consumer with Decoder issues

Submitted by 送分小仙女□ on 2019-12-17 16:38:06
Question: When I attempt to run a Kafka consumer with Avro over data with my schema, it returns the error "AvroRuntimeException: Malformed data. Length is negative: -40". I see others have had similar issues converting a byte array to JSON, with Avro write and read, and with the Kafka Avro Binary *coder. I have also referenced this Consumer Group Example; all have been helpful, but none resolved this error so far. It works up until this part of the code (line 73): Decoder decoder =
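One frequent cause of "Length is negative" errors (not confirmed from this truncated question, but worth checking) is feeding a raw Avro decoder a message that was produced with the Confluent serializer: those payloads carry a 5-byte prefix (magic byte 0, then a 4-byte big-endian schema registry id) before the Avro datum, and the decoder misreads the prefix as data. A sketch of splitting that prefix off:

```python
import struct

def split_confluent_header(payload):
    """Split a Confluent wire-format Kafka message into
    (schema_id, avro_bytes). Byte 0 is the magic byte (0),
    bytes 1-4 a big-endian schema registry id; the plain Avro
    datum starts at byte 5. Decoding the whole payload as raw
    Avro instead can surface 'Malformed data. Length is negative'."""
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("not Confluent wire format")
    schema_id = struct.unpack(">I", payload[1:5])[0]
    return schema_id, payload[5:]
```

If the producer used a plain Avro DatumWriter instead, this header is absent and the fix lies elsewhere (e.g. a schema mismatch between writer and reader).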

Return value of for() loop as if it were a function in R

Submitted by 送分小仙女□ on 2019-12-14 04:07:27
Question: I have this for loop in an R script:

url <- "https://example.com"
page <- html_session(url, config(ssl_verifypeer = FALSE))
links <- page %>% html_nodes("td") %>% html_nodes("tr") %>% html_nodes("a") %>% html_attr("href")
base_names <- page %>% html_nodes("td") %>% html_nodes("tr") %>% html_nodes("a") %>% html_attr("href") %>% basename()
for (i in 1:length(links)) {
  site <- html_session(URLencode(paste0("https://example.com", links[i])), config(ssl_verifypeer = FALSE))
  writeBin(site$response

ExecuteSQL processor returns corrupted data

Submitted by 末鹿安然 on 2019-12-14 03:06:16
Question: I have a flow in NiFi in which I use the ExecuteSQL processor to get a merge of sub-partitions named dt from a Hive table. For example, my table is partitioned by sikid and dt, so I have sikid=1, dt=1000 and sikid=2, dt=1000. What I did is: select * from my_table where dt=1000. Unfortunately, what I get back from the ExecuteSQL processor is corrupted data, including rows that have dt=NULL, while the original table does not have even one row with dt=NULL. The

custom encryption or decryption algorithm for DBCPConnectionPool processor in NIFI

Submitted by 旧巷老猫 on 2019-12-13 23:57:12
Question: We are trying to provide a custom encryption and decryption algorithm for the password in the DBCPConnectionPool controller service (a built-in component) in NiFi, instead of the built-in algorithms. Is there any approach for that? Answer 1: If a processor, controller service, or reporting task has a PropertyDescriptor that is marked as sensitive [1], then NiFi automatically encrypts that value when writing it to flow.xml.gz, and automatically decrypts it when reading flow.xml.gz. The key and algorithm for