apache-nifi

NiFi | Flow file movement within processors

血红的双手。 Submitted on 2019-12-12 03:45:05
Question: I have been reading about NiFi and have a few queries. Consider a use case where I want to move data into HDFS from the local filesystem. I will use the GetFile and PutHDFS processors. So when I pass a location to GetFile, it will pick up the data, move it into the content repository, and then pass it to the PutHDFS processor for ingestion. Question: I have seen that flow file content is a byte representation; is the byte conversion done by NiFi (if my source file is a text file)? How is data moved to HDFS from…
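A minimal sketch (plain Python, outside NiFi) of the point in question: a flow file's content is simply the source file's raw bytes, so for a text file no conversion is involved when GetFile picks it up.

```python
import os
import tempfile

# Sketch (not NiFi code): flow file "content" is just the source file's raw
# bytes; for a text file those bytes are its encoded characters, so reading
# the file in binary mode reproduces the content repository's view of it.
text = "hello NiFi"
path = os.path.join(tempfile.gettempdir(), "nifi_sample.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "rb") as f:      # read raw bytes, as GetFile does
    raw = f.read()

assert raw == text.encode("utf-8")   # identical bytes: no transformation
print(raw)
```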

Cannot get a connection, pool error Timeout waiting for idle object in PutSQL?

↘锁芯ラ Submitted on 2019-12-12 03:35:38
Question: I have increased the concurrent tasks to '10' for the PutSQL processor. It then shows the error below, though there is no data loss: failed to process due to org.apache.nifi.processor.exception.ProcessException: org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object; rolling back session. If I reduce the concurrent tasks, it works without that exception. While googling this exception I found an answer at the link below. I am getting…
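The usual cause of this timeout is running more concurrent tasks than the DBCPConnectionPool controller service has connections. A hedged sketch of the settings involved (property names as in NiFi's DBCPConnectionPool; the values are illustrative only):

```
DBCPConnectionPool controller service:
  Max Total Connections : 10          # at least the processor's concurrent tasks
  Max Wait Time         : 500 millis  # how long a task waits before this error
```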

How to order files in NiFi based on a timestamp in the filename?

混江龙づ霸主 Submitted on 2019-12-11 18:34:09
Question: I'm listing files from a directory, then parsing a timestamp out of their filenames. I then need to process the files in order. I tried the EnforceOrder processor, but it isn't designed to enforce order when there are large numerical gaps between elements, as there would be with timestamps. The PriorityAttributePrioritizer has other issues: it processes higher priorities first, so it would process the files in reverse, and I don't see that it has any ability to queue for a certain period…
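The parse-and-sort step can be sketched outside NiFi as follows; the filename pattern here is an assumption, so adjust the regex to the real format:

```python
import re
from datetime import datetime

# Sketch of the ordering logic: extract the timestamp embedded in each
# filename and sort on it. Assumes names like "data_YYYYMMDD_HHMMSS.csv".
filenames = [
    "data_20191212_010203.csv",
    "data_20191211_183409.csv",
    "data_20191211_090000.csv",
]

def file_timestamp(name):
    m = re.search(r"(\d{8})_(\d{6})", name)
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")

ordered = sorted(filenames, key=file_timestamp)
print(ordered[0])  # oldest file first
```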

How to execute a processor only when another processor is not executing?

依然范特西╮ Submitted on 2019-12-11 17:39:53
Question: I am inserting/updating data into a table. The database system does not provide an "upsert" capability, so I am using a staging table for the insert, followed by a merge into the "final" table, and finally truncating the staging table. This leads to a race condition: if new data is inserted into the staging table between the merge and the truncate, that data is lost. How can I make sure this does not happen? I have tried to model this via Wait/Notify, but that is not a clean solution either.
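One way to close the window, sketched as illustrative SQL (the table names are made up and the exact syntax is dialect-dependent): instead of TRUNCATE, delete only the rows that were actually merged, inside the same transaction, so rows arriving mid-merge survive for the next pass.

```
-- Illustrative sketch, not a tested statement for any particular dialect.
BEGIN;
INSERT INTO final_table
  SELECT * FROM staging_table WHERE batch_id <= :current_batch;
DELETE FROM staging_table WHERE batch_id <= :current_batch;  -- not TRUNCATE
COMMIT;
```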

How to reorder CSV columns in Apache NiFi

限于喜欢 Submitted on 2019-12-11 17:23:48
Question: Reorder columns in a CSV in Apache NiFi. Input: I have multiple files which have the same columns but in a different order. Output: scrape some columns and store them in the same order.

Answer 1: In my case, because I'm sure those columns will be included in all CSV files, I just need to reorder them, so I use QueryRecord to reorder my CSV files. For example, here are my CSV files:

\\file1
name, age, location, gender
Jack, 40, TW, M
Lisa, 30, CA, F

\\file2
name, location, gender, age
Mary, JP, F, 25
Kate, DE…
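The QueryRecord approach described in the answer amounts to a dynamic property holding a SQL statement like the following (illustrative; it assumes a CSV record reader and writer are configured on the processor, and that the writer's schema fixes the output column order):

```
-- QueryRecord dynamic property: select the columns by name so every input
-- file, whatever its column order, is written out in one fixed order.
SELECT name, age, location, gender FROM FLOWFILE
```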

NiFi execute script encrypt json

浪尽此生 Submitted on 2019-12-11 16:19:36
Question: Hi, referring to this question: nifi encrypt json. I have tried using the template provided. I found an error when it tries to execute the ExecuteScript processor (without the try/catch). Basically it tries to execute the following script:

import javax.crypto.Cipher
import javax.crypto.SecretKey
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec
import java.nio.charset.StandardCharsets

FlowFile flowFile = session.get()
if (!flowFile) {
    return
}
try {
    // Get the raw…

Apache NiFi for Importing Data from RDBMS to HDFS - Performance Comparison with Sqoop

删除回忆录丶 Submitted on 2019-12-11 16:03:45
Question: We are exploring Apache NiFi as a general-purpose data ingestion tool for our enterprise requirements. One typical ingestion requirement is moving data from RDBMS systems to HDFS. I was able to build an RDBMS-to-HDFS data movement flow in NiFi using the GenerateTableFetch and ExecuteSQL processors, and everything worked fine for smaller tables. But I couldn't test the flow for bigger tables, as I was using a standalone distribution. Has anyone done a performance comparison of…
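For larger tables, this flow is typically tuned through GenerateTableFetch's partitioning properties, which break the table into pages that downstream ExecuteSQL instances can fetch in parallel. A hedged sketch (property names as in NiFi; the table and column names are made up):

```
GenerateTableFetch:
  Table Name            : my_big_table   # illustrative
  Maximum-value Columns : id             # enables incremental fetches
  Partition Size        : 10000          # rows per generated SQL query
```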

Using Apache NiFi to write CSV files by contents of column

狂风中的少年 Submitted on 2019-12-11 15:49:20
Question: I have an Apache NiFi flow where I read in a massive .csv file. Here's a sample .csv:

school, date, city
Vanderbilt, xxxx, xxxx
Georgetown, xxxx, xxxx
Duke, xxxx, xxxx
Vanderbilt, xxxx, xxxx

I want to use NiFi to read the file and then output another .csv file per school name, i.e., there would be one .csv file containing the two Vanderbilt records (two lines total, because there are two records), one file for Georgetown, and one file for Duke. I've used GetFile to draw in my file (works, verified), and then…
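The grouping this flow needs is what NiFi's PartitionRecord processor does with a record path of /school: one output flow file per distinct value. The logic can be sketched in plain Python (the date and city values here are made up, since the sample uses placeholders):

```python
import csv
import io
from collections import defaultdict

# Sketch of partition-by-column: group CSV rows on the "school" field, one
# group (i.e. one output file) per distinct school value.
data = """school,date,city
Vanderbilt,2019-01-01,Nashville
Georgetown,2019-01-02,Washington
Duke,2019-01-03,Durham
Vanderbilt,2019-01-04,Nashville
"""

groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(data)):
    groups[row["school"]].append(row)

print(sorted(groups))            # one group per school
print(len(groups["Vanderbilt"])) # the two Vanderbilt records stay together
```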

Why is ExecuteSQLRecord taking a long time to start outputting flow files on large tables?

社会主义新天地 Submitted on 2019-12-11 14:55:58
Question: I am using the ExecuteSQLRecord processor to dump the contents of a large table (100 GB) with 100+ million records. I have set up the properties as below. However, what I am noticing is that it takes a good 45 minutes before I see any flow files coming out of this processor. What am I missing? I am on NiFi 1.9.1. Thank you.

Answer 1: An alternative to ExecuteSQL(Record), or even GenerateTableFetch -> ExecuteSQL(Record), is to use QueryDatabaseTable without a Maximum-value Column. It has a Fetch Size…
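The properties the answer alludes to can be sketched as follows (property names as in NiFi's QueryDatabaseTable; the values are illustrative). A non-zero Fetch Size hints the JDBC driver to stream rows instead of buffering the whole result set, and Output Batch Size releases flow files downstream before the entire query has finished, which addresses the long delay before the first output appears:

```
QueryDatabaseTable:
  Table Name             : big_table   # illustrative
  Fetch Size             : 10000       # rows per round trip to the driver
  Max Rows Per Flow File : 100000      # split the dump into many flow files
  Output Batch Size      : 10          # transfer flow files before the query ends
```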

NiFi - Loading XML data into Cassandra

依然范特西╮ Submitted on 2019-12-11 14:42:58
Question: I am trying to insert XML data into a Cassandra DB. Can somebody please suggest the flow in NiFi? I have JMS, on which I need to post message data, and then consume and insert the data into Cassandra.

Answer 1: I'm not sure you can directly ingest XML into Cassandra. However, you could convert the XML to JSON using the TransformXml processor (and this XSLT), or, as of NiFi 1.2.0, you can use ConvertRecord by specifying the input and output schemas. If there are multiple XML records per flow file and you…
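The XML-to-JSON step that TransformXml (with an XSLT) or ConvertRecord performs can be sketched in plain Python; the element names here are made up for illustration:

```python
import json
import xml.etree.ElementTree as ET

# Sketch of the conversion step only: flatten one XML record's child
# elements into a JSON object (hypothetical record shape).
xml_record = "<user><name>Jack</name><age>40</age></user>"

root = ET.fromstring(xml_record)
record = {child.tag: child.text for child in root}
payload = json.dumps(record)
print(payload)
```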