apache-nifi

Convert a CSV file to JSON using Apache NiFi

青春壹個敷衍的年華 submitted on 2019-12-01 13:49:25
I am trying to read a CSV file from the local file system, convert its content to JSON using Apache NiFi, and write the JSON file back to the local file system. I have succeeded in converting the first row of the CSV file, but not the other rows. What am I missing?
Input:
1,aaa,loc1
2,bbb,loc2
3,ccc,loc3
My NiFi workflow is here: http://www.filedropper.com/mycsvtojson
My output is below. It is in the desired format, but I want the same to happen for all the rows.
{ "id" : "1", "name" : "aaa", "location" : "loc1" }
There are a few different ways this could be done... A custom Java processor that reads in a
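For comparison outside NiFi, the row-by-row conversion the question asks for can be sketched in plain Python. This is only an illustration of the expected result for every row, not the NiFi flow from the question and not the answer's approach. The field names come from the desired output shown above; the function name `csv_to_json_records` is hypothetical.

```python
import csv
import io
import json

# Field names taken from the desired output in the question.
# The input CSV has no header row, so they are supplied explicitly.
FIELD_NAMES = ["id", "name", "location"]


def csv_to_json_records(csv_text, field_names=FIELD_NAMES):
    """Return one dict per CSV row (hypothetical helper, not a NiFi API)."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=field_names)
    return [dict(row) for row in reader]


# Sample input copied from the question.
sample = "1,aaa,loc1\n2,bbb,loc2\n3,ccc,loc3\n"
records = csv_to_json_records(sample)

# Every row becomes its own JSON object, not just the first one.
print(json.dumps(records, indent=2))
```

All values stay strings, which matches the quoted "1" in the desired output.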

Convert a CSV file to JSON using Apache NiFi

纵饮孤独 submitted on 2019-12-01 12:22:58
Question: I am trying to read a CSV file from the local file system, convert its content to JSON using Apache NiFi, and write the JSON file back to the local file system. I have succeeded in converting the first row of the CSV file, but not the other rows. What am I missing?
Input:
1,aaa,loc1
2,bbb,loc2
3,ccc,loc3
My NiFi workflow is here: http://www.filedropper.com/mycsvtojson
My output is below. It is in the desired format, but I want the same to happen for all the rows.
{ "id" : "1", "name" : "aaa", "location"

Create a PostgreSQL table from an Avro schema in NiFi

眉间皱痕 submitted on 2019-12-01 07:28:15
Question: Using InferAvroSchema, I obtained an Avro schema for my file. I want to create a table in PostgreSQL from this Avro schema. Which processor should I use? My flow is: GetFile -> InferAvroSchema -> (create a table from this schema) -> PutDatabaseRecord. The Avro schema: { "type" : "record", "name" : "warranty", "doc" : "Schema generated by Kite", "fields" : [ { "name" : "id", "type" : "long", "doc" : "Type inferred from '1'" }, { "name" : "train_id", "type" : "long", "doc" : "Type inferred from
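The mapping the question is after, from Avro record fields to table columns, can be sketched outside NiFi as follows. This is a minimal illustration under stated assumptions, not a NiFi processor. It uses only the two fields visible in the excerpt (the rest of the schema is cut off), the helper name `create_table_sql` is hypothetical, and the type table covers only Avro primitive types.

```python
import json

# Avro primitive types mapped to PostgreSQL column types.
AVRO_TO_PG = {
    "boolean": "BOOLEAN",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "bytes": "BYTEA",
    "string": "TEXT",
}


def create_table_sql(schema):
    """Build a CREATE TABLE statement from an Avro record schema (hypothetical helper)."""
    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        nullable = False
        # A union such as ["null", "long"] marks the column as nullable.
        if isinstance(avro_type, list):
            nullable = "null" in avro_type
            avro_type = [t for t in avro_type if t != "null"][0]
        pg_type = AVRO_TO_PG[avro_type]
        suffix = "" if nullable else " NOT NULL"
        columns.append(f'"{field["name"]}" {pg_type}{suffix}')
    body = ",\n  ".join(columns)
    return f'CREATE TABLE "{schema["name"]}" (\n  {body}\n);'


# Only the fields visible in the excerpt; the full schema is not reproduced here.
schema = json.loads(
    '{"type": "record", "name": "warranty", "fields": ['
    '{"name": "id", "type": "long"}, {"name": "train_id", "type": "long"}]}'
)
sql = create_table_sql(schema)
print(sql)
```

Inside a flow, the generated statement would still need to be executed against the database by a SQL-capable processor before PutDatabaseRecord runs; which processor to use is left open here.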

NiFi - how to reference a flowFile in ExecuteStreamCommand?

无人久伴 submitted on 2019-12-01 05:11:49
I need to execute something like:
sed '1d' simple.tsv > noHeader.tsv
which removes the first line from my big flow file (> 1 GB). The thing is, I need to execute it on my flow file, so it would be:
sed '1d' myFlowFile > myFlowFile
The question is: how should I configure the ExecuteStreamCommand processor so that it runs the command on my flow file and writes the result back to my flow file? If sed is not the best option, I can consider doing this another way (e.g. tail). Thanks, Michal
Edit 2 (Solution): Below is the final ExecuteStreamCommand config that does what I need (remove the 1st line from the flow file).
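Separately from that config, which is not reproduced in this excerpt, the effect of `sed '1d'` on a large stream can be sketched in Python. The sketch assumes the content arrives as a byte stream and that only the first line is small enough to read at once. The function name `drop_first_line` is hypothetical.

```python
import io
import shutil


def drop_first_line(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst, skipping everything up to and including the first newline.

    Both arguments are binary file-like objects. The body is copied in
    fixed-size chunks, so a file larger than 1 GB is never held in memory.
    """
    src.readline()  # read and discard the header line
    shutil.copyfileobj(src, dst, chunk_size)


# Small in-memory demonstration with a hypothetical TSV payload.
src = io.BytesIO(b"id\tname\n1\taaa\n2\tbbb\n")
dst = io.BytesIO()
drop_first_line(src, dst)
print(dst.getvalue())
```

In a script invoked as a stream command, `sys.stdin.buffer` and `sys.stdout.buffer` would take the place of the two in-memory buffers, on the assumption that the flow file content is piped through standard input and output.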

NiFi: how to write a custom processor

偶尔善良 submitted on 2019-12-01 01:30:35
I want to write a NiFi processor that reads an XML file from an HDFS directory and extracts its data into flowfile attributes. Also, if two NiFi processors can get this file and read from it or write to it, how can I lock the file so that only one processor can use it at a time? Can you recommend any articles, code examples, or related materials that could help me? I haven't written a custom processor yet.
I'm not sure why you need to write a custom processor in this case, because both GetHDFS and EvaluateXPath processors exist and should be able to perform

Exception 'Cannot get a connection, pool error Timeout waiting for idle object' when using 'DBCPConnectionPoolLookup' service in Nifi

狂风中的少年 submitted on 2019-11-30 21:18:55
Question: I'm trying to use the 'DBCPConnectionPoolLookup' service in 'ExecuteGroovyScript' to dynamically query the required database based on the 'database.name' parameter in the input flow file. The processor successfully gets the corresponding 'DBCPConnectionPool' service for querying, but I'm getting the exception java.sql.SQLException: Cannot get a connection, pool error Timeout waiting for idle object. By contrast, if I directly use the 'DBCPConnectionPool' service without the 'Lookup'

Difference between Apache NiFi and StreamSets

两盒软妹~` submitted on 2019-11-29 20:08:06
I am planning a class project and was looking at a few technologies that can automate or manage the flow of data between systems. I found a couple of them, namely Apache NiFi and StreamSets (to my knowledge). What I couldn't understand is the difference between them and the use cases where each can be used. I am new to this, and I would highly appreciate it if anyone could explain a bit. Thanks
Suraj, Great question. My response is as a member of the open source Apache NiFi project management committee and as someone who is passionate about the dataflow management domain. I've

Does NiFi support batch processing?

拟墨画扇 submitted on 2019-11-29 16:52:05
I just need to know whether it is possible to run a series of processors until completion: "the execution of a series of processors in a process group waits for another process group's execution to complete". For example, I have 3 processors in the NiFi UI: P1-->P2-->P3 (P = processor). I need to run P1 and, once it has run completely, run P2, and so on, so that they run in sequence with each one waiting for the previous one to complete.
EDIT-1: As an example, I have data at a web URL. I can download that data using the GetHTTP processor, and I store its content with PutFile. If the file is saved in the PutFile directory, then run

How to join two CSVs with Apache Nifi

时光怂恿深爱的人放手 submitted on 2019-11-29 07:51:09
I'm looking into ETL tools (like Talend) and investigating whether Apache NiFi could be used. Could NiFi be used to perform the following: (1) pick up two CSV files that are placed on local disk, (2) join the CSVs on a common column, and (3) write the joined CSV to disk? I've tried setting up a job in NiFi, but couldn't see how to perform the join of two separate CSV files. Is this task possible in Apache NiFi? It looks like the QueryDNS processor could be used to perform enrichment of one CSV file using the other, but that seems over-complicated for this use case. Here's an example of the input CSVs,
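The question's own sample CSVs are cut off in this excerpt. To make the requested operation concrete, here is a minimal Python sketch of an inner join on a common column, using hypothetical data. It is not a NiFi flow. The function name `join_csv` and the column names are assumptions, and the sketch assumes the join key is unique in the second file.

```python
import csv
import io


def join_csv(left_text, right_text, key):
    """Inner-join two CSV texts on a shared column (hypothetical helper).

    Assumes both inputs have a header row and that `key` is unique
    in the right-hand file.
    """
    left = csv.DictReader(io.StringIO(left_text))
    right = csv.DictReader(io.StringIO(right_text))

    # Index the right-hand rows by the join key for constant-time lookup.
    right_by_key = {row[key]: row for row in right}
    extra_cols = [c for c in right.fieldnames if c != key]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(left.fieldnames) + extra_cols)
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # inner join: drop rows that have no partner
        writer.writerow(
            [row[c] for c in left.fieldnames] + [match[c] for c in extra_cols]
        )
    return out.getvalue()


# Hypothetical inputs, not the ones from the question.
people = "id,name\n1,aaa\n2,bbb\n3,ccc\n"
places = "id,location\n1,loc1\n3,loc3\n"
joined = join_csv(people, places, "id")
print(joined)
```

Loading one file into a lookup table and streaming the other is the same shape as the enrichment approach the question mentions, just without the DNS-specific processor.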

Difference between Apache NiFi and StreamSets

旧时模样 submitted on 2019-11-28 15:50:14
Question: I am planning a class project and was looking at a few technologies that can automate or manage the flow of data between systems. I found a couple of them, namely Apache NiFi and StreamSets (to my knowledge). What I couldn't understand is the difference between them and the use cases where each can be used. I am new to this, and I would highly appreciate it if anyone could explain a bit. Thanks
Answer 1: Suraj, Great question. My response is as a member of the open source Apache