Question
I need to use the PriorityAttributePrioritizer in NiFi.
I found the available prioritizers described in this reference: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#settings
If I receive 10 flow files, I need to set a unique priority value on each one.
After that, the queue must be configured to use the PriorityAttributePrioritizer.
The flow files should then be processed in order of their priority value.
How can I set the priority value for individual flow files, or which prioritizer in NiFi would work for my case?
Answer 1:
If the files are named after the time they were generated (e.g. file_2017-03-03T010101.csv), have you considered using UpdateAttribute to parse the filename into a date, and that date into epoch time (which happens to be an increasing number), as a first-level index / prioritizer?
This way you could have:
GetFile (single thread)
-- Connector with FIFO
--> UpdateAttribute (adding Epoch from filename date)
-- Connector with PriorityAttributePrioritizer
--> rest of your flow
Assuming the file name is file_2017-03-03T010101.csv, the expression language would be something like:
${filename:toDate("'file_'yyyy-MM-dd'T'HHmmss'.csv'", "UTC"):toNumber()}
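To make clear what value that expression is meant to produce, here is a minimal Python sketch of the same conversion. It is not NiFi code; the function name `filename_to_epoch_millis` is made up for illustration, and it assumes the timestamp in the file name is in UTC, as in the expression above.

```python
from datetime import datetime, timezone

def filename_to_epoch_millis(filename: str) -> int:
    """Parse a name like file_2017-03-03T010101.csv into epoch milliseconds (UTC)."""
    # Literal text ("file_", "T", ".csv") in the format must match the name exactly.
    parsed = datetime.strptime(filename, "file_%Y-%m-%dT%H%M%S.csv")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)

print(filename_to_epoch_millis("file_2017-03-03T010101.csv"))  # 1488502861000
```

Because every value has the same number of digits, later files get larger numbers whether the values are compared as numbers or as strings.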
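Answer 2:
The PriorityAttributePrioritizer prioritizes flow files by looking for a flow file attribute named "priority" and sorting the flow files by its value, lowest first. According to the NiFi User Guide, values that parse as integers are compared numerically, and other values fall back to string (lexicographic) ordering.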
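String ordering and numeric ordering can disagree, which matters if you number ten flow files 1 through 10. The plain Python sketch below (not NiFi code) shows the difference, and shows that zero-padding to a fixed width makes both orderings agree.

```python
priorities = ["9", "10", "2"]

as_strings = sorted(priorities)                   # character-by-character comparison
as_numbers = sorted(priorities, key=int)          # numeric comparison
padded = sorted(p.zfill(3) for p in priorities)   # fixed width, safe either way

print(as_strings)  # ['10', '2', '9']
print(as_numbers)  # ['2', '9', '10']
print(padded)      # ['002', '009', '010']
```

If you are unsure which comparison applies to your values or your NiFi version, fixed-width priorities avoid the question.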
You can set the priority attribute using an UpdateAttribute processor. For example, if you had three logical data feeds, and feed #1 was most important, feed #2 was second most important, and feed #3 was third, then you could use three UpdateAttribute processors to set the priority attribute to 1, 2, and 3, then use a funnel to converge them all.
You would set the PriorityAttributePrioritizer on the queue between the funnel and the next processor, and at this point any time a flow file with priority=1 hits the queue, it will always be processed before any flow files with priority=2 and priority=3.
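As a rough mental model of that queue, the sketch below uses a Python heap keyed on the priority value. It is only an illustration: the flow file names are hypothetical, and breaking ties by arrival order is an assumption of this model rather than documented NiFi behavior.

```python
import heapq
from itertools import count

arrival = count()   # tie-breaker so equal priorities keep arrival order (model assumption)
queue = []

def enqueue(name: str, priority: int) -> None:
    heapq.heappush(queue, (priority, next(arrival), name))

# Flow files from three feeds arrive at the funnel in mixed order.
for name, priority in [("feed3-a", 3), ("feed1-a", 1), ("feed2-a", 2), ("feed1-b", 1)]:
    enqueue(name, priority)

drained = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(drained)  # ['feed1-a', 'feed1-b', 'feed2-a', 'feed3-a']
```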
Determining how to set the priority really depends on your data. It is usually based on something about the data, like a field from each flow file that is extracted to an attribute to tell it the priority, or just knowing that everything that comes from source #1 is higher priority than what comes from source #2. Setting randomly unique priorities doesn't really make sense because you don't even know what you are prioritizing on then.
Answer 3:
The PriorityAttributePrioritizer prioritizes flow files by looking for a flow file attribute named "priority". My file names had a date appended, so I added an ExecuteScript processor that calls a Groovy script to extract the date from each file name. The dates are then sorted and the flow files iterated; following the date order, the priority is incremented and added to each flow file as the attribute 'priority'.
Example: file one: priority 1, file two: priority 2
NiFi flow: GetFile -> ExecuteScript (Groovy: sort files, add priority attribute) -> change the queue prioritizer to PriorityAttributePrioritizer. With this configuration, the priority 1 file is processed first, and the remaining files are processed in priority order.
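The Groovy script itself is not shown in this answer. The core logic (sort by the date in the file name, then number the files) might look like the Python sketch below. It is not the NiFi scripting API: the function name, the sample file names, and the yyyy-MM-dd date format are all assumptions made for illustration.

```python
import re
from datetime import datetime

def assign_priorities(filenames):
    """Return {filename: priority}, with priority 1 for the earliest date."""
    def date_of(name):
        # Assumes a yyyy-MM-dd date somewhere in the file name.
        match = re.search(r"\d{4}-\d{2}-\d{2}", name)
        if match is None:
            raise ValueError(f"no date found in {name!r}")
        return datetime.strptime(match.group(0), "%Y-%m-%d")

    ordered = sorted(filenames, key=date_of)
    return {name: index for index, name in enumerate(ordered, start=1)}

print(assign_priorities(["report_2017-03-05.csv", "report_2017-03-01.csv"]))
# {'report_2017-03-01.csv': 1, 'report_2017-03-05.csv': 2}
```

Note that this only works when the script sees the whole batch at once; priorities assigned this way are relative to the files in that batch.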
Source: https://stackoverflow.com/questions/42528993/how-to-specify-priority-attributes-for-individual-flowfiles