Distribution of content among cluster nodes within edge NiFi processors

Submitted by 只愿长相守 on 2019-12-13 04:17:23

Question


I was exploring the NiFi documentation, and I must agree that NiFi is one of the best-documented open-source projects out there.

My understanding is that a processor runs on all nodes of the cluster. However, I am wondering how content is distributed among cluster nodes when we use content-pulling processors such as FetchS3Object or FetchHDFS. With a processor like FetchHDFS or FetchSFTP, will every node connect to the source? Is the content split up and fetched by multiple nodes, or does one node fetch the content and load-balance it across the downstream queues?


Answer 1:


I think this article answers your question:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

The same idea applies to other file stores.

will all nodes make connection to the source?

Yes. Unless you restrict the processor to run on the primary node only, it runs on all nodes.
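To illustrate why that matters, here is a toy simulation (plain Python, not NiFi code). It assumes a 3-node cluster where every node points at the same remote directory; the node names, file names, and the run_listing helper are all hypothetical. When a List-style processor is scheduled on all nodes, each node connects to the source and emits the full listing, so every file shows up once per node. Restricting it to the primary node yields each file exactly once.

```python
from collections import Counter

# Hypothetical contents of one shared remote directory (e.g. on an SFTP server).
REMOTE_FILES = ["a.csv", "b.csv", "c.csv", "d.csv"]
NODES = ["node1", "node2", "node3"]
PRIMARY = "node1"

def run_listing(primary_only):
    """Return (node, filename) pairs emitted by a List-style processor.

    If the processor is scheduled on all nodes, every node connects to the
    same source and emits the full listing on its own.
    """
    scheduled = [PRIMARY] if primary_only else NODES
    return [(node, name) for node in scheduled for name in REMOTE_FILES]

everywhere = run_listing(primary_only=False)
primary_only = run_listing(primary_only=True)

# Count how many times each remote file was listed across the cluster.
print(Counter(name for _, name in everywhere))    # every file listed 3 times
print(Counter(name for _, name in primary_only))  # every file listed once
```

The duplicated listings in the first case would turn into duplicated fetches downstream, which is the motivation for the primary-node-only listing described in the next answer.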




Answer 2:


The answer by @dagget describes the traditional way to handle this situation, often called the "list + fetch" pattern. The List processor runs on the primary node only, and its listings are sent to a Remote Process Group (RPG) that redistributes them across the cluster. An input port receives the listings and connects to a Fetch processor that runs on all nodes, so the content is fetched in parallel.
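The three steps of the list + fetch pattern can be sketched as a small simulation (again plain Python, not the NiFi API). Everything here is hypothetical: the node names, the file names, and the helper functions. The redistribution step stands in for the RPG/Site-to-Site hop; real Site-to-Site takes node load into account, so the simple round-robin used here is an assumption made only to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Hypothetical cluster and remote listing; names are illustrative only.
NODES = ["node1", "node2", "node3"]
PRIMARY = "node1"
REMOTE_FILES = [f"file{i}.csv" for i in range(7)]

def list_on_primary():
    """Step 1: the List processor runs on the primary node only and emits
    one small listing record (file name, no content) per remote file."""
    return [{"filename": name, "listed_by": PRIMARY} for name in REMOTE_FILES]

def redistribute(listings, nodes):
    """Step 2: stand-in for the RPG hop. Round-robin is an assumption;
    real Site-to-Site also weighs how busy each node is."""
    assignment = {node: [] for node in nodes}
    for node, listing in zip(cycle(nodes), listings):
        assignment[node].append(listing)
    return assignment

def fetch(node, listing):
    """Step 3: the Fetch processor on each node pulls only the files it was
    handed. Content is faked here as a string."""
    return (node, listing["filename"], f"<content of {listing['filename']}>")

def run_pipeline():
    assignment = redistribute(list_on_primary(), NODES)
    # All nodes fetch their share concurrently.
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        futures = [pool.submit(fetch, node, item)
                   for node, items in assignment.items() for item in items]
        return [f.result() for f in futures]

results = run_pipeline()
for node in NODES:
    print(node, sorted(name for n, name, _ in results if n == node))
```

With 7 files and 3 nodes, node1 ends up fetching file0, file3 and file6, while node2 and node3 fetch two files each. Only the tiny listing records cross the cluster; each file's content is pulled from the source exactly once, by one node.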

NiFi 1.8.0 introduced load-balanced connections, which remove the need for the RPG. You still run the List processor on the primary node only, but you connect it directly to the Fetch processor and configure the queue between them to load-balance.
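A load-balanced connection is configured with a strategy. The sketch below mimics three of the options offered on a connection (Round robin, Partition by attribute, Single node; the default is "Do not load balance"), but the logic is a simplified stand-in, not NiFi's implementation. In particular, the CRC32-modulo hash for partitioning and the choice of the first node for "single node" are assumptions (NiFi picks that node itself). The listing records and the "path" attribute are hypothetical.

```python
import zlib
from itertools import cycle

# Hypothetical 3-node cluster.
NODES = ["node1", "node2", "node3"]

def round_robin(flowfiles, nodes):
    """Hand FlowFiles to nodes in turn."""
    it = cycle(nodes)
    return [(next(it), ff) for ff in flowfiles]

def partition_by_attribute(flowfiles, nodes, attribute):
    """FlowFiles sharing the same attribute value land on the same node.
    CRC32 modulo node count stands in for NiFi's own partitioning."""
    def pick(ff):
        key = ff.get(attribute, "").encode("utf-8")
        return nodes[zlib.crc32(key) % len(nodes)]
    return [(pick(ff), ff) for ff in flowfiles]

def single_node(flowfiles, nodes):
    """Everything goes to one node (first node here; NiFi chooses it)."""
    return [(nodes[0], ff) for ff in flowfiles]

# Listings emitted by a List processor on the primary node (hypothetical).
listings = [{"filename": f"f{i}.csv", "path": f"/data/dir{i % 2}"}
            for i in range(6)]

for strategy in (round_robin, single_node):
    print(strategy.__name__, [node for node, _ in strategy(listings, NODES)])
print("partition_by_attribute",
      [node for node, _ in partition_by_attribute(listings, NODES, "path")])
```

For the list + fetch case, round robin is usually what you want, since any node can fetch any file. Partitioning by attribute is useful when related FlowFiles must be processed on the same node.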



Source: https://stackoverflow.com/questions/54568114/distribution-of-content-among-cluster-nodes-within-edge-nifi-processors
