Event Notification of Data Availability in HDFS?

Submitted by ﹥>﹥吖頭↗ on 2019-12-23 04:52:27

Question


What is the best approach to implementing a data-availability notification system for Hadoop, such that whenever new data arrives it creates a notification that a job control framework can use to start the jobs that depend on that data? The main concern is that the job should be triggered as soon as the data becomes available, instead of the job polling the NameNode for data availability.


Answer 1:


I would use a producer/consumer model in which the two sides communicate through a queue, for example Amazon SQS.

The producer maintains a list of watched directories and runs hadoop fs -test -e /path/to/watched/dir every x seconds (where x should be a parameter). If the command exits with status 0 (check $?), the directory exists and the producer sends a message to the queue. The content of the message could be just the name of the directory that just appeared, or a JSON object carrying the name plus additional metadata fields.
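
As a rough illustration of the producer side, here is a minimal Python sketch. The function names (hdfs_path_exists, poll_once, run_producer), the JSON field names, and the rule of notifying only once per directory are all assumptions made for this example, not part of the original answer. The send argument stands in for whatever call puts a message on your queue.

```python
import json
import subprocess
import time


def hdfs_path_exists(path):
    """Return True if `hadoop fs -test -e <path>` exits with status 0."""
    result = subprocess.run(["hadoop", "fs", "-test", "-e", path])
    return result.returncode == 0


def poll_once(watched_dirs, already_notified, exists=hdfs_path_exists, send=print):
    """Check each watched directory once and send a message for new ones.

    `already_notified` is a set that is updated in place. Notifying only
    once per directory is an assumption of this sketch.
    """
    for path in watched_dirs:
        if path in already_notified:
            continue  # a message was already sent for this directory
        if exists(path):
            # Hypothetical message layout: directory name plus a timestamp.
            message = json.dumps({"path": path, "detected_at": int(time.time())})
            send(message)
            already_notified.add(path)


def run_producer(watched_dirs, interval_seconds, send):
    """Poll forever, sleeping interval_seconds (the x above) between rounds."""
    already_notified = set()
    while True:
        poll_once(watched_dirs, already_notified, send=send)
        time.sleep(interval_seconds)
```

With Amazon SQS, send could be a small wrapper around the queue client's send-message call. Keeping it as a parameter also lets you test poll_once without a Hadoop cluster or a real queue.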

On the other side, the consumer checks the queue every y seconds (where y should be a parameter), and as soon as a new message arrives it starts the job on that directory.
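
The consumer side might be sketched like this in Python. Here receive and start_job are hypothetical callables that you supply: the first returns the message bodies currently waiting on the queue, and the second launches the job for a directory. The "path" field and the choice to skip malformed messages are likewise assumptions of this sketch.

```python
import json
import time


def handle_messages(messages, start_job):
    """Decode each JSON message and start a job on the directory it names.

    Returns the list of directories for which a job was started.
    """
    started = []
    for body in messages:
        try:
            path = json.loads(body)["path"]
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed messages rather than crash the consumer
        start_job(path)
        started.append(path)
    return started


def run_consumer(receive, start_job, interval_seconds):
    """Check the queue forever, sleeping interval_seconds (the y above) between checks."""
    while True:
        handle_messages(receive(), start_job)
        time.sleep(interval_seconds)
```

For example, handle_messages(['{"path": "/data/a"}'], print) calls print("/data/a") and returns ["/data/a"].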



Source: https://stackoverflow.com/questions/14436748/event-notification-of-data-availability-in-hdfs
