hazelcast-jet deployment and data ingestion

Submitted by 岁酱吖の on 2019-12-06 11:41:28
Marko Topolnik

What is the best way to deploy hazelcast-jet to multiple ec2 instances?

  1. Download and unzip the Hazelcast Jet distribution on each machine:

    $ wget https://download.hazelcast.com/jet/hazelcast-jet-3.1.zip
    $ unzip hazelcast-jet-3.1.zip
    $ cd hazelcast-jet-3.1
    
  2. Go to the lib directory of the unzipped distribution and download the hazelcast-aws module:

    $ cd lib
    $ wget https://repo1.maven.org/maven2/com/hazelcast/hazelcast-aws/2.4/hazelcast-aws-2.4.jar
    
  3. Edit bin/common.sh to add the module to the classpath. Towards the end of the file there is this line:

    CLASSPATH="$JET_HOME/lib/hazelcast-jet-3.1.jar:$CLASSPATH"
    

    Duplicate this line and, in the copy, replace -jet-3.1 with -aws-2.4.

  4. Edit config/hazelcast.xml to enable AWS cluster discovery. The details are here. In this step you'll have to deal with IAM roles, EC2 security groups, regions, and so on. There's also a best practices guide for AWS deployment.

  5. Start the cluster with jet-start.sh.
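
For step 4, the discovery section of config/hazelcast.xml might look roughly like the fragment below. This is a sketch under assumptions, not a definitive configuration: the region and the tag key/value are placeholders, and it assumes the EC2 instances carry an IAM role that permits ec2:DescribeInstances, so no access keys appear in the file.

```xml
<network>
    <join>
        <!-- Multicast does not work on EC2, so turn it off -->
        <multicast enabled="false"/>
        <aws enabled="true">
            <!-- Placeholder values: use your own region and instance tag -->
            <region>us-east-1</region>
            <tag-key>cluster</tag-key>
            <tag-value>jet</tag-value>
        </aws>
    </join>
</network>
```

Filtering by tag (or by security group) keeps the members from trying to join unrelated instances in the same region.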

How do I configure the client so that it knows where to submit the tasks?

A straightforward approach is to specify the public IPs of the machines where Jet is running, for example:

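    // Needs com.hazelcast.client.config.ClientConfig and com.hazelcast.jet.{Jet, JetInstance}
    ClientConfig clientConfig = new ClientConfig();
    clientConfig.getGroupConfig().setName("jet");
    clientConfig.getNetworkConfig().addAddress("54.224.63.209", "34.239.139.244");
    JetInstance jet = Jet.newJetClient(clientConfig);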

However, depending on your AWS setup, these addresses may not be stable, so you can configure the client to discover them as well. This is explained here.

How to ingest data from thousands of sources? The sources push data instead of being pulled.

I think your best option for this is to put the data into a Hazelcast Map, and use a mapJournal source to get the update events from it.
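
A minimal sketch of that idea, written against the Jet 3.1 pipeline API, is shown below. It is one possible shape rather than the definitive setup. The map name `readings`, the key/value types and the class name are assumptions made for the example. It starts an embedded member so that it is self-contained; on a real cluster the event journal for the map would have to be enabled in the members' configuration, and the sources would write to the map through Hazelcast clients.

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.config.JetConfig;
import com.hazelcast.jet.pipeline.JournalInitialPosition;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import java.util.Map;

public class MapJournalIngest {
    // Hypothetical map name, chosen for this sketch only
    static final String MAP_NAME = "readings";

    // Streams the map's event journal (add/update events) to the log
    static Pipeline buildPipeline() {
        Pipeline p = Pipeline.create();
        p.drawFrom(Sources.<String, Long>mapJournal(
                        MAP_NAME, JournalInitialPosition.START_FROM_OLDEST))
         .withoutTimestamps()
         .drainTo(Sinks.logger());
        return p;
    }

    public static void main(String[] args) throws Exception {
        JetConfig config = new JetConfig();
        // The event journal is disabled by default; mapJournal needs it
        config.getHazelcastConfig()
              .getMapEventJournalConfig(MAP_NAME)
              .setEnabled(true);

        JetInstance jet = Jet.newJetInstance(config);
        try {
            // A streaming job never completes by itself, so don't join() it
            jet.newJob(buildPipeline());

            // Stand-in for the external sources pushing their data
            Map<String, Long> readings = jet.getMap(MAP_NAME);
            readings.put("sensor-1", 21L);
            readings.put("sensor-2", 19L);
            readings.put("sensor-1", 22L);

            // Give the job a moment to log the events before shutting down
            Thread.sleep(2000);
        } finally {
            Jet.shutdownAll();
        }
    }
}
```

In a real job the logger sink would be replaced by the actual processing stages; the point here is only that every put into the map shows up as an event in the pipeline.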
