Pig & Cassandra & DataStax Splits Control

Posted by 亡梦爱人 on 2019-12-01 06:43:03

You should set pig.noSplitCombination = true. You can do this in one of three places.

When invoking the script:

dse pig -Dpig.noSplitCombination=true /path/to/script.pig

In the Pig script itself:

SET pig.noSplitCombination true;
table = LOAD 'cql://ks/cf' USING CqlStorage();

Or permanently in /etc/dse/pig/pig.properties. Uncomment:

pig.noSplitCombination=true

Otherwise, Pig may combine all of the input splits into a single one. The job log then reports "Total input paths (combined) to process : 1", and only one map task runs.

You can also set cassandra.input.split.size to something lower than the default split size of 64k rows, so that you get more splits and therefore more map tasks. How many rows per node does the CQL table have? Can you post your table schema?
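As a rough sketch of one way to apply this, the property can be set at the top of the Pig script, in the same way as pig.noSplitCombination. This assumes that Pig forwards SET properties to the Hadoop job configuration and that no split_size URL parameter overrides the value. The keyspace ks, the table cf, and the value 10000 are placeholders, not recommendations.

```pig
-- Assumption: SET passes this key through to the Hadoop job configuration.
-- 10000 rows per split is an illustrative value, lower than the 64k default.
SET cassandra.input.split.size 10000;

-- ks and cf are placeholder keyspace and table names.
rows = LOAD 'cql://ks/cf' USING CqlStorage();
```

If the number of map tasks does not change, the setting is probably not reaching the input format, and the split_size URL parameter is the more direct route.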

Add split_size to the URL parameters.

For CassandraStorage, use the following parameters:

cassandra://[username:password@]<keyspace>/<columnfamily>[?slice_start=<start>&slice_end=<end>[&reversed=true][&limit=1][&allow_deletes=true][&widerows=true][&use_secondary=true][&comparator=<comparator>][&split_size=<size>][&partitioner=<partitioner>][&init_address=<host>][&rpc_port=<port>]]

For CqlStorage, use the following parameters:

cql://[username:password@]<keyspace>/<columnfamily>[?[page_size=<size>][&columns=<col1,col2>][&output_query=<prepared_statement>][&where_clause=<clause>][&split_size=<size>][&partitioner=<partitioner>][&use_secondary=true|false][&init_address=<host>][&rpc_port=<port>]]
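For example, a CqlStorage load that asks for roughly 10000 rows per split might look like the sketch below. The keyspace ks, the table cf, and the value 10000 are placeholders; tune the number against the row count per node.

```pig
-- split_size is given in rows per split; smaller values yield more splits,
-- and therefore more map tasks.
rows = LOAD 'cql://ks/cf?split_size=10000' USING CqlStorage();

-- Simple job to exercise the load: count the loaded tuples.
grouped = GROUP rows ALL;
total = FOREACH grouped GENERATE COUNT(rows);
DUMP total;
```

Comparing the number of map tasks across a few split_size values is a quick way to find a setting between the two extremes of a single task and one task per tiny split.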

Setting pig.noSplitCombination = true takes me to the other extreme: with this flag I started getting 769 map tasks.
