HAWQ

Merge Operation Fails - gpload utility (Greenplum)

Submitted by 守給你的承諾、 on 2019-12-12 03:44:57
Question: Let me describe our problem below. We have a small Greenplum (gpdb) cluster, and we are doing data integration with the Talend tool. We are trying to load an incremental update from one table into another table; quite simple, I thought. The job's data flow is: tgreenplumconnection | tmssqlinput --> thdfsoutput --> tmap --> tgreenplumgpload --> tgreenplumcommit. We are getting the error: Exception in thread "Thread-1" java.lang.RuntimeException: Cannot run program "gpload": CreateProcess error=2, The system cannot find
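A Windows "CreateProcess error=2" from Java means the JVM running the Talend job could not resolve the gpload executable on its PATH; gpload ships with the Greenplum loader tools and must be installed on the machine executing the job. Before wiring it into Talend, one can check how the OS would resolve it. A minimal sketch (the helper name `resolve_executable` is illustrative, not part of Talend or Greenplum):

```python
import shutil

def resolve_executable(name):
    """Return the full path the OS would use to launch `name`,
    or None if it is not on the PATH -- the usual cause of
    CreateProcess error=2 when Java spawns a subprocess."""
    return shutil.which(name)

# On the machine running the Talend job this should print the gpload path;
# None means the Greenplum loader tools are missing from the PATH.
print(resolve_executable("gpload"))
```

If this prints None, either install the loader tools on the Talend host or add their bin directory to the PATH seen by the process that launches the job.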

Greenplum, Pivotal HD + Spark, or HAWQ for TBs of Structured Data?

Submitted by 北战南征 on 2019-12-06 06:50:23
Question: I have TBs of structured data in a Greenplum DB. I need to run what is essentially a MapReduce job on my data. I found myself reimplementing at least the features of MapReduce just so that this data would fit in memory (in a streaming fashion). Then I decided to look elsewhere for a more complete solution. I looked at Pivotal HD + Spark because I am using Scala and the Spark benchmarks are impressive. But I believe the datastore behind it, HDFS, is going to be less efficient than Greenplum. (Note the "I believe" -- I would be happy to learn I am wrong, but please give some evidence.)
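The kind of hand-rolled "streaming MapReduce" the question describes can be sketched with nothing but the standard library: lazily map each record, group by key, and reduce each group, so only one key's worth of data is ever held in memory. This is a hypothetical sketch, not the asker's actual code; it assumes the input iterator (e.g. a server-side database cursor) arrives already sorted by key, as a database can guarantee with ORDER BY:

```python
from itertools import groupby
from functools import reduce
from operator import itemgetter

def streaming_map_reduce(records, map_fn, reduce_fn):
    """Minimal streaming map-reduce over any iterable of records.
    Requires the mapped (key, value) pairs to arrive grouped by key;
    only one group is materialized at a time."""
    mapped = (map_fn(r) for r in records)            # lazy map phase
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = (v for _, v in group)
        yield key, reduce(reduce_fn, values)         # per-key reduce phase

# Illustrative word-count over (word, 1) pairs sorted by word:
rows = [("hadoop", 1), ("hadoop", 1), ("spark", 1)]
print(list(streaming_map_reduce(rows, lambda r: r, lambda a, b: a + b)))
# -> [('hadoop', 2), ('spark', 1)]
```

The limitation that pushes people toward Spark is visible here: a single sorted pass gives no parallelism or shuffle, which is exactly the machinery a real MapReduce engine provides.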
