greenplum

function cannot execute on segment because it accesses relation

孤者浪人 submitted on 2019-12-11 01:45:47
Question: I have a function defined as follows in Greenplum (PostgreSQL-based):

CREATE OR REPLACE FUNCTION vin_temp_func(j text) RETURNS integer AS $$
DECLARE
    varx integer;
BEGIN
    SELECT count(*) INTO varx
    FROM T_perf a
    LEFT JOIN T_profile b ON a.sr_number = b.sr_number
    WHERE b.product_name LIKE '%V1%'
      AND a.submit_date >= (('2013-02-01'::date - CAST(EXTRACT(DOW FROM '2013-02-01'::date) AS int)) - 7) + '1 week'::interval
      AND a.submit_date <= ('2013-02-01'::date - CAST(EXTRACT(DOW FROM '2013-02-01'::date)+1 AS int)
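The error in the title typically appears when a function that reads a table is pushed down to the segments. A minimal sketch of the restriction, with made-up table and function names (not from the question):

```sql
-- Hypothetical setup to illustrate the restriction.
CREATE TABLE t_demo (id int) DISTRIBUTED BY (id);

CREATE OR REPLACE FUNCTION count_demo() RETURNS bigint AS $$
    SELECT count(*) FROM t_demo;
$$ LANGUAGE sql;

-- Called on its own, the function executes on the master and works:
SELECT count_demo();

-- But invoking it per row of a distributed table pushes the call down
-- to the segments, which is what triggers the error in the title:
-- SELECT count_demo() FROM t_demo;
--   ERROR: function cannot execute on segment because it accesses relation "t_demo"
```

One common approach is therefore to restructure the query so the table access happens as a plain join or subquery rather than inside a function evaluated on the segments.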

How do we build a normalized table from a denormalized text file?

纵饮孤独 submitted on 2019-12-10 12:06:22
Question: How do we build a normalized table from a denormalized text file? Thanks for your replies/time. We need to build a normalized DB table from a denormalized text file. We explored a couple of options, such as Unix shell scripting and PostgreSQL. I am looking to learn better approaches from this community. The input text file has variable-length, comma-delimited records. The content may look like this: XXXXXXXXXX , YYYYYYYYYY, TTTTTTTTTTT, UUUUUUUUUU, RRRRRRRRR,JJJJJJJJJ 111111111111,
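One way to do this inside the database is to load each raw line into a one-column staging table and split it there. A sketch under assumptions (all table/column names are hypothetical, and it assumes a Greenplum/PostgreSQL version that provides regexp_split_to_table, i.e. PostgreSQL 8.3 or later):

```sql
-- Staging table holding one raw line per row.
CREATE TABLE stage_raw (line text);
-- COPY stage_raw FROM '/path/to/input.txt';   -- load the denormalized file

-- Target: one token per row.
CREATE TABLE normalized (token text);

-- Split each comma-delimited line into rows, trimming whitespace
-- and discarding empty tokens.
INSERT INTO normalized (token)
SELECT tok
FROM (
    SELECT trim(regexp_split_to_table(line, ',')) AS tok
    FROM stage_raw
) s
WHERE tok <> '';
```

How the tokens then map onto columns of the final normalized schema depends on the file's record layout, which the truncated question does not show.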

DISTRIBUTE BY notices in Greenplum

佐手、 submitted on 2019-12-08 09:50:36
Question: Say I run the following query in psql:

> select a.c1, b.c2 into temp_table from db.A as a inner join db.B as b
> on a.x = b.x limit 10;

I get the following message:

NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'c1' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

What is a DISTRIBUTED BY column?
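The distribution key determines which segment stores each row: Greenplum hashes the key column(s) and sends the row to the segment that owns that hash value. The NOTICE only says that no key was declared, so Greenplum picked the first column. A sketch of declaring it explicitly in a CREATE TABLE AS, reusing the names from the question (adjust to your schema):

```sql
-- In Greenplum's CTAS syntax the DISTRIBUTED BY clause comes last.
CREATE TEMP TABLE temp_table AS
SELECT a.c1, b.c2
FROM db.A AS a
INNER JOIN db.B AS b ON a.x = b.x
LIMIT 10
DISTRIBUTED BY (c1);   -- rows are hashed on c1; no NOTICE is emitted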

How should I deal with my UNIQUE constraints during my data migration from Postgres9.4 to Greenplum

萝らか妹 submitted on 2019-12-08 03:28:19
Question: When I execute the following SQL (contained in a SQL file generated by pg_dump from PostgreSQL 9.4) in Greenplum:

CREATE TABLE "public"."trm_concept" (
    "pid"            int8 NOT NULL,
    "code"           varchar(100) NOT NULL,
    "codesystem_pid" int8,
    "display"        varchar(400),
    "index_status"   int8,
    CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid"),
    CONSTRAINT "idx_concept_cs_code" UNIQUE ("codesystem_pid", "code")
);

I get this error:

ERROR: Greenplum Database does not allow having both PRIMARY KEY and UNIQUE constraints

Why doesn't Greenplum allow this? I really need this unique constraint to guarantee some rule,
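The underlying reason is that in Greenplum a unique constraint must contain the table's distribution key, so two independent unique constraints over different column sets cannot both be satisfied by one distribution. One possible workaround (a sketch, not the only option): keep the PRIMARY KEY in the DDL and enforce the second uniqueness rule during loading instead.

```sql
-- Same table as in the question, minus the secondary UNIQUE constraint.
CREATE TABLE "public"."trm_concept" (
    "pid"            int8 NOT NULL,
    "code"           varchar(100) NOT NULL,
    "codesystem_pid" int8,
    "display"        varchar(400),
    "index_status"   int8,
    CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid")
) DISTRIBUTED BY ("pid");

-- The (codesystem_pid, code) rule then has to be checked in ETL,
-- e.g. by de-duplicating on those columns before INSERT.
```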

Part 3: Distributed Database Storage

只谈情不闲聊 submitted on 2019-12-06 11:56:58
I. Distributed database storage

As earlier chapters described, Greenplum is a distributed-architecture database: a table's data is spread across the segment nodes. So by what strategy is a table's data distributed?

Greenplum performance depends on data being distributed evenly across the data nodes. Query response time is governed by the time it takes all data nodes to finish; the system can only be as fast as the slowest node. If storage is skewed, one node needs more time than the others to process its data. Storage skew can only occur with hash distribution.

Joins between tables are the most common kind of query in Greenplum. If two or more tables are joined on columns that are not the distribution key, or the tables use random distribution, then to perform the join the matching rows must still be brought to the same node. If the data is not already distributed on the join column, the rows needed from one of the tables are dynamically redistributed to the other nodes. In some cases a broadcast motion is performed instead: each node sends all of its rows to every other node, rather than rehashing the data and sending each row to the node that owns its hash value.

Is there not also a third option, replicated tables? Yes: Greenplum 6.0 and later support replicated tables, which avoid exactly the broadcast and redistribute motions described above.

II. Distribution strategies

When creating a table, Greenplum lets you specify a distribution strategy: hash distribution (DISTRIBUTED BY), random distribution (DISTRIBUTED RANDOMLY), or replicated distribution (DISTRIBUTED REPLICATED). Hash distribution requires specifying a distribution key
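The three strategies above can be sketched as follows (table names are hypothetical):

```sql
-- Hash distribution: rows are placed by hashing the distribution key.
CREATE TABLE t_hash (id int, payload text) DISTRIBUTED BY (id);

-- Random distribution: rows are spread round-robin; even by construction,
-- but any join requires a redistribute or broadcast motion.
CREATE TABLE t_random (id int, payload text) DISTRIBUTED RANDOMLY;

-- Replicated (Greenplum 6.0+): a full copy on every segment,
-- avoiding motions when other tables join against it.
CREATE TABLE t_repl (id int, payload text) DISTRIBUTED REPLICATED;
```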

Greenplum, Pivotal HD + Spark, or HAWQ for TBs of Structured Data?

北战南征 submitted on 2019-12-06 06:50:23
Question: I have TBs of structured data in a Greenplum DB. I need to run what is essentially a MapReduce job on my data. I found myself reimplementing at least the features of MapReduce just so that this data would fit in memory (in a streaming fashion). Then I decided to look elsewhere for a more complete solution. I looked at Pivotal HD + Spark because I am using Scala, and Spark's benchmarks are a wow-factor. But I believe the datastore behind this, HDFS, is going to be less efficient than Greenplum.

The third video course, "Quick Start to Greenplum for Beginners" (《小白快速入门greenplum》), is online

做~自己de王妃 submitted on 2019-12-06 04:24:58
1. Overview

The third video course is out: "Quick Start to Greenplum for Beginners" (《小白快速入门greenplum》). Anyone interested can watch it directly via the link. (If purchasing, please purchase through the link in this post.)

2. Course content

Course URL: https://edu.51cto.com/sd/2b7c8

Table of contents:
Chapter 1  Course introduction
Chapter 2  Greenplum background and download
Chapter 3  Greenplum system architecture and technical architecture
Chapter 4  Complete Greenplum deployment, with notes
Chapter 5  Installing greenplum-cc-web
Chapter 6  My understanding of GP
Chapter 7  Greenplum high-availability options
Chapter 8  Greenplum troubleshooting summary
Chapter 9  Course wrap-up

Source: https://www.cnblogs.com/ruanjianlaowang/p/11961687.html

Greenplum: common database administration SQL and tools

末鹿安然 submitted on 2019-12-05 20:02:08
Reposted from: https://blog.csdn.net/you_xian/article/details/78549756 (author: lianghc)

These are some common queries I have accumulated while using Greenplum, written up here for reference; additions in the comments are welcome. They are all SQL commands and uses of the data dictionary. Being familiar with the data dictionary is very important. Three schemas matter most: pg_catalog, gp_toolkit, and information_schema. The data dictionary objects in information_schema are all views, and that schema also provides a large number of functions for working with the data dictionary that are worth studying.

I. Querying and managing database run-time state

1. Find the SQL/sessions currently running in Greenplum:

-- Method 1:
SELECT tt.procpid,                                    -- pid
       usename               AS user_name,            -- executing user
       backend_start,                                 -- session start time
       query_start,                                   -- query start time
       waiting,                                       -- whether the query is waiting
       now() - query_start   AS current_query_time,   -- elapsed query time
       now() - backend_start AS current_session_time, -- elapsed session time
       current_query,
       client_addr,
       datname
FROM pg_stat_activity tt
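A related sketch: once a runaway session's procpid is known from pg_stat_activity (in newer Greenplum/PostgreSQL versions the column is named pid), it can be cancelled from SQL. The pid below is a placeholder, and pg_terminate_backend assumes a Greenplum version that provides it.

```sql
-- Cancel the session's current query (12345 is an example pid).
SELECT pg_cancel_backend(12345);

-- Terminate the whole session, if available in your version:
-- SELECT pg_terminate_backend(12345);
```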

rodbc character encoding error with PostgreSQL

空扰寡人 submitted on 2019-12-05 05:51:30
I'm getting a new error which I've never gotten before when connecting from R to a Greenplum (PostgreSQL) database using RODBC. I've gotten the error using both Emacs/ESS and RStudio, and the RODBC call has worked as-is in the past.

library(RODBC)
gp <- odbcConnect("greenplum", believeNRows = FALSE)
data <- sqlQuery(gp, "select * from mytable")
> data
[1] "22P05 7 ERROR: character 0xc280 of encoding \"UTF8\" has no equivalent in \"WIN1252\";\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select * from mytable'"

EDIT: Just tried querying another table and did get
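One possible workaround, under the assumption that the failure happens while the server transcodes UTF8 data to the WIN1252 encoding requested by the ODBC client: ask the server to send the bytes as UTF8 and let the client side deal with them.

```sql
-- Skip the server-side UTF8 -> WIN1252 conversion for this session.
SET client_encoding TO 'UTF8';
SELECT * FROM mytable;   -- mytable is the table from the question
```

Whether this is appropriate depends on what the client then does with the UTF8 bytes; the offending character 0xc280 (U+0080) genuinely has no WIN1252 equivalent, so the alternative is to clean it out of the data.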