greenplum

Greenplum, Pivotal HD + Spark, or HAWQ for TBs of Structured Data?

Submitted by 狂风中的少年 on 2019-12-04 13:35:35
I have TBs of structured data in a Greenplum DB. I need to run what is essentially a MapReduce job on my data. I found myself reimplementing at least the features of MapReduce just so that this data would fit in memory (in a streaming fashion). Then I decided to look elsewhere for a more complete solution. I looked at Pivotal HD + Spark because I am using Scala, and the Spark benchmarks are a wow-factor. But I believe the datastore behind this, HDFS, is going to be less efficient than Greenplum. (NOTE the "I believe". I would be happy to know I am wrong, but please give some evidence.) So to keep…
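One concrete way to bridge the two systems the question weighs: Greenplum 4.x/5.x ships a gphdfs external-table protocol that can write table data straight to HDFS, where Spark can then read it. A minimal sketch, assuming gphdfs is configured on the segments; the namenode host hdfs-nn, the export path, and the facts table are hypothetical:

-- writable external table pushing Greenplum data to HDFS for Spark to consume
CREATE WRITABLE EXTERNAL TABLE export_facts (LIKE facts)
LOCATION ('gphdfs://hdfs-nn:8020/data/facts_export')
FORMAT 'TEXT' (DELIMITER '|');

INSERT INTO export_facts SELECT * FROM facts;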

DISTRIBUTED BY notices in Greenplum

Submitted anonymously (unverified) on 2019-12-03 10:24:21
Question: Say I run the following query in psql:

select a.c1, b.c2 into temp_table
from db.A as a inner join db.B as b on a.x = b.x
limit 10;

I get the following message:

NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'c1' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

What is a DISTRIBUTED BY column? Where is temp_table stored? Is it stored on my…
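For reference, the NOTICE disappears once the distribution key is stated explicitly. A minimal sketch using CREATE TABLE AS, which (unlike the SELECT ... INTO form) accepts a DISTRIBUTED BY clause; whether c1 is the right key depends on its cardinality and skew:

-- same query, but with an explicit distribution key: rows are hashed
-- across segments on c1 instead of Greenplum picking a column for you
CREATE TABLE temp_table AS
SELECT a.c1, b.c2
FROM db.A AS a
INNER JOIN db.B AS b ON a.x = b.x
LIMIT 10
DISTRIBUTED BY (c1);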

Greenplum hangs forever when doing any search or insert actions with psql on CentOS 7

Submitted anonymously (unverified) on 2019-12-03 01:44:01
Question: Greenplum version is 5.3.0, on CentOS 7. As the title says. The following is the result of gplogfilter:

SELECT pg_catalog.quote_ident(n.nspname) || '.'
FROM pg_catalog.pg_namespace n
WHERE substring(pg_catalog.quote_ident(n.nspname) || '.',1,7)='test_vb'
  AND (SELECT pg_catalog.count(*)
       FROM pg_catalog.pg_namespace
       WHERE substring(pg_catalog.quote_ident(nspname) || '.',1,7)
           = substring('test_vb',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1)) > 1
UNION
SELECT pg_catalog.quote_ident(n.nspname) || '.' || pg_catalog.quote_ident(c.relname)
FROM pg_catalog…

How should I deal with my UNIQUE constraints during my data migration from Postgres 9.4 to Greenplum

Submitted anonymously (unverified) on 2019-12-03 00:44:02
Question: When I execute the following SQL (contained in a .sql file generated by pg_dump from Postgres 9.4) in Greenplum:

CREATE TABLE "public"."trm_concept" (
    "pid"            int8 NOT NULL,
    "code"           varchar(100) NOT NULL,
    "codesystem_pid" int8,
    "display"        varchar(400),
    "index_status"   int8,
    CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid"),
    CONSTRAINT "idx_concept_cs_code" UNIQUE ("codesystem_pid", "code")
);

I got this error:

ERROR: Greenplum Database does not allow having both PRIMARY KEY and UNIQUE constraints

Why doesn't Greenplum allow this? I really…
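A common workaround, sketched under the assumption that the loading pipeline can police the second constraint itself: keep the PRIMARY KEY and drop the extra UNIQUE constraint, enforcing (codesystem_pid, code) uniqueness at load time instead:

-- migration-friendly version of the pg_dump DDL: one constraint only
CREATE TABLE "public"."trm_concept" (
    "pid"            int8 NOT NULL,
    "code"           varchar(100) NOT NULL,
    "codesystem_pid" int8,
    "display"        varchar(400),
    "index_status"   int8,
    CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid")
);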

MPP - GreenPlum Database Installation and Basic Usage

Submitted anonymously (unverified) on 2019-12-03 00:39:02
1. Cluster introduction

The architecture diagram is as follows.

2. Server changes (all hosts)

2.1 Configure hosts:
vi /etc/hosts
192.168.0.93 gpdb-1 mdw
192.168.0.94 gpdb-2 sdw1
192.168.0.95 gpdb-3 sdw2

2.2 Create the user and group
2.2.1 Create the group, with group id 530:
groupadd -g 530 gpadmin
2.2.2 Create the user, assign it to the gpadmin group, and set its home directory:
useradd -g 530 -u 530 -d /home/gpadmin -s /bin/bash gpadmin
2.2.3 Grant ownership of /home/gpadmin:
chown -R gpadmin:gpadmin /home/gpadmin
2.2.4 Set the password:
passwd gpadmin

2.3 Disable the firewall
2.3.1 Stop the default firewall:
systemctl stop firewalld
2.3.2 Stop iptables:
systemctl stop iptables

2.4 Edit the network file:
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=<the corresponding hostname>

2.5 Modify system files
2.5.1 Edit the kernel configuration:
vi /etc/sysctl.conf
kernel.shmmax = 5000000000

Greenplum installation fails with "Failed Update port number to 40000"

Submitted anonymously (unverified) on 2019-12-03 00:22:01
While installing Greenplum, I ran into the "Failed Update port number to 40000" error.

Details:
OS: CentOS 6.5
GP version: 4.3.8

During initialization, the log showed the following:
20180605:11:37:53:010114 gpcreateseg.sh:gp-s0011:gpadmin-[FATAL][3]:-Failed Update port number to 40000

Fix:
yum -y install ed

gpcreateseg.sh uses the ed line editor to rewrite the port number in each segment's postgresql.conf, so initialization fails when ed is missing from the host.

Loading and Unloading Data in Greenplum

Submitted anonymously (unverified) on 2019-12-03 00:18:01
Loading and unloading data

GP loading overview

About external tables
WEB: accesses dynamic data sources (such as web services or OS commands/scripts)

About gpload
2) Requires a load-specification control file defined in YAML format

About COPY
2) Has no parallel load/unload mechanism

Defining external tables
Overview: when creating an external table definition, the file format and file location must be specified; three protocols are available for accessing external table data sources: gpfdist, gpfdists, and gphdfs.

gpfdist
5) Wildcards or C-style patterns can be used to match multiple files

gpfdists
1) gpfdists is the secure version of gpfdist; it enables encrypted communication and ensures secure authentication between the files and GP

file
4) pg_max_external_files determines how many external files are allowed per external table

gphdfs
4) For writes, each GP segment instance writes only the data that instance contains

External file formats
3) Custom formats apply to gphdfs

Error data in external tables
To isolate bad rows while still loading correctly formatted records, use single-row error handling when defining the external table.

External table backup and restore
In a backup or restore operation, only the definition of an external table or WEB external table is backed up or restored.

Using the GP parallel file server (gpfdist)
b) Start gpfdist in the background (sending log and error output to a log file)
c) To run multiple gpfdist services on the same ETL host, give each one its own directory and port; see the sketch below.
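A minimal sketch tying these pieces together: two gpfdist instances on one ETL host (different directories and ports), a readable external table spanning both, and single-row error handling (GP 5 syntax). The host name etl1, the ports, paths, and columns are all hypothetical:

-- on the ETL host, started beforehand:
--   gpfdist -d /data/load1 -p 8081 -l /tmp/gpfdist1.log &
--   gpfdist -d /data/load2 -p 8082 -l /tmp/gpfdist2.log &
CREATE EXTERNAL TABLE ext_sales (
    id  int,
    amt numeric
)
LOCATION ('gpfdist://etl1:8081/sales_*.csv',
          'gpfdist://etl1:8082/sales_*.csv')
FORMAT 'CSV' (HEADER)
LOG ERRORS SEGMENT REJECT LIMIT 50 ROWS;  -- isolate bad rows instead of aborting the load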

20 Billion Rows/Month - Hbase / Hive / Greenplum / What?

Submitted by 折月煮酒 on 2019-12-03 00:04:50
Question: I'd like to use your wisdom for picking the right solution for a data-warehouse system. Here are some details to better understand the problem:

Data is organized in a star schema structure with one BIG fact table and ~15 dimensions.
20B fact rows per month
10 dimensions with hundreds of rows (somewhat hierarchical)
5 dimensions with thousands of rows
2 dimensions with ~200K rows
2 big dimensions with 50M-100M rows

Two typical queries run against this DB. Top members in dimq:

select top X dimq, count(id)…
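If Greenplum were chosen, a hedged sketch of how such a fact table is commonly laid out there: append-only columnar storage, hash-distributed on a high-cardinality key, and range-partitioned by month so monthly loads and date-filtered queries prune partitions. All names and types are hypothetical:

-- 20B rows/month fact table: columnar, compressed, one partition per month
CREATE TABLE fact (
    id   bigint,
    dimq int,
    ts   date
)
WITH (appendonly=true, orientation=column, compresstype=zlib)
DISTRIBUTED BY (id)
PARTITION BY RANGE (ts)
(
    START (date '2019-01-01') INCLUSIVE
    END   (date '2020-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month')
);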

Setting up Greenplum user and password access:

Submitted anonymously (unverified) on 2019-12-02 23:55:01
1. Create the GP user:
create user tableau with nosuperuser nocreatedb password 'tableau';

2. Grant read permission on a table:
create table test( id integer );
GRANT select on table test to tableau;

3. Edit the configuration file:
vim /extsdd1/gpadmin/data/master/gpseg-1/pg_hba.conf
Add the following two lines:
host all gpadmin 0.0.0.0/0 trust
host all tableau 0.0.0.0/0 md5

Source: 博客园 Author: xmanman Link: https://www.cnblogs.com/zhangwensi/p/11413146.html
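After editing pg_hba.conf, reload it (for example with gpstop -u) and sanity-check the role from SQL; has_table_privilege is standard Postgres/Greenplum, and the names follow the steps above:

-- confirm the role exists with the intended attributes
SELECT rolname, rolsuper, rolcreatedb FROM pg_roles WHERE rolname = 'tableau';
-- confirm the grant took effect
SELECT has_table_privilege('tableau', 'test', 'SELECT');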

Greenplum Database Cluster

Submitted anonymously (unverified) on 2019-12-02 23:47:01
Preferred operating system
Red Hat Enterprise Linux (RHEL) is the preferred operating system. The latest supported major version should be used, currently RHEL 6. The system version I am using: CentOS 7.6.

File system
XFS is the best-practice file system for Greenplum database data directories. XFS should be mounted with the following options: rw,noatime,inode64

Port configuration
ip_local_port_range should be set so that it does not conflict with the Greenplum database port range. For example:
net.ipv4.ip_local_port_range = 3000 65535
PORT_BASE=2000
MIRROR_PORT_BASE=2100
REPLICATION_PORT_BASE=2200
MIRROR_REPLICATION_PORT_BASE=2300