greenplum | 易学教程

Connecting to Greenplum with JDBC

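This is actually very simple. I am writing it up because every article I could find online uses the PostgreSQL driver, and I found no example using Greenplum's official driver. So what is the difference between the two?

At first I also used the PostgreSQL driver. The same SQL took only a few hundredths of a second in a client tool, but more than a second when run through JDBC from code. After repeated tests the delay stayed consistently at just over one second. Normally that would not be a big deal, but in a performance showdown it matters a great deal: when the whole job takes only a few seconds, an extra second or more has a large impact. Later, while downloading the server from the official site, I noticed that Greenplum has its own driver. After switching to it there was still a delay, at about 0.4 seconds, but that is a decent improvement. The official driver has one more characteristic: if the statement is executed about 5 times in a row, the delay disappears and the speed matches that of the client tool. That is where things stand for now.

I have not found the official driver anywhere else; it can only be downloaded from the official site: https://network.pivotal.io/products/pivotal-gpdb#/releases/669/file_groups/178, which also contains documentation on using the driver. It is used the same way as the PostgreSQL driver; you only need to change the driver class and the connection URL. Official driver class (Data Source Class): com.pivotal.jdbc.GreenplumDriver. Official driver connection URL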
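The warm-up effect described above (the delay disappearing after about five consecutive executions) can be measured with a small timing harness. This is a sketch only: it is written in Python rather than Java, it does not reproduce the author's JDBC setup, and `time_runs` is a hypothetical helper that takes any zero-argument callable which executes the query and fetches its rows.

```python
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], runs: int = 5) -> List[float]:
    """Call run_query `runs` times and return each call's wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # e.g. execute the SQL and fetch all rows
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload; replace it with a function that runs the real query.
    for i, d in enumerate(time_runs(lambda: sum(range(100_000))), start=1):
        print(f"run {i}: {d * 1000:.1f} ms")
```

Comparing the first runs with later ones for each driver shows whether the extra latency is confined to the first few executions or present on every run.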

Errors caused by improper memory settings in Greenplum

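Disclaimer: the views in this article are the author's own and do not represent Pivotal's official position; for further help, please contact Pivotal. Reposts must credit the source. The main memory-related parameters in Greenplum are described below (OS-level parameters are not covered).
statement_mem:
ERROR: insufficient memory reserved for statement (memquota.c:228)
This error occurs when scanning a table with a very large number of partitions. In that case the default of 125MB needs to be raised; around 500MB or somewhat higher is recommended. However, changing the value system-wide should be done with caution, as explained later together with several other parameters.
gp_vmem_protect_limit:
"ERROR","53200","Out of memory.Failed on request of size 156 bytes.(context 'CacheMemoryContext') (aset.c:840)"
"ERROR","53400","Out of memory (seg13 slice13 sdw1-1:40001 pid=10183)","VM Protect failed to allocate 8388608 bytes, 6 MB available"
This error occurs when Greenplum cannot obtain the memory it needs from the OS, because gp_vmem_protect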
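Because a system-wide change needs caution, one option is to raise statement_mem only for the session that scans the heavily partitioned table; a session-level SET does not change the cluster default. A minimal sketch, assuming a DB-API cursor (for example from psycopg2) is already open; `raise_statement_mem` is a hypothetical helper and 500MB simply mirrors the article's suggestion. Greenplum also caps statement_mem by max_statement_mem.

```python
import re

# Accept only values like "500MB", "1GB" or "131072kB" so nothing else is spliced into the SQL.
_MEM_VALUE = re.compile(r"^\d+(kB|MB|GB)$")


def raise_statement_mem(cursor, value: str = "500MB") -> str:
    """Raise statement_mem for the current session only and return the SQL executed."""
    if not _MEM_VALUE.match(value):
        raise ValueError(f"unexpected memory value: {value!r}")
    sql = f"SET statement_mem = '{value}'"
    cursor.execute(sql)  # affects this session only, not postgresql.conf
    return sql
```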

greenplum--data import and export

insert: INSERT statements are only suitable for loading small amounts of data.
insert into tablename values (val1, val2, ...);
or
insert into tablename (...) select ... from tabname;
copy: the COPY command can import data from a file and export data to a file. In GP the data has to pass through the master node, so COPY cannot load or unload data efficiently in parallel across the segments. The syntax of COPY is as follows:
-- \h command: shows the syntax of a command
postgres=# \h copy
Command: COPY
Description: copy data between a file and a table
Syntax:
-- import file data into a table; the rows are appended to the table
COPY table [ ( column [, ...] ) ] FROM { 'file' | STDIN }
[ [ WITH ] [ OIDS ] [ HEADER ] [ DELIMITER [ AS ] 'delimiter' ] [ NULL [ AS ] 'null string' ] [ ESCAPE [ AS ] 'escape' | 'OFF' ] [ NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF' ] [ CSV [ QUOTE
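From a Python client, COPY ... FROM STDIN can be fed through psycopg2's `cursor.copy_expert(sql, file)`; the data still passes through the master, as noted above. A sketch under these assumptions: `copy_rows` is a hypothetical helper, the table name is checked with a simple regex (identifiers cannot be passed as query parameters), and rows are sent as CSV.

```python
import csv
import io
import re

# Plain or schema-qualified identifier, e.g. "sales" or "public.sales".
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def copy_rows(cursor, table: str, rows) -> str:
    """Append rows to `table` with COPY ... FROM STDIN and return the COPY statement."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    buf = io.StringIO()
    # None is written as an empty unquoted field, which COPY's CSV mode reads as NULL.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    sql = f"COPY {table} FROM STDIN WITH CSV"
    cursor.copy_expert(sql, buf)  # psycopg2 streams the file-like object to the server
    return sql
```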

Greenplum installation guide

Environment: VMware® Workstation 12 Pro 12.0.1 build-3160714 (host: Windows 7 Ultimate), CentOS 6.5 x64, Greenplum 4.3.8.0
Resource: Greenplum 4.3.8.0 (greenplum-db-4.3.8.0-build-1-RHEL5-x86_64.bin)
Planning:
Role | Count | Memory | CPU
master | 1 | 4GB | 1*2 core
master mirror | 1 | 4GB | 1*2 core
segment (mirror) | 3 | 4GB | 1*2 core
Environment setup: install 5 virtual machines in VMware, all running CentOS 6.5: mdw, smdw, sdw1, sdw2, sdw3.
Network settings (on each host)
Set the host name:
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=mdw
Install ifconfig (skip this if it is already installed):
yum install net-tools.x86_64
Set a static IP (so that DHCP does not assign a new IP after a reboot):
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
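Since the same two-line /etc/sysconfig/network file has to be written on every VM, its per-host contents can be generated from the planning table. This Python sketch only builds and prints the text and does not modify the system; `PLAN`, `totals` and `network_file` are hypothetical names, and each "1*2 core" entry is counted as 2 cores.

```python
# (role, VM count, memory in GB, cores per VM), taken from the planning table.
PLAN = [
    ("master", 1, 4, 2),
    ("master mirror", 1, 4, 2),
    ("segment(mirror)", 3, 4, 2),
]
HOSTS = ["mdw", "smdw", "sdw1", "sdw2", "sdw3"]


def totals(plan):
    """Return (number of VMs, total memory in GB, total cores) for the plan."""
    vms = sum(count for _, count, _, _ in plan)
    mem = sum(count * gb for _, count, gb, _ in plan)
    cores = sum(count * c for _, count, _, c in plan)
    return vms, mem, cores


def network_file(hostname: str) -> str:
    """Return the contents of /etc/sysconfig/network for one host."""
    return f"NETWORKING=yes\nHOSTNAME={hostname}\n"


if __name__ == "__main__":
    print("VMs, memory (GB), cores:", totals(PLAN))
    for host in HOSTS:
        print(f"# {host}: /etc/sysconfig/network")
        print(network_file(host))
```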

GreenPlum configuration parameter settings

MASTER: SHOW ALL
=# show all;
name | setting | description
add_missing_from | off | Automatically adds missing table references to FROM clauses.
application_name | psql | Sets the
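The same listing can be captured programmatically, for example to compare the settings of two sessions or two clusters. A sketch assuming any DB-API cursor; `show_all` and `diff_settings` are hypothetical helpers, and each SHOW ALL row is assumed to start with the name and setting columns, as in the listing above.

```python
def show_all(cursor) -> dict:
    """Run SHOW ALL and return {parameter name: current setting}."""
    cursor.execute("SHOW ALL")
    # Rows look like (name, setting, description); extra columns are ignored.
    return {name: setting for name, setting, *_ in cursor.fetchall()}


def diff_settings(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for parameters whose settings differ."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }
```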

Connecting to a Greenplum database from Python

Configure Greenplum client authentication in pg_hba.conf:
cd /home/gpadmin/gpdbdata/master/gpseg-1
vim pg_hba.conf
Add:
host all gpadmin 10.1.201.55/32 trust
[gpadmin@ gpseg-1]$ export PGDATA=/home/gpadmin/gpdbdata/master/gpseg-1
[gpadmin@ gpseg-1]$ pg_ctl reload -D $PGDATA
server signaled
Accessing the database with Psycopg2
Psycopg2 is the most widely used PostgreSQL driver for Python. It is implemented in C as a wrapper around libpq, PostgreSQL's standard client library, which makes it very fast; it supports heavily multithreaded applications issuing many concurrent INSERT and UPDATE operations, and it fully complies with DB API 2.0.
Install Psycopg2:
pip install psycopg2
Psycopg2 documentation: http://initd.org/psycopg/docs/index.html
Psycopg2 interface for connecting to PostgreSQL
The two main classes Psycopg2 provides for working with the database are Connection and Cursor
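A minimal sketch of the Connection/Cursor pattern with psycopg2. Because the pg_hba.conf entry above uses trust for gpadmin from 10.1.201.55, no password is passed here. Assumptions: the master host name, port 5432 and database name in the example comment are placeholders; `run_query` is a hypothetical helper whose `connect` argument exists only so it can be exercised without a server.

```python
def run_query(sql, params=None, connect=None, **conn_kwargs):
    """Open a connection, run one query, return all rows, and always close the connection."""
    if connect is None:
        import psycopg2  # imported lazily so this module loads even without the driver
        connect = psycopg2.connect
    conn = connect(**conn_kwargs)  # a Connection object
    try:
        cur = conn.cursor()  # a Cursor object executes statements and fetches rows
        try:
            cur.execute(sql, params)
            return cur.fetchall()
        finally:
            cur.close()
    finally:
        conn.close()


# Example (placeholder host and database; needs psycopg2 and a reachable master):
# rows = run_query("SELECT version()", host="gp-master.example.com", port=5432,
#                  dbname="postgres", user="gpadmin")
```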

Materialize Common Table Expression in Greenplum

Question: Is there a way to force Greenplum (PostgreSQL) to materialize a subquery in a WITH clause, as the MATERIALIZE and INLINE optimizer hints do in Oracle, as below? WITH dept_count AS ( SELECT /*+ MATERIALIZE */ deptno, COUNT(*) AS dept_count FROM emp GROUP BY deptno) SELECT ... I've been searching for this for a while, only to find this functionality in Oracle. I know I can use CREATE TABLE AS, but I have several similar queries, forcing me to drop the temporary table after each query, which is very
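This sketch does not answer the optimizer-hint question; it only automates the CREATE TABLE AS workaround the asker mentions, so the temporary table is dropped even if the follow-up query raises. Temporary tables are already dropped at the end of the session; the explicit DROP matters when several such queries run in one session. Assumptions: a DB-API cursor in autocommit mode (inside a failed transaction the DROP would also fail until the transaction is rolled back); `materialized` is a hypothetical helper name.

```python
import re
from contextlib import contextmanager

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


@contextmanager
def materialized(cursor, name: str, select_sql: str):
    """Create a temporary table from select_sql, yield its name, and drop it afterwards.

    Usage (hypothetical):
        with materialized(cur, "dept_count",
                          "SELECT deptno, COUNT(*) AS dept_count FROM emp GROUP BY deptno") as t:
            cur.execute(f"SELECT * FROM {t}")
    """
    if not _IDENT.match(name):
        raise ValueError(f"unsafe table name: {name!r}")
    cursor.execute(f"CREATE TEMP TABLE {name} AS {select_sql}")
    try:
        yield name
    finally:
        cursor.execute(f"DROP TABLE IF EXISTS {name}")
```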

PL/Python & postgreSQL: What is the best way to return a table of many columns?

Question: In PL/Python, the "RETURNS setof" or "RETURNS table" clause is used to return table-like structured data. It seems to me that one has to provide the name of each column to get a table returned. If you have a table with a few columns, that is easy. However, if you have a table of 200 columns, what's the best way to do that? Do I have to type the names of all the columns (as shown below), or is there a way to get around it? Any help would be much appreciated. Below is an example that uses
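One way to avoid typing 200 column names by hand is to generate the clause from catalog metadata, for example the rows of `SELECT column_name, data_type FROM information_schema.columns WHERE table_name = '...' ORDER BY ordinal_position`. A sketch with hypothetical helper names; note that `data_type` reports array columns as ARRAY and user types as USER-DEFINED, so those need their real type filled in. PostgreSQL also allows `RETURNS SETOF some_table`, which reuses an existing table's row type without listing any columns.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def returns_table_clause(columns) -> str:
    """Build 'RETURNS TABLE (...)' from (column_name, data_type) pairs."""
    cols = list(columns)
    if not cols:
        raise ValueError("at least one column is required")
    body = ", ".join(f"{quote_ident(name)} {dtype}" for name, dtype in cols)
    return f"RETURNS TABLE ({body})"
```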

Introduction to Greenplum

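What can Greenplum do? Data warehousing / OLAP / ad-hoc queries; mixed workloads / HTAP; streaming data integration; data analytics; in-database machine learning; modern SQL.
Core architecture (architecture diagram)
Master Host: the master node coordinates the whole cluster. It stores no user data, only metadata.
Standby Master: the backup master.
Segment Host: each segment is a single-node PostgreSQL database that holds the actual user data. Segments wait for the master to assign them tasks and then coordinate with each other to execute them. Each segment has a mirror on another host; when the segment fails, its mirror is automatically promoted to primary, which provides high availability. The cluster scales linearly as the business grows. Every machine is independent, and the machines communicate over the network through the Interconnect; hence this is called an MPP shared-nothing architecture.
Data distribution: several distribution policies (hash, random, replicated tables, etc.). The most important strategy and goal is even distribution: each node holds 1/n of the data.
Multi-level partitioning
Multi-model / polymorphic storage: the value of data usually declines over time, so data of different ages is handled in different ways. Take a sales table: for the last 3 months of data, the main work is refining and updating it; for data from 3 months to 1 year old, the main work is queries, aggregations, and reports; data more than 1 year old is rarely accessed. Corresponding storage modes: more than 1 year old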
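To make "each node holds 1/n of the data" concrete, here is a toy model of hash distribution: the distribution-key value is hashed and the hash picks the segment. Assumptions: MD5 is only a deterministic stand-in (Greenplum uses its own internal hash function), and `segment_for`, `distribute` and the `order-N` keys are hypothetical.

```python
import hashlib
from collections import Counter


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment index in [0, num_segments)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(keys, num_segments: int) -> Counter:
    """Count how many rows land on each segment."""
    return Counter(segment_for(k, num_segments) for k in keys)


if __name__ == "__main__":
    counts = distribute((f"order-{i}" for i in range(30_000)), 3)
    for seg in sorted(counts):
        print(f"segment {seg}: {counts[seg]} rows")
```

With a key that has many distinct values each segment receives roughly 1/n of the rows; a key with only a few distinct values would concentrate rows on a few segments, which is why even distribution is the main goal.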
