partition | 易学教程

MySQL Table Sharding and Table Partitioning


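Why shard and partition tables? In day-to-day development we often run into large tables, meaning tables that hold millions or even tens of millions of rows. Tables this large make queries and inserts slow, and performance gets even worse when joins are involved. The goal of table sharding and table partitioning is to reduce the load on the database and improve its efficiency; put simply, to make inserts, deletes, updates, and queries on the table faster.

What is table sharding? Sharding splits one large table, according to some rule, into several physical tables with independent storage, which we can call sub-tables. Each sub-table corresponds to three files: a .MYD data file, a .MYI index file, and a .frm table-structure file (the file layout of the MyISAM storage engine). These sub-tables can live on the same disk or on different machines. When the app reads or writes, it derives the sub-table name from the predefined rule and then operates on that table.

What is partitioning? Partitioning is similar to sharding in that both split a table according to a rule. The difference is that sharding breaks the large table into several independent physical tables, while partitioning stores segments of the data in multiple locations, which can be on the same disk or on different disks. After partitioning there is still a single table on the surface, but its data is spread across multiple locations. The app still reads and writes using the name of the large table, and the db organizes the partitioned data automatically.

How are sharding and partitioning in MySQL related?

1. Both can improve MySQL performance and behave well under high concurrency.
2. Sharding and partitioning do not conflict and can be used together. For tables with heavy traffic and a lot of data, we can combine sharding and partitioning (if MERGE-style sharding cannot be combined with partitioning, another sharding method can be tried). Where traffic is not heavy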
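The routing step described for sharding (the app derives the sub-table name from a predefined rule) can be sketched as follows. This is only a minimal illustration under assumptions: the modulo rule, the `user` base table name, and the function name `shard_table_name` are hypothetical and do not come from the original post.

```python
def shard_table_name(base_table: str, shard_key: int, shard_count: int) -> str:
    """Map a numeric shard key to a sub-table name such as 'user_1'.

    Assumed rule (hypothetical): sub-table index = shard_key modulo shard_count.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    index = shard_key % shard_count
    return f"{base_table}_{index}"


if __name__ == "__main__":
    # The app would then run its SQL against the returned table name.
    table = shard_table_name("user", 12345, 4)
    print(f"SELECT * FROM {table} WHERE id = 12345")
```

With partitioning, by contrast, the app would keep using the single table name and no routing function of this kind would be needed on the application side.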

Hive Study Notes


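Some of the examples below come from 《hive学习指南》 and the 易百教程 (Yiibai) tutorial, but most of the summaries are my own.

Hive official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTableProperties

Yiibai tutorial: https://www.yiibai.com/hive

Hive study notes: I. Basics

1. Data types (type, description):

- tinyint: 1-byte integer
- smallint: 2-byte integer
- int: 4-byte integer
- bigint: 8-byte integer
- boolean: boolean
- float: single-precision floating-point number
- double: double-precision floating-point number
- timestamp: integer, floating-point number, or string
- binary: byte array
- string: string
- decimal

Collections (data type, description, syntax example):

- struct: like a struct in C, similar to a class; elements are accessed with `.`. Example: struct('a', 'b', 'c'). DDL: struct<street:string, city:string, zip:int>
- map: a key-value collection; elements are accessed with [key]. Example: map('first', 'join', 'last', 'kobe'). DDL: map<string, float>
- array: an array [a, b]; elements can be accessed with d[0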

Notes on Some Kafka Concepts [1]


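First: what is Kafka?

Kafka is a publish-subscribe system. It was originally developed at LinkedIn and later handed over to the Apache open-source organization. GitHub: https://github.com/apache/kafka. It is written in Java and Scala. Today Kafka is mainly used as a message queue.

Kafka is a fast, scalable system that is distributed by design:

- Distributed: Kafka provides a clustered service. A Kafka cluster consists of one or more brokers, and each broker serves clients.
- Partitioning: each category of messages, that is, each subscription subject (topic), can have many partitions.
- Replication: each partition of a topic has multiple replicas, distributed across the broker cluster according to certain rules. Replicas are either leaders or followers. The broker hosting the leader handles client read and write requests; followers periodically sync data from the leader so that messages are not lost if the leader fails.

What are the common terms?

Broker: a Kafka cluster contains one or more servers, which are called brokers. The broker side does not maintain the consumption state of the data, which improves performance. Data is stored directly on disk and read and written sequentially, which is fast: this avoids copying data between JVM memory and system memory, and reduces costly object creation and garbage collection.

Topic && Partition: a topic is the category under which messages are sent to the servers; consumers use this category to subscribe to messages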

Notes on Kafka Offsets


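Offset storage model

Because a partition is always assigned to exactly one consumer within a consumer group, Kafka does not store offsets per consumer. Instead it stores them in the form groupid-topic-partition -> offset, as illustrated by a figure in the original post.

When Kafka saves an offset, it actually writes the offset for the consumer group and partition as a message to the __consumer_offsets topic. __consumer_offsets has 50 partitions by default, and Math.abs(groupId.hashCode() % offsets.topic.num.partitions) tells you which partition of __consumer_offsets holds the offset information for a given consumer group.

A second figure in the original post shows the format of the offset messages stored in __consumer_offsets: each offset message has the form groupid-topic-partition -> offset. So when a consumer polls for messages, it already knows the groupid and the topic, and it has obtained its partition through the Coordinator's partition assignment, so it can naturally go through the Coordinator to look up __consumer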
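The formula Math.abs(groupId.hashCode() % offsets.topic.num.partitions) can be reproduced outside the JVM to find where a group's offsets are stored. The sketch below reimplements Java's `String.hashCode()` in Python; the function names `java_string_hash` and `offsets_partition` are hypothetical helpers, and the default of 50 partitions is taken from the text above.

```python
def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode(): h = 31*h + unit over UTF-16 code units,
    with 32-bit signed overflow."""
    h = 0
    data = s.encode("utf-16-be")  # two bytes per UTF-16 code unit
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition(group_id: str, num_partitions: int = 50) -> int:
    """Partition of the offsets topic for a group, following the formula in the text.

    Java's % truncates toward zero, so |h % n| equals |h| mod n for positive n.
    """
    return abs(java_string_hash(group_id)) % num_partitions


if __name__ == "__main__":
    print(offsets_partition("test"))  # group id "test" is just an example value
```

Knowing the partition number makes it possible to read only that partition of the offsets topic when inspecting a single group's committed offsets.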

Common Kafka Commands


Usage examples for the sh scripts that ship with Kafka:

(1) Start/stop the Kafka service:

```shell
nohup env JMX_PORT=9999 /path/to/kafka_2.10-0.8.2.2/bin/kafka-server-start.sh config/server.properties >/dev/null 2>&1 &
/path/to/kafka_2.10-0.8.2.2/bin/zookeeper-server-stop.sh config/zookeeper.properties >/dev/null 2>&1 &
```

(2) Create a topic:

/path/to/kafka_2.10-0.8.2.2/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

List topics:

/path/to/kafka_2.10-0.8.2.2/bin/kafka-topics.sh --list --zookeeper localhost:2181

(3) Send a message:

/path/to/kafka_2.10-0.8.2.2/bin/kafka-console-producer.sh --broker-list localhost

Difference between partition and index in hive


I am new to Hadoop and Hive and would like to know: what is the difference between an index and a partition in Hive? When should I use an index and when a partition? Thank you!

Indexes are new and evolving (features are being added), but currently indexes are limited to single tables and cannot be used with external tables. Creating an index creates a separate table. Indexes can be partitioned (matching the partitions of the base table). Indexes are used to speed up the search for data within tables. Partitions provide segregation of the data at the HDFS level, creating sub-directories for each partition. Partitioning

How to pick up all data into hive from subdirectories


I have data organized in directories in a particular format (shown below) and want to add it to a Hive table. I want to add all the data in the 2012 directory. All the names below are directory names, and the innermost directory (3rd level) holds the actual data files. Is there any way to pick up the data directly without having to change this directory structure? Any pointers are appreciated.

/2012/
|
|---------2012-01
          |---------2012-01-01
          |---------2012-01-02
          |...
          |...
          |---------2012-01-31
|
|---------2012-02
          |---------2012-02-01
          |---------2012-02-02
          |...
          |...
          |---------2012-02-28
|
|---------2012-03
          |...
          |...
          |---
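One possible approach, not taken from the truncated thread, is to leave the directories where they are and register each innermost directory as a partition with `ALTER TABLE ... ADD PARTITION ... LOCATION`. The sketch below only generates those statements. It assumes an external table partitioned by a single string column; the table name `logs`, the column name `dt`, and the helper name are hypothetical, and the list of day directories is assumed to have been collected beforehand.

```python
import posixpath
from typing import Iterable, List


def add_partition_statements(table: str, day_dirs: Iterable[str]) -> List[str]:
    """Build one ALTER TABLE statement per existing day directory.

    The partition value is the last path component, e.g. '2012-01-01'.
    'dt' is an assumed partition column name, not something from the question.
    """
    statements = []
    for path in day_dirs:
        clean = path.rstrip("/")
        day = posixpath.basename(clean)
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (dt='{day}') LOCATION '{clean}';"
        )
    return statements


if __name__ == "__main__":
    dirs = ["/2012/2012-01/2012-01-01", "/2012/2012-01/2012-01-02"]
    for stmt in add_partition_statements("logs", dirs):
        print(stmt)
```

Taking the directory list as input, rather than generating dates, avoids registering partitions for days that have no directory.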

Hive: Partitioned Tables


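A partition of a partitioned table corresponds to a separate directory on the HDFS file system, and that directory holds all the data files of the partition. Partitioning in Hive simply means splitting into directories: a large dataset is divided into smaller datasets according to business needs. At query time, an expression in the WHERE clause selects only the partitions the query needs, which makes such queries much more efficient.

Basic operations on partitioned tables

1. Introducing partitioned tables (logs need to be managed by date):

/user/hive/warehouse/log_partition/20170702/20170702.log
/user/hive/warehouse/log_partition/20170703/20170703.log
/user/hive/warehouse/log_partition/20170704/20170704.log

2. Syntax for creating a partitioned table:

hive (default)> create table dept ( deptno int, dname string, loc string ) partitioned by (month string) row format delimited fields terminated by '\t';

Note: the partition column cannot be a column that already exists in the table; the partition column can be thought of as a pseudo-column of the table.

3. Loading data into a partitioned table:

hive (default)> load data local inpath '/opt/module/datas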
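The idea that a WHERE clause on the partition column limits the query to certain directories can be pictured with a toy model. The sketch below is not how Hive is implemented; it only mimics the effect, using the three log paths from the example above. The dictionary layout and the function name `files_to_scan` are hypothetical.

```python
from typing import Dict, List, Set

# Toy catalog: partition value -> data files in that partition's directory.
PARTITIONS: Dict[str, List[str]] = {
    "20170702": ["/user/hive/warehouse/log_partition/20170702/20170702.log"],
    "20170703": ["/user/hive/warehouse/log_partition/20170703/20170703.log"],
    "20170704": ["/user/hive/warehouse/log_partition/20170704/20170704.log"],
}


def files_to_scan(partitions: Dict[str, List[str]], wanted: Set[str]) -> List[str]:
    """Return only the files under the partitions selected by the filter.

    Directories of partitions that are not selected are never touched.
    """
    return [
        path
        for value in sorted(partitions)
        if value in wanted
        for path in partitions[value]
    ]


if __name__ == "__main__":
    # Comparable to filtering on a single date in the WHERE clause.
    print(files_to_scan(PARTITIONS, {"20170703"}))
```

A query without a filter on the partition column would correspond to passing every partition value, so all three files would be read.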

Hive Tutorial (7): DML Basics


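DML stands for Hive Data Manipulation Language. Put simply, it covers the operations on data in the database, such as inserts, deletes, updates, queries, and aggregation.

Loading files into tables

This writes file data into a table. The load operation does not transform the data in any way.

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

The syntax is fairly self-explanatory, so only the optional parts are explained here:

- local: a local file. When loading a local file you must specify local; the default is HDFS.
- overwrite: overwrites the existing data; the default is to append.
- partition: for loading data into a partitioned table; this parameter specifies which partition to load into.

Example:

load data local inpath '/usr/lib/hive2.3.6/2.csv' into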

Kafka + Zookeeper: Connection to node -1 could not be established. Broker may not be available


Question:

I am running both Zookeeper and Kafka on my localhost (1 instance each). I successfully create a topic from Kafka:

./bin/kafka-topics.sh --zookeeper localhost:2181 --create --replication-factor 1 --partitions 1 --topic Hello-Nicola
Created topic "Hello-Nicola".

Kafka logs show:

[2017-12-06 16:00:17,753] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
[2017-12-06 16:03:19,347] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Hello-Nicola-0 (kafka.server.ReplicaFetcherManager)
[2017-12-06 16:03:19,393] INFO
