partition

Dameng (达梦) Database Indexing in Practice

删除回忆录丶 submitted on 2019-12-12 09:07:38
Dameng Database supports secondary indexes, clustered indexes, unique indexes, function-based indexes, bitmap indexes, partitioned indexes, and more. By default a table is an index-organized table, with a default index built on rowid, so the indexes we create ourselves are called secondary indexes. The purpose of building an index is to speed up queries on the table; when DML is run against the table, the database maintains the index automatically. An index is an inverted tree, and using an index means traversing that tree.

Rules of thumb for building an index: frequently queried columns, join-condition columns, columns that often appear in predicates (WHERE), and queries that return only a small fraction of the table's rows. Situations where an index is not appropriate: columns with many NULLs, or columns with few distinct values (for example, gender).

1. Viewing index information

One caution before we start: do not create, drop, or rebuild indexes, or gather statistics, during business peak hours.

To view the indexes under a given user:

    select owner, table_name, index_name, index_type from dba_indexes where owner='TEST1';

First, create a table for testing:

    create table TAB10 (id1 int, id2 int, id3 int, id4 int, id5 int, id6 int, id7 int, id8 int, name1 char(20), name2 varchar(30));

Querying shows that creating a table automatically creates a clustered index along with it.

    select owner, table_name, index_name
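Dameng's `dba_indexes` dictionary view is specific to that product, but the workflow above (create a table, add a secondary index on a frequently-queried column, then inspect the catalog) can be sketched with SQLite, whose `sqlite_master` table plays a rough stand-in role for `dba_indexes`:

```python
import sqlite3

# Minimal sketch: create a table, add a secondary index, list indexes
# from the catalog. SQLite is only a stand-in for Dameng here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TAB10 (id1 INT, id2 INT, name1 CHAR(20))")
conn.execute("CREATE INDEX idx_tab10_id1 ON TAB10 (id1)")

rows = conn.execute(
    "SELECT name, tbl_name FROM sqlite_master WHERE type = 'index'"
).fetchall()
print(rows)  # [('idx_tab10_id1', 'TAB10')]
```

In either system the idea is the same: the catalog records which indexes exist, and the optimizer decides per query whether traversing the index tree beats a full scan.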

Error partitioning and formatting USB flash drive in C++

对着背影说爱祢 submitted on 2019-12-12 02:38:58
Question: I'm stuck attempting to re-partition and format a USB flash drive using C++; any help would be great! The goal is to re-partition an arbitrary flash drive with a single partition taking the entire space, formatted FAT32 (with NTFS and exFAT as later options). This will be done in batch, hopefully with 50+ devices at once, so drive-letter access is not an option. I'm able to create a partition, but when I try IOCTL_DISK_SET_PARTITION_INFO_EX to set the format type, it is failing with 0x32, ERROR
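Whatever IOCTL path is used (e.g. IOCTL_DISK_SET_DRIVE_LAYOUT_EX), the drive ultimately needs a valid partition layout. As a rough, OS-independent illustration of what that layout contains, this sketch packs one 16-byte MBR partition-table entry for a FAT32-LBA partition; the LBA start and sector count are made-up example values, not taken from the question:

```python
import struct

# One classic MBR partition-table entry (16 bytes):
# status(1) | CHS start(3) | type(1) | CHS end(3) | LBA start(4) | sectors(4)
def mbr_entry(lba_start, num_sectors, ptype=0x0C, bootable=False):
    status = 0x80 if bootable else 0x00
    chs = b"\xfe\xff\xff"  # CHS fields are ignored on LBA-addressed disks
    return struct.pack("<B3sB3sII", status, chs, ptype, chs,
                       lba_start, num_sectors)

# 0x0C = FAT32 with LBA addressing; example sizes for a ~8 GB stick
entry = mbr_entry(lba_start=2048, num_sectors=15648768)
print(len(entry), hex(entry[4]))  # 16 0xc
```

This only illustrates the on-disk layout the Windows APIs describe in their own structures; it is not a substitute for the actual IOCTL calls.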

How to specify the partition for mapPartitions in Spark

为君一笑 submitted on 2019-12-12 02:24:46
Question: What I would like to do is compute each list separately. For example, if I have 5 lists ([1,2,3,4,5,6], [2,3,4,5,6], [3,4,5,6], [4,5,6], [5,6]) and I would like to get the 5 lists without the 6, I would do something like:

    data = [1,2,3,4,5,6] + [2,3,4,5,6,7] + [3,4,5,6,7,8] + [4,5,6,7,8,9] + [5,6,7,8,9,10]

    def function_1(iter_listoflist):
        final_iterator = []
        for sublist in iter_listoflist:
            final_iterator.append([x for x in sublist if x != 6])
        return iter(final_iterator)

    sc.parallelize(data,5).glom()
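Without a Spark cluster, the partition logic in the question can be checked in plain Python. The hand-slicing below is a simplified stand-in for what `sc.parallelize(data, 5)` does, and the one-element list mimics what `glom()` hands to `mapPartitions`:

```python
data = ([1, 2, 3, 4, 5, 6] + [2, 3, 4, 5, 6, 7] + [3, 4, 5, 6, 7, 8]
        + [4, 5, 6, 7, 8, 9] + [5, 6, 7, 8, 9, 10])

# Slice into 5 equal "partitions" by hand (stand-in for Spark's partitioner)
partitions = [data[i:i + 6] for i in range(0, len(data), 6)]

def function_1(iter_listoflist):
    final_iterator = []
    for sublist in iter_listoflist:
        final_iterator.append([x for x in sublist if x != 6])
    return iter(final_iterator)

# glom() turns each partition into one list, so the function sees a
# one-element list of lists per partition
result = [next(function_1([p])) for p in partitions]
print(result[0], result[4])  # [1, 2, 3, 4, 5] [5, 7, 8, 9, 10]
```

This confirms the filtering itself is fine; the Spark-specific part of the question is only about how the 30 elements get divided among the 5 partitions.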

Update with group by

一世执手 submitted on 2019-12-11 20:19:50
Question: I'm stumped by what seemed to be a simple UPDATE statement. I'm looking for an UPDATE that uses two values. The first (a) is used to group; the second (b) is used to find a local minimum of values within the respective group. As a little extra, there is a threshold value on b: any value of 1 or smaller shall remain as it is.

    drop table t1;
    create table t1 (a number, b number);
    insert into t1 values (1,0);
    insert into t1 values (1,1);
    insert into t1 values (2,1);
    insert into t1 values (2,2);
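One common way to write such an UPDATE is a correlated subquery rather than an explicit GROUP BY. The sketch below uses SQLite (so it can be run anywhere) and one plausible reading of the truncated requirement: within each group `a`, replace `b` with the group minimum, but leave values of 1 or smaller untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [(1, 0), (1, 1), (2, 1), (2, 2)])

# Correlated subquery: each row above the threshold gets its group's minimum
conn.execute("""
    UPDATE t1
       SET b = (SELECT MIN(b) FROM t1 t2 WHERE t2.a = t1.a)
     WHERE b > 1
""")
rows = conn.execute("SELECT a, b FROM t1 ORDER BY a, b").fetchall()
print(rows)  # [(1, 0), (1, 1), (2, 1), (2, 1)]
```

Only (2,2) exceeds the threshold, so it is pulled down to group 2's minimum of 1; everything else stays as inserted.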

Drop multiple partitions based on date

泪湿孤枕 submitted on 2019-12-11 17:56:35
Question: I have a table with daily partitions. I can drop a partition using the query below:

    ALTER TABLE MY_TABLE DROP PARTITION FOR(TO_DATE('19-DEC-2017','dd-MON-yyyy'))

How can I drop all the partitions (multiple partitions) older than 15 days?

Answer 1: You can use PL/SQL like this.

    DECLARE
      CANNOT_DROP_LAST_PARTITION EXCEPTION;
      PRAGMA EXCEPTION_INIT(CANNOT_DROP_LAST_PARTITION, -14758);
      ts TIMESTAMP;
    BEGIN
      FOR aPart IN (SELECT PARTITION_NAME, HIGH_VALUE FROM USER_TAB_PARTITIONS WHERE TABLE_NAME = 'MY_TABLE
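The same "everything older than the cutoff" loop can be sketched outside the database: generate one `ALTER ... DROP PARTITION FOR (date)` statement per day before `today - 15 days`. The table name and date range below are examples, and a real script would execute these through a driver such as cx_Oracle rather than just printing them:

```python
from datetime import date, timedelta

def drop_statements(oldest, today, keep_days=15):
    # One statement per daily partition strictly older than the cutoff
    cutoff = today - timedelta(days=keep_days)
    stmts, d = [], oldest
    while d < cutoff:
        day = d.strftime("%d-%b-%Y").upper()
        stmts.append("ALTER TABLE MY_TABLE DROP PARTITION "
                     f"FOR (TO_DATE('{day}', 'dd-MON-yyyy'))")
        d += timedelta(days=1)
    return stmts

stmts = drop_statements(oldest=date(2017, 12, 1), today=date(2017, 12, 19))
print(len(stmts))  # 3 (Dec 1-3 fall before the Dec 4 cutoff)
```

The PL/SQL answer is more robust in practice because it reads the actual partition list from USER_TAB_PARTITIONS instead of assuming one partition per calendar day.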

Subtraction for each complete month for real-time database querying

一曲冷凌霜 submitted on 2019-12-11 17:00:04
Question: I have a question, a bit different from "Apply a subtraction for each month". Regarding this SQL:

    ;WITH cte AS (
      SELECT DISTINCT
        Annees = YEAR(DateTime),
        Mois = MONTH(DateTime),
        firstRecord = first_value(value) OVER (PARTITION BY YEAR(DateTime), MONTH(DateTime) ORDER BY DateTime ASC),
        lastRecord = first_value(value) OVER (PARTITION BY YEAR(DateTime), MONTH(DateTime) ORDER BY DateTime DESC)
      FROM AnalogHistory
      WHERE TagName = 'A_000000000000000000000000000058.PV_Kw'
        AND DateTime >= '01/01/2016 00:00
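The core of the query — first and last reading per month via `first_value` with opposite sort orders, then a subtraction — can be reproduced in SQLite (window functions require SQLite 3.25+, bundled with recent Pythons). The table and sample readings below are invented stand-ins for the question's AnalogHistory data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnalogHistory (DateTime TEXT, value REAL)")
conn.executemany("INSERT INTO AnalogHistory VALUES (?, ?)", [
    ("2016-01-05 00:00", 10), ("2016-01-25 00:00", 14),
    ("2016-02-03 00:00", 14), ("2016-02-27 00:00", 20),
])

# first_value over ASC = earliest reading of the month;
# first_value over DESC = latest reading; DISTINCT collapses to one row/month
rows = conn.execute("""
    SELECT DISTINCT
           strftime('%Y', DateTime) AS Annees,
           strftime('%m', DateTime) AS Mois,
           first_value(value) OVER (PARTITION BY strftime('%Y-%m', DateTime)
                                    ORDER BY DateTime ASC)  AS firstRecord,
           first_value(value) OVER (PARTITION BY strftime('%Y-%m', DateTime)
                                    ORDER BY DateTime DESC) AS lastRecord
      FROM AnalogHistory
     ORDER BY Annees, Mois
""").fetchall()
deltas = [(y, m, last - first) for y, m, first, last in rows]
print(deltas)  # [('2016', '01', 4.0), ('2016', '02', 6.0)]
```

The per-month consumption is then simply `lastRecord - firstRecord`, which is where the "subtraction for each complete month" comes in.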

Kafka Study Notes (1): the data structure of Kafka message storage

不羁的心 submitted on 2019-12-11 12:37:02
Reference: https://blog.csdn.net/gongxinju/article/details/72672375 (to be explored in more depth in later posts).

Messages in Kafka are organized with the topic as the basic unit, and different topics are independent of one another. Each topic can in turn be divided into several partitions (how many partitions a topic has is specified when the topic is created), and each partition stores a portion of the messages. An official diagram (omitted here) shows the relationship between topics and partitions at a glance.

A partition is stored as files in the file system. For example, if you create a topic named page_visits with 5 partitions, the Kafka data directory (specified by log.dirs in the configuration file) will contain these 5 directories: page_visits-0, page_visits-1, page_visits-2, page_visits-3, page_visits-4. The naming rule is <topic_name>-<partition_id>, and each directory stores the data of the corresponding partition.

Next, this article analyzes the storage format of the files in a partition directory and where the related code lives.

3.1 The partition's data files

Each Message in a partition is identified by an offset, which represents its position within that partition
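The <topic_name>-<partition_id> naming rule described above is mechanical enough to sketch directly: for the page_visits example, these are the directory names Kafka would create under log.dirs:

```python
# Directory names Kafka derives for a topic's partitions,
# following the <topic_name>-<partition_id> rule from the text.
def partition_dirs(topic, num_partitions):
    return [f"{topic}-{pid}" for pid in range(num_partitions)]

dirs = partition_dirs("page_visits", 5)
print(dirs)
# ['page_visits-0', 'page_visits-1', 'page_visits-2',
#  'page_visits-3', 'page_visits-4']
```

Inside each such directory live the partition's log segment files, which is what the storage-format discussion that follows is about.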

Random subset containing at least one instance of each factor

旧时模样 submitted on 2019-12-11 08:47:28
Question: Let's define a data.frame df with 3 columns and 10 rows. The third column is the class and the first two are variables.

    var1 <- rnorm(10)
    var2 <- rnorm(10,2)
    class <- as.factor(c(1,2,3,1,2,1,2,1,3,3))
    df <- data.frame(var1=var1, var2=var2, class=class)

How can df be randomly split into two subsets so that sub.df1 and sub.df2 each have at least one instance of each class?

Answer 1: This works:

    set.seed(123)
    partition <- function(x, n = 2) sample(c(1:n, sample(1:n, length(x) - n, TRUE)))
    split(df, as.integer
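The trick in the R answer is to hand out each subset label at least once before filling the rest at random. Translated to Python, and applied within each class so that every subset is guaranteed at least one instance of every class (which requires each class to have at least n members):

```python
import random

def partition_labels(classes, n=2, seed=123):
    # Within each class: assign labels 1..n first, then random labels,
    # then shuffle positions -- so every subset sees every class.
    rng = random.Random(seed)
    labels = [None] * len(classes)
    by_class = {}
    for i, c in enumerate(classes):
        by_class.setdefault(c, []).append(i)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        lab = (list(range(1, n + 1))
               + [rng.randint(1, n) for _ in range(len(idxs) - n)])
        for i, l in zip(idxs, lab):
            labels[i] = l
    return labels

classes = [1, 2, 3, 1, 2, 1, 2, 1, 3, 3]
labels = partition_labels(classes)
ok = all({c for c, l in zip(classes, labels) if l == s} == {1, 2, 3}
         for s in (1, 2))
print(ok)  # True
```

Splitting the rows of the data frame by these labels then yields the two subsets, each covering all three classes by construction.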

Initial token in Cassandra is not working as expected

纵饮孤独 submitted on 2019-12-11 07:32:40
Question: To understand the ring without vnodes, I set the initial token on Node 1 to 25 and on Node 2 to 50, like below:

    Address       Rack   Status  State   Load       Owns     Token
                                                             50
    172.30.56.60  rack1  Up      Normal  82.08 KiB  100.00%  25
    172.30.56.61  rack1  Up      Normal  82.09 KiB  100.00%  50

I expected that only partition ranges between 0 and 50 could be added to the database, but it is allowing any primary key / partition key value I provide, as follows (user_id is the primary / partition key):

    user_id | user_name | user_phone
    ------------+-----
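The reason any key is accepted: the node's token is not a bound on key values. Cassandra's partitioner hashes each partition key onto the full token space, and the ring wraps around, so every possible hash is owned by some node. A toy ring with tokens 25 and 50 illustrates the wrap-around (the probe values stand in for Murmur3 hash outputs, which in reality span a much larger range):

```python
from bisect import bisect_left

def owner(ring_tokens, token):
    # A token belongs to the node with the smallest ring token >= it;
    # anything past the largest token wraps around to the first node.
    tokens = sorted(ring_tokens)
    i = bisect_left(tokens, token)
    return tokens[i % len(tokens)]

ring = [25, 50]
print(owner(ring, 10), owner(ring, 40), owner(ring, 9999))  # 25 50 25
```

So a key whose hash falls "outside" 0..50 is not rejected; it simply wraps to the node at token 25, which is why every inserted user_id succeeds.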

Shocked! So this is what Kafka actually is!

北城以北 submitted on 2019-12-11 06:49:57
Introduction

Kafka is a distributed message queue with high performance, persistence, multi-replica backup, and horizontal scalability. Producers write messages to the queue and consumers take messages from it for business processing. In architecture design it typically serves to decouple systems, shave traffic peaks, and enable asynchronous processing.

Kafka exposes the concept of a topic: producers write messages to a topic and consumers read messages from it. To allow horizontal scaling, a topic is actually composed of multiple partitions; when you hit a bottleneck, you can scale out by increasing the number of partitions. Message order is guaranteed only within a single partition.

Every new message is simply appended to the corresponding file, which is why write performance is so high.

Kafka's overall data flow looks like this: (diagram: kafka data flow)

Roughly: Producers write messages to a specified Topic on the Brokers, and Consumers pull messages of a specified Topic from the Brokers and then do their business processing.

In the diagram there are two topics: topic 0 has two partitions and topic 1 has one partition, with three-replica backup. Notice that consumer 2 in consumer group 1 was assigned no partition to process; this can happen, as discussed below.

Metadata about brokers, topics, and partitions is stored in ZooKeeper, which is also used for monitoring, routing, and so on.

Producing

The basic flow is: (diagram: kafka sdk product flow.png) Create a record
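The "producer writes to a partition of a topic" step can be sketched concretely. Kafka's default partitioner hashes the record key to pick a partition (real Kafka uses a murmur2 hash; the CRC32 below is a deterministic simplification for illustration), which is exactly why all messages sharing a key stay ordered: they land in the same partition.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Deterministic stand-in for Kafka's murmur2-based default partitioner:
    # same key -> same partition, so per-key ordering is preserved.
    return zlib.crc32(key) % num_partitions

p = pick_partition(b"user-42", 3)
print(p == pick_partition(b"user-42", 3), 0 <= p < 3)  # True True
```

Keyless records are handled differently (round-robin or sticky assignment across partitions), which is also why ordering is only guaranteed within a partition, never across a whole topic.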