partition

Dameng (达梦) Database Indexing in Practice

删除回忆录丶 submitted on 2019-12-12 09:07:38
Dameng Database supports secondary indexes, clustered indexes, unique indexes, function-based indexes, bitmap indexes, partitioned indexes, and more. By default a table is an index-organized table, with a default index built on rowid, so the indexes we create ourselves are called secondary indexes. The purpose of building an index is to speed up queries on the table; when DML is run against the table, the database maintains the index automatically. An index is an inverted tree, and using an index means traversing that tree.

Rules of thumb for building an index: frequently queried columns, join-condition columns, columns that often appear in predicates (WHERE), and queries that return only a small fraction of the table's rows. Situations where an index is not appropriate: columns with many NULLs, or columns with few distinct values (for example, gender).

1. Viewing index information

One caution before we start: do not create, drop, or rebuild indexes, or gather statistics, during business peak hours.

To view the indexes under a given user:

    select owner, table_name, index_name, index_type from dba_indexes where owner='TEST1';

First, create a table for testing:

    create table TAB10 (id1 int, id2 int, id3 int, id4 int, id5 int, id6 int, id7 int, id8 int, name1 char(20), name2 varchar(30));

Querying shows that creating a table automatically creates a clustered index along with it.

    select owner, table_name, index_name
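Dameng's `dba_indexes` dictionary view is specific to that product, but the workflow above (create a table, add a secondary index on a frequently-queried column, then inspect the catalog) can be sketched with SQLite, whose `sqlite_master` table plays a rough stand-in role for `dba_indexes`:

```python
import sqlite3

# Minimal sketch: create a table, add a secondary index, list indexes
# from the catalog. SQLite is only a stand-in for Dameng here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TAB10 (id1 INT, id2 INT, name1 CHAR(20))")
conn.execute("CREATE INDEX idx_tab10_id1 ON TAB10 (id1)")

rows = conn.execute(
    "SELECT name, tbl_name FROM sqlite_master WHERE type = 'index'"
).fetchall()
print(rows)  # [('idx_tab10_id1', 'TAB10')]
```

In either system the idea is the same: the catalog records which indexes exist, and the optimizer decides per query whether traversing the index tree beats a full scan.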

Error partitioning and formatting USB flash drive in C++

对着背影说爱祢 submitted on 2019-12-12 02:38:58
Question: I'm stuck attempting to re-partition and format a USB flash drive using C++; any help would be great! The goal is to re-partition an arbitrary flash drive with a single partition taking the entire space, formatted FAT32 (with NTFS and exFAT as later options). This will be done in batch, hopefully with 50+ devices at once, so drive-letter access is not an option. I'm able to create a partition, but when I try IOCTL_DISK_SET_PARTITION_INFO_EX to set the format type, it is failing with 0x32, ERROR
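Whatever IOCTL path is used (e.g. IOCTL_DISK_SET_DRIVE_LAYOUT_EX), the drive ultimately needs a valid partition layout. As a rough, OS-independent illustration of what that layout contains, this sketch packs one 16-byte MBR partition-table entry for a FAT32-LBA partition; the LBA start and sector count are made-up example values, not taken from the question:

```python
import struct

# One classic MBR partition-table entry (16 bytes):
# status(1) | CHS start(3) | type(1) | CHS end(3) | LBA start(4) | sectors(4)
def mbr_entry(lba_start, num_sectors, ptype=0x0C, bootable=False):
    status = 0x80 if bootable else 0x00
    chs = b"\xfe\xff\xff"  # CHS fields are ignored on LBA-addressed disks
    return struct.pack("<B3sB3sII", status, chs, ptype, chs,
                       lba_start, num_sectors)

# 0x0C = FAT32 with LBA addressing; example sizes for a ~8 GB stick
entry = mbr_entry(lba_start=2048, num_sectors=15648768)
print(len(entry), hex(entry[4]))  # 16 0xc
```

This only illustrates the on-disk layout the Windows APIs describe in their own structures; it is not a substitute for the actual IOCTL calls.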

How to specify the partition for mapPartitions in Spark

为君一笑 submitted on 2019-12-12 02:24:46
Question: What I would like to do is compute each list separately. For example, if I have 5 lists ([1,2,3,4,5,6], [2,3,4,5,6], [3,4,5,6], [4,5,6], [5,6]) and I would like to get the 5 lists without the 6, I would do something like:

    data = [1,2,3,4,5,6] + [2,3,4,5,6,7] + [3,4,5,6,7,8] + [4,5,6,7,8,9] + [5,6,7,8,9,10]

    def function_1(iter_listoflist):
        final_iterator = []
        for sublist in iter_listoflist:
            final_iterator.append([x for x in sublist if x != 6])
        return iter(final_iterator)

    sc.parallelize(data,5).glom()
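Without a Spark cluster, the partition logic in the question can be checked in plain Python. The hand-slicing below is a simplified stand-in for what `sc.parallelize(data, 5)` does, and the one-element list mimics what `glom()` hands to `mapPartitions`:

```python
data = ([1, 2, 3, 4, 5, 6] + [2, 3, 4, 5, 6, 7] + [3, 4, 5, 6, 7, 8]
        + [4, 5, 6, 7, 8, 9] + [5, 6, 7, 8, 9, 10])

# Slice into 5 equal "partitions" by hand (stand-in for Spark's partitioner)
partitions = [data[i:i + 6] for i in range(0, len(data), 6)]

def function_1(iter_listoflist):
    final_iterator = []
    for sublist in iter_listoflist:
        final_iterator.append([x for x in sublist if x != 6])
    return iter(final_iterator)

# glom() turns each partition into one list, so the function sees a
# one-element list of lists per partition
result = [next(function_1([p])) for p in partitions]
print(result[0], result[4])  # [1, 2, 3, 4, 5] [5, 7, 8, 9, 10]
```

This confirms the filtering itself is fine; the Spark-specific part of the question is only about how the 30 elements get divided among the 5 partitions.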

Update with group by

一世执手 submitted on 2019-12-11 20:19:50
Question: I'm stumped by what seemed to be a simple UPDATE statement. I'm looking for an UPDATE that uses two values. The first (a) is used to group; the second (b) is used to find a local minimum of values within the respective group. As a little extra, there is a threshold value on b: any value of 1 or smaller shall remain as it is.

    drop table t1;
    create table t1 (a number, b number);
    insert into t1 values (1,0);
    insert into t1 values (1,1);
    insert into t1 values (2,1);
    insert into t1 values (2,2);
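One common way to write such an UPDATE is a correlated subquery rather than an explicit GROUP BY. The sketch below uses SQLite (so it can be run anywhere) and one plausible reading of the truncated requirement: within each group `a`, replace `b` with the group minimum, but leave values of 1 or smaller untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [(1, 0), (1, 1), (2, 1), (2, 2)])

# Correlated subquery: each row above the threshold gets its group's minimum
conn.execute("""
    UPDATE t1
       SET b = (SELECT MIN(b) FROM t1 t2 WHERE t2.a = t1.a)
     WHERE b > 1
""")
rows = conn.execute("SELECT a, b FROM t1 ORDER BY a, b").fetchall()
print(rows)  # [(1, 0), (1, 1), (2, 1), (2, 1)]
```

Only (2,2) exceeds the threshold, so it is pulled down to group 2's minimum of 1; everything else stays as inserted.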

Drop multiple partitions based on date

泪湿孤枕 submitted on 2019-12-11 17:56:35
Question: I have a table with daily partitions. I can drop a partition using the query below:

    ALTER TABLE MY_TABLE DROP PARTITION FOR(TO_DATE('19-DEC-2017','dd-MON-yyyy'))

How can I drop all the partitions (multiple partitions) older than 15 days?

Answer 1: You can use PL/SQL like this.

    DECLARE
      CANNOT_DROP_LAST_PARTITION EXCEPTION;
      PRAGMA EXCEPTION_INIT(CANNOT_DROP_LAST_PARTITION, -14758);
      ts TIMESTAMP;
    BEGIN
      FOR aPart IN (SELECT PARTITION_NAME, HIGH_VALUE FROM USER_TAB_PARTITIONS WHERE TABLE_NAME = 'MY_TABLE
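The same "everything older than the cutoff" loop can be sketched outside the database: generate one `ALTER ... DROP PARTITION FOR (date)` statement per day before `today - 15 days`. The table name and date range below are examples, and a real script would execute these through a driver such as cx_Oracle rather than just printing them:

```python
from datetime import date, timedelta

def drop_statements(oldest, today, keep_days=15):
    # One statement per daily partition strictly older than the cutoff
    cutoff = today - timedelta(days=keep_days)
    stmts, d = [], oldest
    while d < cutoff:
        day = d.strftime("%d-%b-%Y").upper()
        stmts.append("ALTER TABLE MY_TABLE DROP PARTITION "
                     f"FOR (TO_DATE('{day}', 'dd-MON-yyyy'))")
        d += timedelta(days=1)
    return stmts

stmts = drop_statements(oldest=date(2017, 12, 1), today=date(2017, 12, 19))
print(len(stmts))  # 3 (Dec 1-3 fall before the Dec 4 cutoff)
```

The PL/SQL answer is more robust in practice because it reads the actual partition list from USER_TAB_PARTITIONS instead of assuming one partition per calendar day.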

Subtraction for each complete month for real-time database querying

一曲冷凌霜 submitted on 2019-12-11 17:00:04
Question: I have a question, a bit different from "Apply a subtraction for each month". Regarding this SQL:

    ;WITH cte AS (
      SELECT DISTINCT
        Annees = YEAR(DateTime),
        Mois = MONTH(DateTime),
        firstRecord = first_value(value) OVER (PARTITION BY YEAR(DateTime), MONTH(DateTime) ORDER BY DateTime ASC),
        lastRecord = first_value(value) OVER (PARTITION BY YEAR(DateTime), MONTH(DateTime) ORDER BY DateTime DESC)
      FROM AnalogHistory
      WHERE TagName = 'A_000000000000000000000000000058.PV_Kw'
        AND DateTime >= '01/01/2016 00:00
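The core of the query — first and last reading per month via `first_value` with opposite sort orders, then a subtraction — can be reproduced in SQLite (window functions require SQLite 3.25+, bundled with recent Pythons). The table and sample readings below are invented stand-ins for the question's AnalogHistory data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AnalogHistory (DateTime TEXT, value REAL)")
conn.executemany("INSERT INTO AnalogHistory VALUES (?, ?)", [
    ("2016-01-05 00:00", 10), ("2016-01-25 00:00", 14),
    ("2016-02-03 00:00", 14), ("2016-02-27 00:00", 20),
])

# first_value over ASC = earliest reading of the month;
# first_value over DESC = latest reading; DISTINCT collapses to one row/month
rows = conn.execute("""
    SELECT DISTINCT
           strftime('%Y', DateTime) AS Annees,
           strftime('%m', DateTime) AS Mois,
           first_value(value) OVER (PARTITION BY strftime('%Y-%m', DateTime)
                                    ORDER BY DateTime ASC)  AS firstRecord,
           first_value(value) OVER (PARTITION BY strftime('%Y-%m', DateTime)
                                    ORDER BY DateTime DESC) AS lastRecord
      FROM AnalogHistory
     ORDER BY Annees, Mois
""").fetchall()
deltas = [(y, m, last - first) for y, m, first, last in rows]
print(deltas)  # [('2016', '01', 4.0), ('2016', '02', 6.0)]
```

The per-month consumption is then simply `lastRecord - firstRecord`, which is where the "subtraction for each complete month" comes in.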

Kafka Study Notes (1): the data structure of Kafka message storage

不羁的心 submitted on 2019-12-11 12:37:02
Reference: https://blog.csdn.net/gongxinju/article/details/72672375 (to be explored in more depth in later posts).

Messages in Kafka are organized with the topic as the basic unit, and different topics are independent of one another. Each topic can in turn be divided into several partitions (how many partitions a topic has is specified when the topic is created), and each partition stores a portion of the messages. An official diagram (omitted here) shows the relationship between topics and partitions at a glance.

A partition is stored as files in the file system. For example, if you create a topic named page_visits with 5 partitions, the Kafka data directory (specified by log.dirs in the configuration file) will contain these 5 directories: page_visits-0, page_visits-1, page_visits-2, page_visits-3, page_visits-4. The naming rule is <topic_name>-<partition_id>, and each directory stores the data of the corresponding partition.

Next, this article analyzes the storage format of the files in a partition directory and where the related code lives.

3.1 The partition's data files

Each Message in a partition is identified by an offset, which represents its position within that partition
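The <topic_name>-<partition_id> naming rule described above is mechanical enough to sketch directly: for the page_visits example, these are the directory names Kafka would create under log.dirs:

```python
# Directory names Kafka derives for a topic's partitions,
# following the <topic_name>-<partition_id> rule from the text.
def partition_dirs(topic, num_partitions):
    return [f"{topic}-{pid}" for pid in range(num_partitions)]

dirs = partition_dirs("page_visits", 5)
print(dirs)
# ['page_visits-0', 'page_visits-1', 'page_visits-2',
#  'page_visits-3', 'page_visits-4']
```

Inside each such directory live the partition's log segment files, which is what the storage-format discussion that follows is about.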

Random subset containing at least one instance of each factor

旧时模样 submitted on 2019-12-11 08:47:28
Question: Let's define a data.frame df with 3 columns and 10 rows. The third column is the class and the first two are variables.

    var1 <- rnorm(10)
    var2 <- rnorm(10,2)
    class <- as.factor(c(1,2,3,1,2,1,2,1,3,3))
    df <- data.frame(var1=var1, var2=var2, class=class)

How can df be randomly split into two subsets so that sub.df1 and sub.df2 each have at least one instance of each class?

Answer 1: This works:

    set.seed(123)
    partition <- function(x, n = 2) sample(c(1:n, sample(1:n, length(x) - n, TRUE)))
    split(df, as.integer
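The trick in the R answer is to hand out each subset label at least once before filling the rest at random. Translated to Python, and applied within each class so that every subset is guaranteed at least one instance of every class (which requires each class to have at least n members):

```python
import random

def partition_labels(classes, n=2, seed=123):
    # Within each class: assign labels 1..n first, then random labels,
    # then shuffle positions -- so every subset sees every class.
    rng = random.Random(seed)
    labels = [None] * len(classes)
    by_class = {}
    for i, c in enumerate(classes):
        by_class.setdefault(c, []).append(i)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        lab = (list(range(1, n + 1))
               + [rng.randint(1, n) for _ in range(len(idxs) - n)])
        for i, l in zip(idxs, lab):
            labels[i] = l
    return labels

classes = [1, 2, 3, 1, 2, 1, 2, 1, 3, 3]
labels = partition_labels(classes)
ok = all({c for c, l in zip(classes, labels) if l == s} == {1, 2, 3}
         for s in (1, 2))
print(ok)  # True
```

Splitting the rows of the data frame by these labels then yields the two subsets, each covering all three classes by construction.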

Initial token in Cassandra is not working as expected

纵饮孤独 submitted on 2019-12-11 07:32:40
Question: To understand the ring without vnodes, I set the initial token on Node 1 to 25 and on Node 2 to 50, like below:

    Address       Rack   Status  State   Load       Owns     Token
                                                             50
    172.30.56.60  rack1  Up      Normal  82.08 KiB  100.00%  25
    172.30.56.61  rack1  Up      Normal  82.09 KiB  100.00%  50

I expected that only partition ranges between 0 and 50 could be added to the database, but it is allowing any primary key / partition key value I provide, as follows (user_id is the primary / partition key):

    user_id | user_name | user_phone
    ------------+-----
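The reason any key is accepted: the node's token is not a bound on key values. Cassandra's partitioner hashes each partition key onto the full token space, and the ring wraps around, so every possible hash is owned by some node. A toy ring with tokens 25 and 50 illustrates the wrap-around (the probe values stand in for Murmur3 hash outputs, which in reality span a much larger range):

```python
from bisect import bisect_left

def owner(ring_tokens, token):
    # A token belongs to the node with the smallest ring token >= it;
    # anything past the largest token wraps around to the first node.
    tokens = sorted(ring_tokens)
    i = bisect_left(tokens, token)
    return tokens[i % len(tokens)]

ring = [25, 50]
print(owner(ring, 10), owner(ring, 40), owner(ring, 9999))  # 25 50 25
```

So a key whose hash falls "outside" 0..50 is not rejected; it simply wraps to the node at token 25, which is why every inserted user_id succeeds.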

Shocked! So this is what Kafka actually is!

北城以北 submitted on 2019-12-11 06:49:57
Introduction

Kafka is a distributed message queue with high performance, persistence, multi-replica backup, and horizontal scalability. Producers write messages to the queue and consumers take messages from it for business processing. In architecture design it typically serves to decouple systems, shave traffic peaks, and enable asynchronous processing.

Kafka exposes the concept of a topic: producers write messages to a topic and consumers read messages from it. To allow horizontal scaling, a topic is actually composed of multiple partitions; when you hit a bottleneck, you can scale out by increasing the number of partitions. Message order is guaranteed only within a single partition.

Every new message is simply appended to the corresponding file, which is why write performance is so high.

Kafka's overall data flow looks like this: (diagram: kafka data flow)

Roughly: Producers write messages to a specified Topic on the Brokers, and Consumers pull messages of a specified Topic from the Brokers and then do their business processing.

In the diagram there are two topics: topic 0 has two partitions and topic 1 has one partition, with three-replica backup. Notice that consumer 2 in consumer group 1 was assigned no partition to process; this can happen, as discussed below.

Metadata about brokers, topics, and partitions is stored in ZooKeeper, which is also used for monitoring, routing, and so on.

Producing

The basic flow is: (diagram: kafka sdk product flow.png) Create a record
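The "producer writes to a partition of a topic" step can be sketched concretely. Kafka's default partitioner hashes the record key to pick a partition (real Kafka uses a murmur2 hash; the CRC32 below is a deterministic simplification for illustration), which is exactly why all messages sharing a key stay ordered: they land in the same partition.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Deterministic stand-in for Kafka's murmur2-based default partitioner:
    # same key -> same partition, so per-key ordering is preserved.
    return zlib.crc32(key) % num_partitions

p = pick_partition(b"user-42", 3)
print(p == pick_partition(b"user-42", 3), 0 <= p < 3)  # True True
```

Keyless records are handled differently (round-robin or sticky assignment across partitions), which is also why ordering is only guaranteed within a partition, never across a whole topic.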