partition

Static and Dynamic Partitions in Hive

Submitted by 假装没事ソ on 2019-12-03 03:51:02
Creating a partitioned table in Hive involves none of the more complex partition types (range, list, hash, composite, and so on). The partition columns are also not real fields of the table but one or more pseudo-columns; that is, the table's data files do not actually store the partition columns' information or data. The following statement creates a simple partitioned table:

    create table partition_test (member_id string, name string)
    partitioned by (stat_date string, province string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

This example creates two partition columns, stat_date and province. Normally a partition has to be created before it can be used, for example:

    alter table partition_test add partition (stat_date='20110728', province='zhejiang');

This creates a partition, and we can see that Hive has created a corresponding directory in HDFS:

    $ hadoop fs -ls /user/hive/warehouse/partition_test/stat_date=20110728
    Found 1 items
    drwxr-xr-x - admin supergroup 0 2011

Introduction to Kafka

Submitted by 心不动则不痛 on 2019-12-03 03:01:21
Purpose: decoupling between systems, buffering peak load, asynchronous communication.

Characteristics: producer/consumer model, FIFO ordering is preserved, Kafka itself does not lose data, data is cleaned up every 7 days by default, high throughput, no master/slave relationship between brokers, coordination relies on ZooKeeper.

Structure:

topic: a message queue / category. Messages in Kafka are organized by topic; you can simply picture a topic as a queue. Each topic is further split into many partitions (you choose how many) for parallelism. Within a partition messages are strictly ordered, like an ordered queue, and every message has a sequence number called the offset (say 0 to 12), read from the front and written at the back. A partition belongs to one broker, and one broker can manage several partitions; for example, if a topic has 6 partitions and there are 2 brokers, each broker manages 3 partitions. A partition can simply be thought of as a file whose position information is the offset: when data arrives it is appended to the partition; messages are not buffered in application memory but written straight to disk (zero-copy techniques are used). Unlike many messaging systems, which delete a message once it has been consumed, Kafka deletes data according to a time-based retention policy rather than on consumption; in Kafka there is no notion of "consumed", only "expired".

The producer itself decides which partition to write to
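To illustrate that last point, below is a minimal producer sketch in Scala against the standard Kafka Java client; the broker address, topic name, and string key/value types are assumptions made for this example, not part of the article. When a record carries a key, the default partitioner hashes the key to pick the partition, so records with the same key stay ordered within one partition.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object ProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092") // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // With a key, the default partitioner hashes the key to choose a partition,
        // so all records with the same key land in the same (ordered) partition.
        producer.send(new ProducerRecord[String, String]("my-topic", "user-42", "hello"))
        // Without a key, the client spreads records across partitions itself.
        producer.send(new ProducerRecord[String, String]("my-topic", "no key here"))
        producer.close()
      }
    }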

How to select rows from partition in MySQL

Submitted by Anonymous (unverified) on 2019-12-03 02:56:01
Question: I partitioned my 300 MB table and am trying to run a select query against the p0 partition with this command:

    SELECT * FROM employees PARTITION (p0);

But I am getting the following error:

    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(p0)' at line 1

How do I write a select query to get data from a specific partition?

Answer 1: It depends on your MySQL version: the PARTITION keyword in SELECT does not exist until MySQL 5.6.2. You may be using MySQL 5.5 or even 5.1, but not

SQL ORDER BY on multiple columns

Submitted by Anonymous (unverified) on 2019-12-03 02:56:01
Question: I have the result below:

    VendorName | IncidentID | IncidentStatus | IncidentDate
    -------------------------------------------------------
    XYZ        | 100        | Open           | 02-JUN-2011
    XYZ        | 101        | Open           | 03-JUN-2011
    ABC        | 102        | Open           | 01-JUN-2011
    XYZ        | 103        | Open           | 01-APR-2011
    ABC        | 105        | Open           | 05-JUN-2011

I want to order by the vendor that has the latest incident. Vendor ABC has the latest incident, so it should come first together with all of its other incidents, followed by the next vendor with all of its respective incidents in descending order. The desired result is like this

cassandra primary key column cannot be restricted

Submitted by Anonymous (unverified) on 2019-12-03 02:50:02
Question: I am using Cassandra for the first time in a web app and I have a query problem. Here is my table:

    CREATE TABLE vote (
        doodle_id uuid,
        user_id uuid,
        schedule_id uuid,
        vote int,
        PRIMARY KEY ((doodle_id), user_id, schedule_id)
    );

On every request I specify my partition key, doodle_id. For example, I can run the following without any problem:

    select * from vote where doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 and user_id = 97a7378a-e1bb-4586-ada1-177016405142;

But on the last request I made:

    select * from vote where doodle_id = c4778a27-f2ca-4c96
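The excerpt is cut off before the answer, but the error in the title usually means a clustering column was restricted without also restricting the clustering columns declared before it (here, filtering on schedule_id also requires filtering on user_id, since clustering columns must be restricted in declaration order). Below is a rough Scala sketch using the DataStax Java driver; the contact point, keyspace name, and the schedule_id value are assumptions, not taken from the question.

    import java.util.UUID
    import com.datastax.driver.core.Cluster

    object VoteQuerySketch {
      def main(args: Array[String]): Unit = {
        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build() // assumed contact point
        val session = cluster.connect("doodle_keyspace")                     // assumed keyspace name

        val doodleId   = UUID.fromString("c4778a27-f2ca-4c96-8669-15dcbd5d34a7")
        val userId     = UUID.fromString("97a7378a-e1bb-4586-ada1-177016405142")
        val scheduleId = UUID.randomUUID()                                   // placeholder value

        // Restricting schedule_id is only allowed when user_id (the preceding
        // clustering column) is restricted as well.
        val rows = session.execute(
          "SELECT * FROM vote WHERE doodle_id = ? AND user_id = ? AND schedule_id = ?",
          doodleId, userId, scheduleId)
        println(rows.one())

        cluster.close()
      }
    }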

D3.js Zoomable Sunburst not Zooming

Submitted by Anonymous (unverified) on 2019-12-03 02:38:01
Question: I have a static 3-level sunburst diagram: http://colinwhite.net/Sunburst/. My data is being nested with this function: http://colinwhite.net/Sunburst/js/treeRemapper.js. My approach is based on this example: http://bl.ocks.org/mbostock/4348373. For some reason my zoom and tweening are not working:

    var width = 960,
        height = 700,
        radius = Math.min(width, height) / 2,
        color = d3.scale.category20c();
    var x = d3.scale.linear().range([0, 2 * Math.PI]),
        y = d3.scale.sqrt().range([0, radius]);
    var svg = d3.select("body").append("svg")
        .attr("width",

How to Handle Data Loss in Kafka

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-03 01:49:55
Kafka message loss can be considered from two sides, the Producer and the Consumer.

1. Producer

The request.required.acks setting controls whether the producer waits for a response from Kafka after pushing a message.

0 — the producer keeps pushing data without waiting for any response from Kafka, so data can be lost. (0, which means that the producer never waits for an acknowledgement from the broker. This option provides the lowest latency but the weakest durability guarantees (some data will be lost when a server fails))

1 — the producer waits for the partition leader to acknowledge receipt of the data before pushing more; however, if that leader dies before the data has been replicated, data can still be lost. (1, which means that the producer gets an acknowledgement after the leader replica has received the data. This option provides better durability
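For reference, here is a minimal Scala sketch of the same setting with the newer Java producer client, where the old request.required.acks option corresponds to the acks property; the broker address, topic, and message contents are assumptions made for this example.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object AcksSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092") // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        // "0": never wait for the broker; lowest latency, data can be lost.
        // "1": wait for the partition leader only; data is lost if the leader
        //      fails before followers have copied it.
        props.put("acks", "1")

        val producer = new KafkaProducer[String, String](props)
        // send() returns a Future; get() blocks until the broker acknowledges
        // the record according to the acks setting above.
        producer.send(new ProducerRecord[String, String]("events", "key", "value")).get()
        producer.close()
      }
    }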

Spark SQL saveAsTable is not compatible with Hive when partition is specified

Submitted by Anonymous (unverified) on 2019-12-03 01:48:02
Question: Kind of an edge case: when saving a Parquet table in Spark SQL with a partition,

    # schema definition
    final StructType schema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("time", DataTypes.StringType, true),
        DataTypes.createStructField("accountId", DataTypes.StringType, true),
        ...

    DataFrame df = hiveContext.read().schema(schema).json(stringJavaRDD);

    df.coalesce(1)
        .write()
        .mode(SaveMode.Append)
        .format("parquet")
        .partitionBy("year")
        .saveAsTable("tblclick8partitioned");

Spark warns:

    Persisting partitioned data source
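The excerpt stops at the warning, so the accepted fix is not shown here. One workaround sometimes used for this situation is to let Hive own the table definition and append with insertInto instead of saveAsTable. A rough Scala sketch under those assumptions (Spark 1.x with a HiveContext, a simplified two-column schema, a placeholder input path, and the partition column year as the last column of the DataFrame):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.hive.HiveContext

    object InsertIntoSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("insertInto-sketch"))
        val hiveContext = new HiveContext(sc)

        // Assumed input: a DataFrame whose last column is the partition column "year".
        val df = hiveContext.read.json("/tmp/clicks.json") // placeholder path

        // Let Hive own the table definition so it stays readable from Hive.
        hiveContext.sql(
          """CREATE TABLE IF NOT EXISTS tblclick8partitioned (time STRING, accountId STRING)
            |PARTITIONED BY (year STRING)
            |STORED AS PARQUET""".stripMargin)

        hiveContext.setConf("hive.exec.dynamic.partition", "true")
        hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

        // insertInto appends into the existing Hive table instead of creating a
        // Spark-specific table the way saveAsTable + partitionBy does.
        df.write.mode(SaveMode.Append).insertInto("tblclick8partitioned")

        sc.stop()
      }
    }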

Error producing to embedded kafka queue after upgrade from 0.7 to 0.8.1.1

Submitted by Anonymous (unverified) on 2019-12-03 01:34:02
Question: I haven't been able to find anything that directly addresses the problem I'm facing, so I'm posting here. I have JUnit/JBehave tests that spin up an embedded ZooKeeper server, an embedded Kafka server, and Kafka producers and consumers. After upgrading Kafka from 0.7 to 0.8.1.1, I'm encountering the following types of errors:

    ERROR [kafka-request-handler-5] state.change.logger - Error on broker 1 while processing LeaderAndIsr request correlationId 7 received from controller 1 epoch 1 for partition [topicName,8]
    java.lang.NullPointerException:
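For context, this is roughly how an embedded 0.8.x broker is typically started in such tests. The sketch below is my own Scala illustration, not the question's actual test harness; the broker id, port, log directory, and a ZooKeeper already listening on localhost:2181 are all assumptions.

    import java.util.Properties
    import kafka.server.{KafkaConfig, KafkaServer}

    object EmbeddedKafkaSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("broker.id", "1")
        props.put("port", "9092")                        // assumed port
        props.put("log.dir", "/tmp/embedded-kafka-logs") // assumed log directory
        props.put("zookeeper.connect", "localhost:2181") // assumes embedded ZK is already running

        // In 0.8.x the broker is a KafkaServer wrapped around a KafkaConfig.
        val server = new KafkaServer(new KafkaConfig(props))
        server.startup()

        // ... run producers and consumers against localhost:9092 here ...

        server.shutdown()
        server.awaitShutdown()
      }
    }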

How createCombiner, mergeValue, mergeCombiners work in combineByKey in Spark (using Scala)

Submitted by Anonymous (unverified) on 2019-12-03 01:20:02
Question: I am trying to understand how each step in combineByKey works. Can someone please help me understand the same for the RDD below?

    val rdd = sc.parallelize(List(
      ("A", 3), ("A", 9), ("A", 12), ("A", 0), ("A", 5),
      ("B", 4), ("B", 10), ("B", 11), ("B", 20), ("B", 25),
      ("C", 32), ("C", 91), ("C", 122), ("C", 3), ("C", 55)), 2)

    rdd.combineByKey(
      (x: Int) => (x, 1),
      (acc: (Int, Int), x) => (acc._1 + x, acc._2 + 1),
      (acc1: (Int, Int), acc2: (Int, Int)) => (acc1
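The snippet above is cut off; for reference, here is a self-contained sketch of the same pattern. The per-key average at the end is my own illustration of where the (sum, count) accumulator typically leads, not part of the original question.

    import org.apache.spark.{SparkConf, SparkContext}

    object CombineByKeySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("combineByKey-sketch").setMaster("local[2]"))
        val rdd = sc.parallelize(List(
          ("A", 3), ("A", 9), ("A", 12), ("A", 0), ("A", 5),
          ("B", 4), ("B", 10), ("B", 11), ("B", 20), ("B", 25),
          ("C", 32), ("C", 91), ("C", 122), ("C", 3), ("C", 55)), 2)

        val sumCount = rdd.combineByKey(
          // createCombiner: first value seen for a key within a partition -> (sum, count)
          (x: Int) => (x, 1),
          // mergeValue: fold another value from the same partition into the accumulator
          (acc: (Int, Int), x: Int) => (acc._1 + x, acc._2 + 1),
          // mergeCombiners: merge accumulators built independently on different partitions
          (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
        )

        // e.g. turn (sum, count) into a per-key average
        sumCount.mapValues { case (sum, count) => sum.toDouble / count }
          .collect()
          .foreach(println)

        sc.stop()
      }
    }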