partition

How to undo ALTER TABLE … ADD PARTITION without deleting data

时光怂恿深爱的人放手 submitted on 2021-01-28 07:07:30
Question: Suppose I have two Hive tables, table_1 and table_2. I run:
ALTER TABLE table_2 ADD PARTITION (col=val) LOCATION [table_1_location]
Now table_2 exposes the data of table_1 in the partition where col = val. I want to reverse this: table_2 should no longer have the partition col=val, and table_1 should keep its original data. How can I do this?
Answer 1: Make your table EXTERNAL first:
ALTER TABLE table_2 SET TBLPROPERTIES('EXTERNAL'='TRUE');
Then drop partition,

Issue in Hive Query due to memory

主宰稳场 submitted on 2021-01-28 07:01:38
Question: We have an INSERT query that loads a partitioned table by reading data from a non-partitioned table. Query:
insert into db1.fact_table PARTITION(part_col1, part_col2) ( col1, col2, col3, col4, col5, col6, . . . . . . . col32 LOAD_DT, part_col1, Part_col2 ) select col1, col2, col3, col4, col5, col6, . . . . . . . col32, part_col1, Part_col2 from db1.main_table WHERE col1=0;
The table has 34 columns; the number of records in the main table depends on the size of the input file which we

How does an LWT (lightweight transaction) work when we use IF NOT EXISTS?

会有一股神秘感。 submitted on 2021-01-28 03:19:39
Question: When we run
INSERT INTO USERS (login, email, name, login_count) values ('jbellis', 'jbellis@datastax.com', 'Jonathan Ellis', 1) IF NOT EXISTS
which columns does IF NOT EXISTS compare: the full primary key (partition key + clustering key), or just the partition key?
Answer 1: Here is a diagram of the 4 phases of LWT: http://www.slideshare.net/doanduyhai/cassandra-introduction-nantesjug/89 The original blog post is here: http://www.datastax.com/dev/blog/lightweight

All ways to partition a string

半腔热情 submitted on 2021-01-17 11:11:20
Question: I'm trying to find an efficient algorithm to get all ways to partition a string, e.g. for the string 'abcd':
'a' 'bcd'
'a' 'b' 'cd'
'a' 'b' 'c' 'd'
'ab' 'cd'
'ab' 'c' 'd'
'abc' 'd'
'a' 'bc' 'd'
Any language would be appreciated. Thanks in advance!
Answer 1: Problem analysis. Between each pair of adjacent characters, you can decide whether to cut. For a string of size n, there are n-1 positions where you can cut or not, i.e. there are two possibilities at each position. Therefore there are 2^(n-1) partitions for
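The cut-or-not reasoning in the answer above can be sketched in Python. This is a minimal illustration, not the answerer's code: the function name all_partitions is an assumption, and the uncut string itself is counted as one partition, which is how the total reaches 2^(n-1).

```python
from itertools import product


def all_partitions(s):
    """Yield every way to split s into contiguous, non-empty pieces."""
    if not s:
        return  # an empty string has no gaps and no pieces
    n = len(s)
    # Each of the n-1 gaps between adjacent characters is either cut or not.
    for cuts in product([False, True], repeat=n - 1):
        parts, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                parts.append(s[start:i])
                start = i
        parts.append(s[start:])  # the final piece after the last cut
        yield parts


for parts in all_partitions("abcd"):
    print(parts)
```

For 'abcd' this yields 8 lists: the 7 splits listed in the question plus the unsplit ['abcd']. Because the output itself has 2^(n-1) entries, no algorithm can enumerate all partitions in less than exponential time.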

Hive external table optimal partition size

情到浓时终转凉″ submitted on 2020-12-26 03:22:50
Question: What is the optimal size for an external table partition? I am planning to partition the table by year/month/day, and we receive about 2 GB of data daily.
Answer 1: The optimal partitioning is the one that matches how the table is used. Choose it based on:
- how the data is queried (if you mostly work with daily data, partition by date);
- how the data is loaded (parallel threads should each load their own partitions, without overlap).
2 GB is not too much even

Hive external table optimal partition size

纵饮孤独 submitted on 2020-12-26 03:20:34
Question: What is the optimal size for an external table partition? I am planning to partition the table by year/month/day, and we receive about 2 GB of data daily.
Answer 1: The optimal partitioning is the one that matches how the table is used. Choose it based on:
- how the data is queried (if you mostly work with daily data, partition by date);
- how the data is loaded (parallel threads should each load their own partitions, without overlap).
2 GB is not too much even

Function equivalent to LAG with PARTITION BY in ClickHouse

China☆狼群 submitted on 2020-12-15 06:21:08
Question: I need to know the order frequency of each user, i.e. the time difference between two consecutive orders of the same user. In SQL I used "Lag Partition by", but I don't know how to calculate this in ClickHouse. I need this result: first sort the data by user_id and created_at, then put the next order time of the same user_id in each row. I can't use the neighbor function because it can't partition by user_id.
Answer 1: I don't understand why neighbor cannot be used in your case; it should work well:
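The query that followed in the original answer is cut off in this listing. As a separate illustration of the result the question describes (sort by user_id and created_at, then attach the next order time of the same user to each row), here is a minimal plain-Python sketch. It is not ClickHouse code and not the answerer's method; the function name next_order_times and the sample rows are made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def next_order_times(orders):
    """orders: iterable of (user_id, created_at) pairs.

    Returns rows of (user_id, created_at, next_created_at, gap), sorted by
    user_id and created_at. The last order of each user gets None for both
    next_created_at and gap.
    """
    by_user = defaultdict(list)
    for user_id, created_at in orders:
        by_user[user_id].append(created_at)

    rows = []
    for user_id in sorted(by_user):
        times = sorted(by_user[user_id])
        # Pair each order with the following order of the same user only.
        for current, nxt in zip(times, times[1:] + [None]):
            gap = nxt - current if nxt is not None else None
            rows.append((user_id, current, nxt, gap))
    return rows


# Hypothetical sample data, for illustration only.
start = datetime(2020, 1, 1)
sample = [(1, start), (2, start + timedelta(days=4)), (1, start + timedelta(days=2))]
for row in next_order_times(sample):
    print(row)
```

Grouping by user before pairing rows is what keeps one user's last order from being matched with another user's first order, which is the partitioning behavior the question asks for.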