partitioning

Is it possible to create a Kafka topic with dynamic partition count?

十年热恋 submitted on 2019-12-03 08:44:34
Question: I am using Kafka to stream page-visit events from website users to an analytics service. Each event will contain the following details for the consumer: the user id and the IP address of the user. I need very high throughput, so I decided to partition the topic with the partition key userId-ipAddress, i.e. for user id 1000 and IP address 10.0.0.1 the event will have the partition key "1000-10.0.0.1". In this use case the partition key is dynamic, so specifying the number of partitions upfront while …
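A dynamic partition key does not require a dynamic partition count: the broker's partitioner reduces any key to one of a fixed set of partitions. The sketch below illustrates that idea in plain Python (Kafka's actual default partitioner uses a murmur2 hash of the key bytes; the hash, topic size, and key format here are assumptions for illustration):

```python
# Illustration: a dynamic key mapped onto a fixed partition count.
# Not Kafka client code - Kafka itself hashes the key with murmur2.
import hashlib

NUM_PARTITIONS = 12  # chosen upfront at topic creation (assumed value)

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # stable hash of the key, reduced modulo the fixed partition count
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# A key like "userId-ipAddress" always lands on the same partition,
# so per-key ordering is preserved while load spreads across partitions.
p1 = partition_for("1000-10.0.0.1")
p2 = partition_for("1000-10.0.0.1")
```

The partition count can therefore be sized for throughput independently of how many distinct keys exist.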

What is table partitioning?

人走茶凉 submitted on 2019-12-03 08:22:06
Question: In which cases should we use table partitioning?
Answer 1: Partitioning enables tables, indexes, and index-organized tables to be subdivided into smaller, manageable pieces; each such piece is called a "partition". For more info: Partitioning in Oracle. What? Why? When? Who? Where? How?
Answer 2: An example may help. We collected data on a daily basis from a set of 124 grocery stores. Each day's data was completely distinct from every other day's. We partitioned the data on the date. This …
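The grocery-store example hinges on partition pruning: a query for one day should only touch that day's data. A toy sketch of that idea (hypothetical Python illustration, not Oracle syntax):

```python
# Toy model of range partitioning by date: rows are bucketed by day,
# so a query for one day scans only that day's "partition".
from collections import defaultdict

store = defaultdict(list)  # partition key (date string) -> rows

def insert(row):
    store[row["day"]].append(row)

def query_day(day):
    # partition pruning: one bucket scanned, however many days exist
    return store.get(day, [])

insert({"day": "2019-12-01", "store_id": 7, "sales": 120.0})
insert({"day": "2019-12-02", "store_id": 7, "sales": 95.5})
```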

Optimizing a Partition Function

只谈情不闲聊 submitted on 2019-12-03 07:22:41
Here is the code, in Python:

```python
# function for pentagonal numbers
def pent(n):
    return n * (3 * n - 1) // 2

# function for generalized pentagonal numbers: 1, 2, 5, 7, 12, 15, ...
def gen_pent(n):
    m = (n + 1) // 2      # 1, 1, 2, 2, 3, 3, ...
    if n % 2 == 0:
        m = -m            # alternate sign: 1, -1, 2, -2, ...
    return pent(m)

# array for storing partition numbers p(0)..p(10), already computed
partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]

# Euler's pentagonal-number recurrence:
# p(k) = sum over i of (-1)^((i-1)//2) * p(k - gen_pent(i))
def partition(k):
    if k < len(partitions):
        return partitions[k]
    total, i = 0, 1
    while k - gen_pent(i) >= 0:
        sign = (-1) ** ((i - 1) // 2)   # signs go +, +, -, -, +, +, ...
        total += sign * partition(k - gen_pent(i))
        i += 1
    partitions.append(total)  # memoize; recursion fills smaller indices first
    return total
```

Which part of the CAP theorem does Cassandra sacrifice and why?

左心房为你撑大大i submitted on 2019-12-03 07:08:36
Question: There is a great talk here about simulating partition issues in Cassandra with Kyle Kingsbury's Jepsen library. My question is: with Cassandra, are you mainly concerned with the partition-tolerance part of the CAP theorem, or is consistency a factor you need to manage as well?
Answer 1: Cassandra is typically classified as an AP system, meaning that availability and partition tolerance are generally considered to be more important than consistency. However, real-world systems rarely fall neatly into these …
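Consistency is indeed something you manage in Cassandra, per request, via tunable consistency levels. The rule of thumb behind them can be sketched in a few lines (a simplified model, not Cassandra code; N, R, W are the replication factor and the read/write consistency levels):

```python
# Quorum overlap rule behind tunable consistency: with replication
# factor N, read level R, and write level W, a read is guaranteed to
# overlap the latest write when R + W > N.
def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    return r + w > n

# QUORUM reads and writes on RF=3: 2 + 2 > 3, reads overlap writes
strong = read_sees_latest_write(3, 2, 2)
# ONE/ONE on RF=3: 1 + 1 <= 3, a read may miss the latest write
eventual = read_sees_latest_write(3, 1, 1)
```

So the AP classification describes the defaults and the design bias, not a hard limit: raising R and W trades availability for consistency.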

How to Partition a Table by Month (“Both” YEAR & MONTH) and create monthly partitions automatically?

六眼飞鱼酱① submitted on 2019-12-03 06:18:50
I'm trying to partition a table by both year and month. The column through which I'll partition is a datetime-type column in ISO format ('20150110', '20150202', etc.). For example, I have sales data for 2010, 2011, and 2012. I'd like the data to be partitioned by year, and each year partitioned by month as well (2010/01, 2010/02, ..., 2010/12, 2011/01, ..., 2015/01, ...), e.g. Sales2010Jan, Sales2010Feb, Sales2011Jan, Sales2011Feb, Sales2012Dec, etc. My question is: is it even possible? If it is, how can I automate the process using SSIS?
Julien Vavasseur: SSIS is an ETL (extract, transform, load) …
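Whatever tool ends up running the automation, creating monthly partitions comes down to generating one boundary value per month. A small sketch (illustrative helper, not part of any answer above) that emits YYYYMM boundaries matching the ISO-style column format:

```python
# Generate consecutive year-month boundary strings (YYYYMM), one per
# partition, starting from a given month.
def month_boundaries(start_year: int, start_month: int, count: int):
    boundaries = []
    y, m = start_year, start_month
    for _ in range(count):
        boundaries.append(f"{y:04d}{m:02d}")
        m += 1
        if m > 12:          # roll over into the next year
            y, m = y + 1, 1
    return boundaries

bounds = month_boundaries(2010, 11, 4)
```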

What is the best way to partition large tables in SQL Server?

橙三吉。 submitted on 2019-12-03 05:45:12
In a recent project, the "lead" developer designed a database schema where "larger" tables would be split across two separate databases, with a view on the main database that unions the two separate databases' tables together. The main database is what the application was driven off of, so these tables looked and felt like ordinary tables (except for some quirky things around updating). This seemed like a HUGE performance problem. We do see performance problems around these tables, but nothing that makes him change his mind about his design. I'm just wondering what the best way to do this is, or if …

Is partitioning easier than sorting?

我的梦境 submitted on 2019-12-03 04:46:20
This is a question that's been lingering in my mind for some time ... Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items. One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent. But can it be done more efficiently than with sorting? Is the time complexity of this problem lower than that of sorting? If not, why not?
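One relevant observation: if the equivalence relation can be strengthened not just to an ordering but to a canonical hashable key, the partition can be built in expected O(n) without sorting. With only pairwise equal/not-equal tests, no such shortcut applies. A sketch of the keyed case (the mod-3 example is an assumption for illustration):

```python
# Partition items into equivalence classes in expected O(n), given a
# function canon(x) that returns the same hashable key for all items
# in one class. No comparison-based sorting is involved.
from collections import defaultdict

def partition_by_key(items, canon):
    groups = defaultdict(list)
    for x in items:
        groups[canon(x)].append(x)   # one hash lookup per item
    return list(groups.values())

# example: integers equivalent iff they have the same remainder mod 3
classes = partition_by_key([1, 4, 2, 7, 9, 3], lambda x: x % 3)
```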

Can MySQL create new partitions from the event scheduler

冷暖自知 submitted on 2019-12-03 04:32:12
Question: I have a table looking something like this:

```sql
CREATE TABLE `Calls` (
  `calendar_id` int(11) NOT NULL,
  `db_date` timestamp NOT NULL,
  `cgn` varchar(32) DEFAULT NULL,
  `cpn` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`calendar_id`),
  KEY `db_date_idx` (`db_date`)
)
PARTITION BY RANGE (calendar_id) (
  PARTITION p20091024 VALUES LESS THAN (20091024),
  PARTITION p20091025 VALUES LESS THAN (20091025)
);
```

Can I somehow use the MySQL event scheduler to automatically add a new partition (2 days in advance)? I'm …
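On the MySQL side this would be a scheduled EVENT issuing `ALTER TABLE ... ADD PARTITION`. As a sketch of the generation step only (a hypothetical helper, shown in Python rather than MySQL stored-program syntax), here is how the statement for a day two days ahead could be built, matching the pYYYYMMDD naming used above:

```python
# Build the ALTER TABLE statement that adds the partition for a date
# a fixed number of days in the future.
from datetime import date, timedelta

def add_partition_sql(table: str, today: date, days_ahead: int = 2) -> str:
    d = today + timedelta(days=days_ahead)
    ymd = d.strftime("%Y%m%d")
    return (f"ALTER TABLE `{table}` ADD PARTITION "
            f"(PARTITION p{ymd} VALUES LESS THAN ({ymd}))")

sql = add_partition_sql("Calls", date(2009, 10, 24))
```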

How to partition Azure tables used for storing logs

烂漫一生 submitted on 2019-12-03 03:12:19
We have recently updated our logging to use Azure Table Storage, which, owing to its low cost and high performance when querying by row and partition, is highly suited to this purpose. We are trying to follow the guidelines given in the document Designing a Scalable Partitioning Strategy for Azure Table Storage. As we are making a great number of inserts into this table (and hopefully an increasing number as we scale), we need to ensure that we don't hit the limits, resulting in logs being lost. We structured our design as follows: we have an Azure storage account per environment (DEV, TEST, PROD). …
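The excerpt cuts off before the partition-key scheme itself. One common pattern for log tables (an illustration of the trade-off, not necessarily this poster's design) is a compound PartitionKey of log source plus a coarse time bucket, so write load spreads across partitions while time-range queries within a source stay cheap:

```python
# Hypothetical PartitionKey scheme for a log table: source name plus an
# hour-granularity time bucket. All of one source's logs for an hour
# share a partition, capping the insert rate any single partition sees.
from datetime import datetime

def partition_key(source: str, ts: datetime) -> str:
    return f"{source}-{ts.strftime('%Y%m%d%H')}"

pk = partition_key("webserver01", datetime(2019, 12, 3, 3, 12, 19))
```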

Algorithm for finding nearby points?

匆匆过客 submitted on 2019-12-03 02:46:30
Question: Given a set of several million points with x,y coordinates, what is the algorithm of choice for quickly finding the 1000 points nearest to a location? "Quickly" here means about 100 ms on a home computer. Brute force would mean doing millions of multiplications and then sorting them. While even a simple Python app could do that in less than a minute, it is still too long for an interactive application. The bounding box for the points will be known, so partitioning the space into a simple …
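The truncated sentence is heading toward a uniform grid. A sketch of that space-partitioning approach: bucket points into cells, then search outward ring by ring from the query cell. The "one extra ring" stopping rule below is a simplification; a careful version would keep expanding until a ring's minimum possible distance exceeds the current k-th best.

```python
# Uniform-grid spatial partitioning for approximate-then-exact k-nearest
# search: candidates are gathered from expanding square rings of cells.
from collections import defaultdict

def build_grid(points, cell):
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell), int(y // cell))].append((x, y))
    return grid

def ring_cells(cx, cy, r):
    # cells at exactly Chebyshev distance r from (cx, cy)
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            if max(abs(dx), abs(dy)) == r:
                yield cx + dx, cy + dy

def k_nearest(grid, cell, loc, k):
    cx, cy = int(loc[0] // cell), int(loc[1] // cell)
    cand, r = [], 0
    while len(cand) < k:                 # expand until enough candidates
        for c in ring_cells(cx, cy, r):
            cand.extend(grid.get(c, []))
        r += 1
    for c in ring_cells(cx, cy, r):      # one safety ring (heuristic)
        cand.extend(grid.get(c, []))
    cand.sort(key=lambda p: (p[0] - loc[0]) ** 2 + (p[1] - loc[1]) ** 2)
    return cand[:k]

points = [(x, y) for x in range(10) for y in range(10)]
grid = build_grid(points, 1.0)
closest = k_nearest(grid, 1.0, (4.5, 4.5), 4)
```

With millions of points and a sensible cell size, only a handful of cells are ever touched per query, versus scanning every point under brute force.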