olap | 易学教程

How do you design an OLAP Database?

I need a mental process for designing an OLAP database. For a standard relational database it would be, loosely:
- Identify entities.
- Identify relationships.
- Identify the properties of entities.
- For each property:
  - ensure the property can be related to only one entity;
  - ensure the property is directly related to the entity.
For OLAP databases, I understand the terminology, the motivation and the structure; however, I have no idea how to decompose my relational model into an OLAP model.

Identify dimensions (or "by"s): these are anything you may want to analyse or group your report by. Every table in the source…
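As a hedged illustration of the "identify dimensions" step, here is a minimal Python sketch of the star shape such a decomposition usually aims for: dimension tables hold the descriptive attributes you group by, and a fact table holds dimension keys plus numeric measures. The table names (dim_product, dim_date, fact_sales), the helper total_by, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical dimension tables: surrogate key -> attributes you may "group by".
dim_product = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}
dim_date = {
    20240101: {"year": 2024, "month": 1},
    20240201: {"year": 2024, "month": 2},
}

@dataclass(frozen=True)
class FactSale:
    """One row of a hypothetical fact table: dimension keys plus additive measures."""
    product_key: int
    date_key: int
    quantity: int
    amount: float

fact_sales = [
    FactSale(1, 20240101, 2, 20.0),
    FactSale(2, 20240101, 1, 15.0),
    FactSale(3, 20240201, 4, 40.0),
    FactSale(1, 20240201, 1, 10.0),
]

def total_by(facts, dimension, key_field, attribute, measure):
    """Sum `measure` over fact rows, grouped by `attribute` of the joined dimension."""
    totals = defaultdict(float)
    for row in facts:
        member = dimension[getattr(row, key_field)]  # "join" fact -> dimension
        totals[member[attribute]] += getattr(row, measure)
    return dict(totals)

print(total_by(fact_sales, dim_product, "product_key", "category", "amount"))
# {'Hardware': 45.0, 'Books': 40.0}
print(total_by(fact_sales, dim_date, "date_key", "month", "quantity"))
# {1: 3.0, 2: 5.0}
```

Every new "by" is just another attribute on a dimension; the fact table does not change.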

What should I have in mind when building an OLAP solution from scratch?

I work for a company running a software product based on a Microsoft SQL Server database, and over the years I have developed 20-30 fairly advanced reports in PHP that take data directly from the database. This has been very successful, and people are happy with it. But it has some drawbacks:
- New changes can be quite development-intensive.
- Users can't experiment much with the data; it is locked into a hard-coded view.
- It can be slow for big reports.
I am considering gradually moving to an OLAP-based approach that can be queried from Excel or some web-based service. But I would like to do…

What is the best approach to get from relational OLTP database to OLAP cube?

I have a fairly standard normalised OLTP database, and I have realised that I need to run some complex queries (averages, standard deviations) across different dimensions in the data. So I have turned to SSAS and the creation of OLAP cubes. However, to create the cubes I believe my data source needs to be in a 'star' or 'snowflake' configuration (which I don't think it is right now). Is the normal procedure to use SSIS to run some sort of ETL process from my primary OLTP DB into another relational DB that is in the proper 'star' configuration with facts and dimensions, and then use this DB…
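To make the "transform into a star configuration" step concrete, here is a minimal Python sketch, assuming hypothetical normalised OLTP tables (regions, customers, orders). It flattens customers and regions into one customer dimension with surrogate keys, turns orders into a fact table, and then computes per-dimension averages and standard deviations with the standard `statistics` module. It only illustrates the reshaping; it is not SSIS or SSAS code.

```python
import statistics
from collections import defaultdict

# Hypothetical normalised OLTP tables.
regions = {10: "North", 20: "South"}
customers = [
    {"customer_id": "C1", "name": "Acme", "region_id": 10},
    {"customer_id": "C2", "name": "Globex", "region_id": 20},
    {"customer_id": "C3", "name": "Initech", "region_id": 10},
]
orders = [
    {"order_id": 1, "customer_id": "C1", "total": 100.0},
    {"order_id": 2, "customer_id": "C1", "total": 140.0},
    {"order_id": 3, "customer_id": "C2", "total": 80.0},
    {"order_id": 4, "customer_id": "C3", "total": 120.0},
]

def build_star(customers, regions, orders):
    """Denormalise customers+regions into a dimension and orders into a fact table."""
    dim_customer = {}  # surrogate key -> flattened attributes
    key_lookup = {}    # business key (customer_id) -> surrogate key
    for surrogate, c in enumerate(customers, start=1):
        dim_customer[surrogate] = {"name": c["name"], "region": regions[c["region_id"]]}
        key_lookup[c["customer_id"]] = surrogate
    fact_orders = [
        {"customer_key": key_lookup[o["customer_id"]], "total": o["total"]}
        for o in orders
    ]
    return dim_customer, fact_orders

def stats_by(dim, facts, attribute):
    """Mean and sample standard deviation of order totals per dimension attribute."""
    groups = defaultdict(list)
    for f in facts:
        groups[dim[f["customer_key"]][attribute]].append(f["total"])
    # statistics.stdev needs at least two values; singleton groups report 0.0
    # here, which is an arbitrary choice for the sketch.
    return {
        k: (statistics.mean(v), statistics.stdev(v) if len(v) > 1 else 0.0)
        for k, v in groups.items()
    }

dim_customer, fact_orders = build_star(customers, regions, orders)
print(stats_by(dim_customer, fact_orders, "region"))
# {'North': (120.0, 20.0), 'South': (80.0, 0.0)}
```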

Benefits of using a staging database while designing a data warehouse

I am in the process of designing a data warehouse architecture. While exploring options for extracting data from production and loading it into the data warehouse, I came across many articles that mainly suggested the following two approaches:
- Production DB -> Data Warehouse (star schema) -> OLAP cube
- Production DB -> Staging Database -> Data Warehouse (star schema) -> OLAP cube
I am still not sure which one is the better approach in terms of performance and reducing the processing load on the production database. Which approach do you find better when designing a data warehouse? The points below are…
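To make the production-load concern concrete, here is a hedged Python sketch of the second pipeline: production rows are copied once into a staging area, and the cleaning and reshaping read only from staging, so re-running a transform does not query production again. The `ProductionDB` class, its read counter, the in-memory "staging" list, and the function names are illustrative assumptions, not a real ETL tool, and the sketch does not settle which approach is better overall.

```python
class ProductionDB:
    """Hypothetical stand-in for the OLTP source; counts how often it is queried."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def fetch_all(self):
        self.reads += 1
        return [dict(r) for r in self._rows]  # copy, as a real extract would

def extract_to_staging(prod):
    """Single bulk read from production into a staging area (here: a list)."""
    return prod.fetch_all()

def transform(staging):
    """Cleaning/reshaping step that reads only staged data."""
    return [
        {"customer": r["customer"].strip().title(), "amount": round(r["amount"], 2)}
        for r in staging
        if r["amount"] is not None  # drop incomplete rows
    ]

prod = ProductionDB([
    {"customer": " acme ", "amount": 10.004},
    {"customer": "globex", "amount": None},
])
staging = extract_to_staging(prod)
first_try = transform(staging)
second_try = transform(staging)  # e.g. a re-run after fixing a transform bug
print(prod.reads, first_try)
# 1 [{'customer': 'Acme', 'amount': 10.0}]
```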

Can OLAP be done in BigTable?

In the past I built web analytics using OLAP cubes running on MySQL. An OLAP cube, the way I used it, is simply a large table (OK, it was stored a bit more cleverly than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. page name, user agent, IP) and a bunch of values (e.g. how many page views, how many visitors). The queries you run on a table like this are usually of the form (meta-SQL):

    SELECT SUM(hits), SUM(bytes) FROM MyCube WHERE date='20090914' AND pagename='Homepage' AND …
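The "cube as one big table" model above can be sketched in a few lines of Python: each row carries dimension columns and value columns, and the meta-SQL query becomes a filter on dimensions followed by sums of the values. The column names follow the question (date, pagename, useragent, hits, bytes); the sample rows and the `sum_measures` helper are hypothetical, and this says nothing about how BigTable itself would store or execute it.

```python
# Each row: dimension values plus additive measures, as in the question's MyCube.
my_cube = [
    {"date": "20090914", "pagename": "Homepage", "useragent": "Firefox", "hits": 120, "bytes": 5000},
    {"date": "20090914", "pagename": "Homepage", "useragent": "IE", "hits": 80, "bytes": 3500},
    {"date": "20090914", "pagename": "About", "useragent": "Firefox", "hits": 10, "bytes": 400},
    {"date": "20090915", "pagename": "Homepage", "useragent": "IE", "hits": 95, "bytes": 4100},
]

def sum_measures(rows, measures, **filters):
    """Equivalent of SELECT SUM(m1), SUM(m2) ... WHERE dim1 = v1 AND dim2 = v2."""
    totals = {m: 0 for m in measures}
    for row in rows:
        if all(row[dim] == value for dim, value in filters.items()):
            for m in measures:
                totals[m] += row[m]
    return totals

print(sum_measures(my_cube, ["hits", "bytes"], date="20090914", pagename="Homepage"))
# {'hits': 200, 'bytes': 8500}
```

Any store that can filter on dimension columns and sum value columns can answer this shape of query; the open question is how efficiently.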

SSAS Aggregation on Distinct ID

I wish to change the default aggregation from SUM to a SUM over distinct ID values. This is the current behaviour:

    ID   Amount
    1    $10
    1    $10
    2    $20
    3    $30
    3    $30
    Sum total = $90

By default, I am getting a sum of $90. I wish to sum over distinct IDs and get a value of $60. How would I modify the default aggregation behaviour to achieve this result?

Design your data as a many-to-many relationship: create one table/view having one record per ID and the amount column from the data shown in your question (the main fact table), and one table/view having one record per record of your data as shown in your…

MDX - TopCount plus 'Other' or 'The Rest' by group (over a set of members)

I have a requirement to display the top 5 customers by sales within each customer group, with the sales of the remaining customers in each group aggregated as 'Others'. This is similar to this question, but counted separately for each customer group. According to MSDN, to perform TopCount over a set of members you have to use the Generate function. This part works OK:

    with set [Top5CustomerByGroup] AS
    GENERATE (
        [Klient].[Grupa Klientow].[Grupa Klientow].ALLMEMBERS,
        TOPCOUNT (
            [Klient].[Grupa Klientow].CURRENTMEMBER * [Klient].[Klient].[Klient].MEMBERS,
            5,
            [Measures].[Przychody ze sprzedazy rzeczywiste wartosc]
        )
    …
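The result the MDX query is after (TopCount within each group via Generate, then one 'Others' row for the remainder of that group) can be sketched in Python to check expected numbers. This is not MDX; the group names, customers and sales figures are hypothetical, and N is 2 here to keep the sample small where the question uses 5.

```python
from collections import defaultdict

sales = [  # (customer group, customer, sales): hypothetical sample data
    ("Retail", "A", 500), ("Retail", "B", 300), ("Retail", "C", 200), ("Retail", "D", 50),
    ("Wholesale", "E", 900), ("Wholesale", "F", 100),
]

def top_n_plus_others(rows, n):
    """Per group: the n best customers by sales, then one 'Others' row for the rest."""
    by_group = defaultdict(dict)
    for group, customer, amount in rows:
        by_group[group][customer] = by_group[group].get(customer, 0) + amount
    result = []
    for group, customers in by_group.items():
        ranked = sorted(customers.items(), key=lambda kv: kv[1], reverse=True)
        result.extend((group, c, amt) for c, amt in ranked[:n])
        rest = sum(amt for _, amt in ranked[n:])
        if rest:  # omit the 'Others' row when the remainder sums to zero
            result.append((group, "Others", rest))
    return result

for row in top_n_plus_others(sales, 2):
    print(row)
# ('Retail', 'A', 500)
# ('Retail', 'B', 300)
# ('Retail', 'Others', 250)
# ('Wholesale', 'E', 900)
# ('Wholesale', 'F', 100)
```

In MDX terms, the 'Others' value for a group is the group total minus the sum over that group's TopCount set.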

Intersection in MDX

I recently ran into a problem in our SQL Server 2008 Analysis Services cube. Imagine you have a simple sales data warehouse with orders and products. Each order can contain several products, and each product can appear in several orders. So the data warehouse consists of at least three tables: one for the products, one for the orders, and a reference table modelling the n:n relationship between them. The question I want our cube to answer is: how many orders contain both product x and product y? In SQL this is easy: select orderid from dbo…
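The count the question asks for is a set intersection over the reference (bridge) table: collect the order IDs for each product and intersect them. A hedged Python sketch with hypothetical order and product IDs; it shows the logic only, not how to model it in the cube.

```python
from collections import defaultdict

# Hypothetical rows of the n:n reference table: (order_id, product_id)
order_product = [(1, "x"), (1, "y"), (2, "x"), (3, "y"), (4, "x"), (4, "y"), (4, "z")]

def orders_containing_all(bridge, products):
    """Order IDs whose line items include every product in `products` (must be non-empty)."""
    orders_by_product = defaultdict(set)
    for order_id, product_id in bridge:
        orders_by_product[product_id].add(order_id)
    # Intersect the per-product order sets: only orders present in all of them remain.
    return set.intersection(*(orders_by_product[p] for p in products))

both = orders_containing_all(order_product, ["x", "y"])
print(sorted(both), len(both))  # [1, 4] 2
```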

Data warehousing principles and NoSQL

With MongoDB, CouchDB and related technologies we can get faster querying, so is this still valid? "A copy of transaction data, specially restructured for queries and analyses." (R. Kimball, The Data Warehouse Toolkit, 1996). I mean, do we really need to restructure our data into an OLAP schema to query it for analysis purposes? More specifically, can drill-down, slice and dice, and other analytical reporting be achieved with NoSQL (NOT necessarily with OLAP modelling)? Also, could we overcome the "data subset" querying limitation of OLAP and report on the whole data universe with NoSQL? In…
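To pin down what drill-down and slice and dice mean operationally, independent of any storage model: drilling down is re-aggregating with a finer grouping key, and slicing is fixing one attribute before aggregating. The Python sketch below does both directly over JSON-like documents with hypothetical field names; it does not show how MongoDB or CouchDB would execute or index such queries, and it does not settle the performance question.

```python
from collections import Counter

# Hypothetical schemaless documents, as a document store might hold them.
docs = [
    {"year": 2011, "month": 1, "country": "PL", "sales": 5},
    {"year": 2011, "month": 1, "country": "DE", "sales": 3},
    {"year": 2011, "month": 2, "country": "PL", "sales": 4},
    {"year": 2012, "month": 1, "country": "PL", "sales": 7},
]

def rollup(documents, keys, measure="sales", where=None):
    """Sum `measure` grouped by the tuple of `keys`; `where` slices on fixed attributes."""
    where = where or {}
    totals = Counter()
    for d in documents:
        if all(d.get(k) == v for k, v in where.items()):
            totals[tuple(d.get(k) for k in keys)] += d.get(measure, 0)
    return dict(totals)

print(rollup(docs, ["year"]))                           # {(2011,): 12, (2012,): 7}
print(rollup(docs, ["year", "month"]))                  # drill down to months
print(rollup(docs, ["year"], where={"country": "PL"}))  # slice: {(2011,): 9, (2012,): 7}
```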