data-warehouse

Should OLAP databases be denormalized for read performance?

你。 submitted on 2019-11-27 09:57:58
I always thought that databases should be denormalized for read performance, as is done in OLAP database design, and not normalized much beyond 3NF in OLTP design. PerformanceDBA, in various posts (for example, in "Performance of different aproaches to time-based data"), defends the paradigm that a database should always be well designed by normalizing to 5NF and 6NF (Normal Form). Have I understood this correctly (and what exactly have I understood correctly)? What's wrong with the traditional denormalization approach/paradigm for the design of OLAP databases (below 3NF) and the advice that 3NF is enough for most

Data Warehouse vs. OLAP Cube?

时光总嘲笑我的痴心妄想 submitted on 2019-11-27 09:44:19
Question: Can anyone explain what the real distinction is between a Data Warehouse and OLAP Cubes? Are they different approaches to the same thing? Is one of them deprecated in comparison with the other? Are there any performance issues with either of them? Any explanation is welcome.

Answer 1: A data warehouse is a database with a design that makes analyzing data easier† (often with data from multiple sources). It is usually composed of fact tables and dimension tables, and often aggregate tables. OLAP is a set of
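To make the fact/dimension/aggregate vocabulary in the answer concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`dim_product`, `fact_sales`, `agg_sales_by_category`), the columns, and the sample rows are all hypothetical and are not taken from the answer.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes to filter and group by.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        category   TEXT NOT NULL
    );

    -- Fact table: one row per sale, keyed by the dimension, holding the measure.
    CREATE TABLE fact_sales (
        product_id INTEGER NOT NULL REFERENCES dim_product (product_id),
        sale_date  TEXT NOT NULL,
        amount     INTEGER NOT NULL
    );

    INSERT INTO dim_product VALUES (1, 'bike', 'vehicles'), (2, 'helmet', 'safety');
    INSERT INTO fact_sales VALUES
        (1, '2019-11-26', 100),
        (1, '2019-11-27', 150),
        (2, '2019-11-27', 20);

    -- Aggregate table: totals precomputed per category so reads skip the join.
    CREATE TABLE agg_sales_by_category AS
        SELECT d.category AS category, SUM(f.amount) AS total_amount
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category;
""")

rows = conn.execute(
    "SELECT category, total_amount FROM agg_sales_by_category ORDER BY category"
).fetchall()
print(rows)
```

The aggregate table here is only a snapshot; a real warehouse would need some refresh step whenever new facts are loaded.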

Efficiently storing 7.300.000.000 rows

倾然丶 夕夏残阳落幕 submitted on 2019-11-27 09:37:28
Question: How would you tackle the following storage and retrieval problem? Roughly 2.000.000 rows will be added each day (365 days/year), with the following information per row:

id (unique row identifier)
entity_id (takes on values between 1 and 2.000.000 inclusive)
date_id (incremented by one each day; will take on values between 1 and 3.650 (ten years: 1*365*10))
value_1 (takes on values between 1 and 1.000.000 inclusive)
value_2 (takes on values between 1 and 1.000.000 inclusive)

entity_id
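The row count in the title follows directly from the figures in the question. The sketch below works through that arithmetic and adds a rough raw-size estimate. The per-row byte layout is an assumption made only for illustration (the question does not specify column types), and it ignores indexes and storage-engine overhead.

```python
# Figures taken from the question.
ROWS_PER_DAY = 2_000_000
DAYS = 365 * 10  # ten years, i.e. 3,650 date_id values

total_rows = ROWS_PER_DAY * DAYS

# Assumed layout (hypothetical): the id must exceed the 32-bit range because
# total_rows > 2**32, so 8 bytes; the other four columns fit in 4-byte integers.
ASSUMED_BYTES_PER_ROW = 8 + 4 * 4

raw_bytes = total_rows * ASSUMED_BYTES_PER_ROW
raw_gib = raw_bytes / 2**30

print(f"{total_rows:,} rows")
print(f"about {raw_gib:.0f} GiB of raw column data under the assumed layout")
```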

How to extract data from Google Analytics and build a data warehouse (webhouse) from it?

与世无争的帅哥 submitted on 2019-11-27 05:20:01
Question: I have clickstream data such as referring URL, top landing pages, top exit pages, and metrics such as page views, number of visits, and bounces, all in Google Analytics. There is no database yet where all this information might be stored. I am required to build a data warehouse from scratch (which I believe is known as a web-house) from this data. So I need to extract data from Google Analytics and load it into a warehouse on a daily, automated basis. My questions are: 1) Is it possible? Every day data

Calendar table for Data Warehouse

你说的曾经没有我的故事 submitted on 2019-11-26 19:06:35
For my data warehouse, I am creating a calendar table as follows:

    SET NOCOUNT ON
    DROP Table dbo.Calendar
    GO
    Create Table dbo.Calendar
    (
        CalendarId Integer NOT NULL,
        DateValue Date NOT NULL,
        DayNumberOfWeek Integer NOT NULL,
        NameOfDay VarChar (10) NOT NULL,
        NameOfMonth VarChar (10) NOT NULL,
        WeekOfYear Integer NOT NULL,
        JulianDay Integer NOT NULL,
        USAIsBankHoliday Bit NOT NULL,
        USADayName VarChar (100) NULL,
    )
    ALTER TABLE dbo.Calendar ADD CONSTRAINT DF_Calendar_USAIsBankHoliday DEFAULT 0 FOR USAIsBankHoliday
    GO
    ALTER TABLE dbo.Calendar ADD CONSTRAINT DF_Calendar_USADayName DEFAULT '' FOR
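As an illustration of what rows for such a table might contain, the following Python sketch generates one record per day using the column names from the excerpt. Several meanings are assumptions, since the excerpt does not define them: `CalendarId` as a `yyyymmdd` integer, `DayNumberOfWeek` with Monday as 1, `WeekOfYear` as the ISO 8601 week, and `JulianDay` as the day of the year. The `calendar_rows` helper is hypothetical.

```python
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def calendar_rows(start, end):
    """Yield one dict per day from start to end inclusive."""
    day = start
    while day <= end:
        yield {
            "CalendarId": day.year * 10000 + day.month * 100 + day.day,  # assumed yyyymmdd
            "DateValue": day,
            "DayNumberOfWeek": day.isoweekday(),       # assumed: 1 = Monday
            "NameOfDay": DAY_NAMES[day.weekday()],
            "NameOfMonth": MONTH_NAMES[day.month - 1],
            "WeekOfYear": day.isocalendar()[1],        # assumed: ISO 8601 week
            "JulianDay": day.timetuple().tm_yday,      # assumed: day of year
            "USAIsBankHoliday": False,                 # mirrors the DEFAULT 0 constraint
            "USADayName": "",                          # assumed default of ''
        }
        day += timedelta(days=1)


sample = list(calendar_rows(date(2019, 11, 26), date(2019, 11, 27)))
for row in sample:
    print(row)
```

Holiday flags are left at their defaults; filling them in would need a separate holiday source that the excerpt does not describe.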

What is Multi Dimension OLAP CUBE and give example cube with more than 3 dimensions

一个人想着一个人 submitted on 2019-11-26 18:08:10
Question: As I am new to SSAS, I have been reading an article on multidimensional OLAP cubes and am struggling to understand cube concepts. It says that, although the term "cube" suggests three dimensions, a cube can have up to 64 dimensions. Could you please explain how this is possible for a cube (other than the 3-D example of x, y, z planes)? Please don't give only links to study; I am also expecting some explanation.

Answer 1: Don't think of a cube as a three-dimensional structure (despite the name). A "dimension"
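One way to see how a "cube" can have more than three dimensions is to treat each dimension as an attribute to group by, and the cube as the set of totals for every combination of those attributes. The sketch below does this for four dimensions in plain Python. The dimension names, the sample facts, and the `build_cube` helper are hypothetical, and this is not how SSAS stores cubes internally.

```python
from collections import defaultdict
from itertools import combinations

# Four hypothetical dimensions, one more than a geometric cube has.
DIMENSIONS = ("product", "region", "year", "channel")

FACTS = [
    {"product": "bike", "region": "EU", "year": 2018, "channel": "web", "sales": 10},
    {"product": "bike", "region": "US", "year": 2018, "channel": "store", "sales": 7},
    {"product": "helmet", "region": "EU", "year": 2019, "channel": "web", "sales": 3},
    {"product": "bike", "region": "EU", "year": 2019, "channel": "web", "sales": 5},
]


def build_cube(facts, dimensions, measure):
    """Sum the measure for every subset of dimensions (every possible grouping)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in facts:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(FACTS, DIMENSIONS, "sales")

print(len(cube))                                    # number of groupings: 2**4
print(cube[()])                                     # grand total
print(cube[("product",)])                           # totals per product
print(cube[("region", "channel")][("EU", "web")])   # one cell of a 2-D slice
```

Adding a fifth or a sixty-fourth dimension only adds another attribute to each fact and more groupings; nothing about the structure is tied to three axes.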

Should OLAP databases be denormalized for read performance?

谁说我不能喝 submitted on 2019-11-26 17:53:17
Question: I always thought that databases should be denormalized for read performance, as is done in OLAP database design, and not normalized much beyond 3NF in OLTP design. PerformanceDBA, in various posts (for example, in "Performance of different aproaches to time-based data"), defends the paradigm that a database should always be well designed by normalizing to 5NF and 6NF (Normal Form). Have I understood this correctly (and what exactly have I understood correctly)? What's wrong with the traditional