data-warehouse

What is a data warehouse?

故事扮演 submitted on 2019-11-29 00:47:12
Question: I was asked by a customer what the term "data warehouse" really means. I thought about ETL, details of the data model, differences from NoSQL, clouds, 'normal' DBMSs, MDM (Master Data Management), etc., but wasn't able to describe the term to him in a few words... (In fact, I did some talking and left him unenlightened.) How can "data warehouse" be described in 1-3 (or a few more) sentences? Answer 1: For non-technical people, the best way is to describe it as "Huge amount of data stored in a specialized

Database design: one huge table or separate tables?

。_饼干妹妹 submitted on 2019-11-28 22:54:14
Currently I am designing a database for use in our company. We are using SQL Server 2008. The database will hold data gathered from several customers, and its goal is to produce aggregate benchmark numbers across customers. Recently, I have become worried about the fact that one table in particular will get very big. Each customer has approximately 20,000,000 rows of data, and there will soon be 30 customers in the database (if not more). A lot of queries will be run against this table. I am already noticing performance issues and users being temporarily locked out. My
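For the single-table option, a cross-customer benchmark is one grouped query. With one table per customer, the same number needs a `UNION ALL` across 30 or more tables. The sketch below illustrates only that query-shape difference. It assumes SQLite as a stand-in for SQL Server 2008, and the table and column names (`measurements`, `customer_id`, `metric`) are hypothetical. It does not address the locking or performance problems the question mentions.

```python
import sqlite3

# Minimal sketch: SQLite stands in for SQL Server 2008.
# Table/column names (measurements, customer_id, metric) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE measurements (
           customer_id INTEGER NOT NULL,
           metric      REAL    NOT NULL
       )"""
)
# Index leading with customer_id so per-customer lookups hit one key range.
conn.execute("CREATE INDEX ix_measurements_customer ON measurements (customer_id)")

rows = [(1, 10.0), (1, 20.0), (2, 30.0), (3, 40.0), (3, 60.0)]
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Single-table design: one query gives each customer's average...
per_customer = conn.execute(
    "SELECT customer_id, AVG(metric) FROM measurements "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

# ...and the benchmark averages those per-customer figures.
benchmark = conn.execute(
    """SELECT AVG(avg_metric) FROM (
           SELECT AVG(metric) AS avg_metric FROM measurements GROUP BY customer_id
       )"""
).fetchone()[0]

print(per_customer)  # [(1, 15.0), (2, 30.0), (3, 50.0)]
print(benchmark)
```

Averaging the per-customer averages gives each customer equal weight regardless of row count. That is one possible definition of a benchmark, assumed here for illustration.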

Star schema, normalized dimensions, denormalized hierarchy level keys

五迷三道 submitted on 2019-11-28 21:32:59
Given the following star schema tables.

fact, two dimensions, two measures:

       geog_abb  time_date amount     value
    1:       AL 2013-03-26  55.57 9113.3898
    2:       CO 2011-06-28  19.25 9846.6468
    3:       MI 2012-05-15  94.87 4762.5398
    4:       SC 2013-01-22  29.84  649.7681
    5:       ND 2014-12-03  37.05 6419.0224

geography dimension, single hierarchy, 3 levels in hierarchy:

       geog_abb  geog_name geog_division_name geog_region_name
    1:       AK     Alaska            Pacific             West
    2:       AL    Alabama East South Central            South
    3:       AR   Arkansas West South Central            South
    4:       AZ    Arizona           Mountain             West
    5:       CA California            Pacific             West

time dimension, two hierarchies, 4
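The sketch below rolls the fact table up a level of the geography hierarchy. It resolves `geog_abb` through the dimension to a division or region and sums `amount` there. It uses only the sample rows shown in the question. Only AL appears in the excerpted dimension, so the other states are reported as unmatched rather than silently dropped. The function name `roll_up` and the tuple layout are illustrative choices, not part of the question.

```python
from collections import defaultdict

# Fact rows from the question: (geog_abb, time_date, amount, value).
fact = [
    ("AL", "2013-03-26", 55.57, 9113.3898),
    ("CO", "2011-06-28", 19.25, 9846.6468),
    ("MI", "2012-05-15", 94.87, 4762.5398),
    ("SC", "2013-01-22", 29.84, 649.7681),
    ("ND", "2014-12-03", 37.05, 6419.0224),
]

# Geography dimension rows shown in the question, keyed by geog_abb.
# Tuple layout: (geog_name, geog_division_name, geog_region_name).
geography = {
    "AK": ("Alaska", "Pacific", "West"),
    "AL": ("Alabama", "East South Central", "South"),
    "AR": ("Arkansas", "West South Central", "South"),
    "AZ": ("Arizona", "Mountain", "West"),
    "CA": ("California", "Pacific", "West"),
}

def roll_up(level: int):
    """Sum `amount` by a hierarchy level (1 = division, 2 = region).

    Fact keys missing from the dimension are collected separately instead of
    being dropped, which is what a plain inner join would do.
    """
    totals = defaultdict(float)
    unmatched = []
    for geog_abb, _date, amount, _value in fact:
        member = geography.get(geog_abb)
        if member is None:
            unmatched.append(geog_abb)
            continue
        totals[member[level]] += amount
    return dict(totals), unmatched

print(roll_up(2))  # ({'South': 55.57}, ['CO', 'MI', 'SC', 'ND'])
```

Here the region is derived through the dimension at query time. Storing a region key directly on the fact rows would skip that lookup, at the cost of repeating the hierarchy in the fact table.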

Efficiently storing 7,300,000,000 rows

自作多情 submitted on 2019-11-28 16:14:22
How would you tackle the following storage and retrieval problem? Roughly 2,000,000 rows will be added each day (365 days/year) with the following information per row:

- id (unique row identifier)
- entity_id (takes on values between 1 and 2,000,000 inclusive)
- date_id (incremented by one each day; will take on values between 1 and 3,650 (ten years: 1*365*10))
- value_1 (takes on values between 1 and 1,000,000 inclusive)
- value_2 (takes on values between 1 and 1,000,000 inclusive)

entity_id combined with date_id is unique. Hence, at most one row per entity and date can be added to the table. The
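The 7,300,000,000 figure in the title is 2,000,000 rows/day × 3,650 days. A back-of-envelope estimate of the raw payload helps frame the problem. The sketch below picks the narrowest common signed integer width for each column from the stated value ranges. It assumes fixed-width integers and ignores row headers, page overhead, indexes and compression, so real on-disk size will be larger.

```python
# Back-of-envelope raw size for the table in the question.
# Assumptions: fixed-width signed integers only; row headers, page overhead,
# indexes and compression are all ignored.

ROWS_PER_DAY = 2_000_000
DAYS = 3_650  # ten years at 365 days/year

def min_signed_bytes(max_value: int) -> int:
    """Smallest common signed integer width (2, 4 or 8 bytes) that holds max_value."""
    for width in (2, 4, 8):
        if max_value <= 2 ** (8 * width - 1) - 1:
            return width
    raise ValueError("value does not fit in a 64-bit signed integer")

total_rows = ROWS_PER_DAY * DAYS

# Largest value each column can take, as stated in the question.
max_values = {
    "entity_id": 2_000_000,
    "date_id": DAYS,
    "value_1": 1_000_000,
    "value_2": 1_000_000,
}
widths = {name: min_signed_bytes(v) for name, v in max_values.items()}
row_bytes = sum(widths.values())

# A surrogate id has to count up to total_rows, which overflows 32 bits.
id_bytes = min_signed_bytes(total_rows)

print(total_rows)                       # 7300000000
print(widths)  # {'entity_id': 4, 'date_id': 2, 'value_1': 4, 'value_2': 4}
print(row_bytes, row_bytes + id_bytes)  # 14 22
print(f"{total_rows * row_bytes / 1e9:.1f} GB without id")             # 102.2 GB without id
print(f"{total_rows * (row_bytes + id_bytes) / 1e9:.1f} GB with id")   # 160.6 GB with id
```

Because (entity_id, date_id) is already unique, that pair could serve as the key and save the 8-byte id per row. Whether that is worthwhile depends on the storage engine.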

Warehouse: Store (and count) non-fact records?

时光怂恿深爱的人放手 submitted on 2019-11-28 10:37:44
Question: How do you store records that don't contain any fact? For example, let's say that a shop wants to count how many people have entered a store (and that it records information on every person who goes inside). In the warehouse, I guess there would be a dimension table "Person" with different attributes, but what would the fact table look like? Would it contain only foreign keys? Answer 1: As you described it, that would be just a fact table. Actually, there is a name for this: factless fact table; fact
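A minimal sketch of a factless fact table. Each row holds only foreign keys, and the "measure" is the row count itself. The `Person` dimension comes from the question. The `store_key` and `date_key` columns and the `StoreVisit` name are hypothetical additions for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# A factless fact row: foreign keys only, no numeric measure columns.
# store_key / date_key and the class name are hypothetical.
@dataclass(frozen=True)
class StoreVisit:
    person_key: int  # -> Person dimension (from the question)
    store_key: int   # -> Store dimension (hypothetical)
    date_key: int    # -> Date dimension (hypothetical), e.g. 20191128

visits = [
    StoreVisit(person_key=1, store_key=10, date_key=20191128),
    StoreVisit(person_key=2, store_key=10, date_key=20191128),
    StoreVisit(person_key=1, store_key=20, date_key=20191128),
    StoreVisit(person_key=3, store_key=10, date_key=20191129),
]

# The measure is simply COUNT(*) grouped by dimension keys.
visits_per_store_day = Counter((v.store_key, v.date_key) for v in visits)

# Distinct visitors need a distinct count over person_key instead.
distinct_visitors_store_10 = len({v.person_key for v in visits if v.store_key == 10})

print(visits_per_store_day[(10, 20191128)])  # 2
print(distinct_visitors_store_10)            # 3
```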

Schema evolution in parquet format

倖福魔咒の submitted on 2019-11-28 06:49:57
Currently we are using the Avro data format in production. Among Avro's several good points, we know that it handles schema evolution well. Now we are evaluating the Parquet format because of its efficiency when reading arbitrary columns. Before moving forward, our remaining concern is schema evolution. Does anyone know whether schema evolution is possible in Parquet? If yes, how; if not, why not? Some resources claim that it is possible but that columns can only be added at the end. What does this mean? Schema evolution can be (very) expensive. In order to figure out the schema, you basically have to
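To illustrate what "adding a column" means when files written at different times carry different schemas, here is a conceptual sketch in plain Python. It is not the Parquet or Spark API. It unions per-file schemas by column name, accepts only additions (a type change raises an error), and reads an old record under the merged schema with the new column as `None`. The schema dictionaries are hypothetical. For reference, Spark exposes a real `mergeSchema` option for Parquet reads that performs a merge of this general kind.

```python
from typing import Dict, List, Optional

Schema = Dict[str, str]  # column name -> type name (simplified)

def merge_schemas(schemas: List[Schema]) -> Schema:
    """Union the columns of several file schemas, matching columns by name.

    Raises ValueError if a column appears with two different types, since that
    is a change rather than a simple addition.
    """
    merged: Schema = {}
    for schema in schemas:
        for name, type_name in schema.items():
            existing = merged.get(name)
            if existing is not None and existing != type_name:
                raise ValueError(f"conflicting types for {name!r}: {existing} vs {type_name}")
            merged[name] = type_name
    return merged

def project(record: Dict[str, object], schema: Schema) -> Dict[str, Optional[object]]:
    """Read an old record under the merged schema; missing columns become None."""
    return {name: record.get(name) for name in schema}

# Hypothetical schemas of an older and a newer file.
v1: Schema = {"id": "int64", "name": "string"}
v2: Schema = {"id": "int64", "name": "string", "email": "string"}

merged = merge_schemas([v1, v2])
old_row = {"id": 1, "name": "a"}
print(list(merged))              # ['id', 'name', 'email']
print(project(old_row, merged))  # {'id': 1, 'name': 'a', 'email': None}
```

Note that producing `merged` requires looking at the schema of every file involved, which is where the cost grows with the number of files.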

How to extract data from Google Analytics and build a data warehouse (webhouse) from it?

霸气de小男生 submitted on 2019-11-28 04:24:42
I have clickstream data such as referring URLs, top landing pages and top exit pages, and metrics such as page views, number of visits and bounces, all in Google Analytics. There is no database yet where all this information could be stored. I am required to build a data warehouse from scratch (which I believe is known as a webhouse) from this data, so I need to extract data from Google Analytics and load it into a warehouse automatically every day. My questions are: 1) Is it possible? Every day the data increases (some in terms of metrics or measures such as visits and some in terms of new referring
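One property a daily automated load usually needs is idempotency: re-running a day must not duplicate rows. The sketch below keys the target table on (day, page_path) and uses SQLite's `INSERT OR REPLACE`, so a repeated load overwrites rather than appends. The Google Analytics extraction itself is not shown. `fetch_daily_metrics` is a hypothetical stub returning fixed sample rows. The table and column names are also assumptions.

```python
import sqlite3
from datetime import date, timedelta

def fetch_daily_metrics(day: date):
    """Hypothetical stand-in for the Google Analytics extraction of one day.

    Returns (page_path, pageviews, visits, bounces) tuples; the real API call
    is deliberately not shown.
    """
    return [("/home", 120, 80, 30), ("/pricing", 40, 25, 10)]

def load_day(conn: sqlite3.Connection, day: date) -> None:
    """Load one day's metrics; re-running the same day replaces rows, never duplicates."""
    rows = [(day.isoformat(), *r) for r in fetch_daily_metrics(day)]
    with conn:  # one transaction per day, committed on success
        conn.executemany(
            "INSERT OR REPLACE INTO daily_page_metrics VALUES (?, ?, ?, ?, ?)", rows
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE daily_page_metrics (
           day       TEXT NOT NULL,
           page_path TEXT NOT NULL,
           pageviews INTEGER, visits INTEGER, bounces INTEGER,
           PRIMARY KEY (day, page_path)
       )"""
)

yesterday = date.today() - timedelta(days=1)  # a daily job loads the last full day
load_day(conn, yesterday)
load_day(conn, yesterday)  # accidental re-run: still one row per (day, page)
row_count = conn.execute("SELECT COUNT(*) FROM daily_page_metrics").fetchone()[0]
print(row_count)  # 2
```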

Is it good practice to have foreign keys in a datawarehouse (relationships)?

…衆ロ難τιáo~ submitted on 2019-11-27 17:48:54
Question: I think the question is clear enough. Some of the columns in my data warehouse table could have a relationship to a primary key. But is it good practice? It is denormalized, so it should never be deleted again (data in a data warehouse). I hope the question is clear enough. Answer 1: I have no idea. But nobody is answering, so I googled and found a best-practices paper that seems to say the very helpful "it depends" :-) While foreign key constraints help data integrity, they have an associated cost
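The trade-off in the answer can be seen in a small example. With a declared foreign key, the database checks every inserted fact row against the dimension: that is the integrity benefit, and that lookup is the load-time cost. The sketch uses SQLite, where enforcement must be switched on per connection. The `dim_product` and `fact_sales` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE fact_sales (
           product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
           amount      REAL    NOT NULL
       )"""
)
conn.execute("INSERT INTO dim_product VALUES (1, 'widget')")
conn.execute("INSERT INTO fact_sales VALUES (1, 9.99)")  # key exists: accepted

try:
    conn.execute("INSERT INTO fact_sales VALUES (42, 5.00)")  # no product 42
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the constraint caught an orphan fact row at load time

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(rejected, fact_rows)  # True 1
```

When the constraint is left out, the same orphan check has to happen in the ETL process instead, typically by looking up each incoming key in the dimension before loading.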

Star-Schema Design [closed]

假如想象 submitted on 2019-11-27 16:38:07
Is a star schema design essential to a data warehouse? Or can you do data warehousing with another design pattern? Answer (ConcernedOfTunbridgeWells): Using star schemas for a data warehouse system gets you several benefits, and in most cases it is appropriate to use them for the top layer. You may also have an operational data store (ODS), a normalised structure that holds 'current state' and facilitates operations such as data conformation. However, there are reasonable situations where this is not desirable. I've had occasion to build systems with and without ODS layers, and had specific reasons for

What is Multi Dimension OLAP CUBE and give example cube with more than 3 dimensions

余生颓废 submitted on 2019-11-27 11:57:34
As I am new to SSAS, I have been reading an article on multidimensional OLAP cubes and am struggling to understand cube concepts. It says: "Although the term 'cube' suggests three dimensions, a cube can have up to 64 dimensions." Could you please explain how this is possible in a cube (beyond the 3-D example of x, y, z planes)? Please don't give only links to study; I am also expecting some explanation. Answer: Don't think of a cube as a three-dimensional structure (despite the name). A "dimension" in a data warehouse situation is simply a varying value that you can use to access data in your warehouse.
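Following the answer's point that a dimension is just a value used to address data, a cube can be modelled as a mapping from one member of *each* dimension to a measure. Nothing restricts that key to three coordinates. The sketch uses four hypothetical dimensions (year, region, product, channel) with made-up sales figures, and shows slicing and rolling up. It is a conceptual illustration, not how SSAS stores cubes. The 64-dimension figure quoted above comes from the article and is not modelled here.

```python
from collections import defaultdict

# Four hypothetical dimensions; a cell is addressed by one member of each.
DIMENSIONS = ("year", "region", "product", "channel")

cube = {
    # (year, region, product, channel) -> sales (illustrative numbers)
    (2019, "West", "Bikes", "Online"): 100,
    (2019, "West", "Bikes", "Store"): 50,
    (2019, "East", "Helmets", "Online"): 30,
    (2020, "West", "Bikes", "Online"): 120,
    (2020, "East", "Bikes", "Store"): 70,
}

def slice_cube(cube, **fixed):
    """Keep only cells whose members match every fixed dimension value."""
    idx = {d: DIMENSIONS.index(d) for d in fixed}
    return {
        key: value
        for key, value in cube.items()
        if all(key[idx[d]] == member for d, member in fixed.items())
    }

def roll_up(cube, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[p] for p in positions)] += value
    return dict(totals)

print(roll_up(cube, ["year"]))                                # {(2019,): 180, (2020,): 190}
print(roll_up(slice_cube(cube, region="West"), ["channel"]))  # {('Online',): 220, ('Store',): 50}
```

Adding a fifth dimension would only lengthen the key tuple and `DIMENSIONS`; the slice and roll-up logic stays the same.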