data-warehouse

What is a data warehouse?

廉价感情. Posted on 2019-11-30 03:08:14
A customer asked me what the term "data warehouse" really means. I thought about ETL, details of the data model, and the differences from NoSQL, clouds, 'normal' DBMSs, MDM (Master Data Management), etc., but I wasn't able to describe the term to him in a few words. (In fact, I did some talking and left him unilluminated.) How can "data warehouse" be described in 1-3 (or a few more) sentences? For non-technical people, the best way is to describe it as "a huge amount of data stored in a specialized computer system. The data is usually related to some specific domain, and the whole system is designed to be fast and

Data Warehouse vs. OLAP Cube?

别来无恙 Posted on 2019-11-29 20:25:14
Can anyone explain what the real distinction is between a data warehouse and OLAP cubes? Are they different approaches to the same thing? Is one of them deprecated in favor of the other? Are there performance issues with either of them? Any explanation is welcome. A data warehouse is a database with a design that makes analyzing data easier (often with data from multiple sources). It is usually composed of fact tables and dimension tables, and often aggregate tables. OLAP is a set of operations that one can perform on a data set, such as pivoting, slicing, dicing, and drilling. For example, one can do OLAP
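The distinction drawn in the answer above (a warehouse is a table design, OLAP is a set of operations on data) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module; the table names `dim_product` and `fact_sales`, the columns, and the sample rows are all hypothetical and not taken from the original question.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one dimension table, one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            year       INTEGER NOT NULL,
            amount     REAL NOT NULL          -- the measure
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO fact_sales VALUES
            (1, 2018, 10.0), (1, 2019, 20.0),
            (2, 2018, 5.0),  (2, 2019, 15.0);
        """
    )
    return conn


def roll_up_by_category(conn: sqlite3.Connection) -> dict:
    """Roll-up: aggregate the measure over all years, per category."""
    rows = conn.execute(
        """
        SELECT d.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        GROUP BY d.category
        """
    ).fetchall()
    return dict(rows)


def slice_by_year(conn: sqlite3.Connection, year: int) -> dict:
    """Slice: fix one dimension value (the year), then aggregate per category."""
    rows = conn.execute(
        """
        SELECT d.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_id = f.product_id
        WHERE f.year = ?
        GROUP BY d.category
        """,
        (year,),
    ).fetchall()
    return dict(rows)
```

Here the schema plays the role of the warehouse, while `roll_up_by_category` and `slice_by_year` are two OLAP-style operations run against it. A dedicated OLAP cube would typically precompute such aggregates; this sketch computes them on demand.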

Database choice for large data volume?

你离开我真会死。 Posted on 2019-11-29 19:43:51
I'm about to start a new project that will have a rather large database. The number of tables will not be large (<15), and the majority of the data (99%) will be contained in one big table, which is almost insert/read only (no updates). That one table is estimated to grow by 500,000 records a day, and we should keep at least 1 year of them to be able to run various reports. There needs to be a (read-only) replicated database as a backup/failover, and maybe for offloading reports at peak times. I don't have first-hand experience with databases that large, so I'm asking the ones

Data warehouse for user data - design Q

喜欢而已 Posted on 2019-11-29 15:38:32
Question: How can I best store user data against a date/time dimension? My use case is storing user actions per day and per hour, such as the number of shares, likes, friends, etc. I have a time table and a date table. For time it is easy: each row is a user_id and the columns are 1 to 24, one for each hour of the day. The problem is dates. If I give each day its own column, I will have 365 columns a year. I cannot archive the data away either, because analytics needs past data too. What are the other strategies?

Warehouse: Store (and count) non-fact records?

冷暖自知 Posted on 2019-11-29 15:37:51
How do I store records that don't contain any fact? For example, let's say that a shop wants to count how many people have entered the store (and that it collects info on every person who goes inside). In the warehouse, I guess there would be a dimension table "Person" with different attributes, but what would the fact table look like? Would it contain only foreign keys? As you described it, that would be just a fact table. Actually, there is a name for this: a factless fact table, a fact table without any measures. It is quite common for recording events. Essentially anything that records: who,
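To make the factless fact table from the answer above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`dim_person`, `dim_date`, `fact_store_visit`), their columns, and the sample rows are hypothetical; the original post only mentions a "Person" dimension.

```python
import sqlite3


def build_visit_warehouse() -> sqlite3.Connection:
    """Create two dimensions and a fact table that holds only foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_person (
            person_id INTEGER PRIMARY KEY,
            age_group TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            day     TEXT NOT NULL
        );
        -- Factless fact table: no measure columns, one row per visit event.
        CREATE TABLE fact_store_visit (
            person_id INTEGER NOT NULL REFERENCES dim_person(person_id),
            date_id   INTEGER NOT NULL REFERENCES dim_date(date_id)
        );
        INSERT INTO dim_person VALUES (1, 'adult'), (2, 'adult'), (3, 'senior');
        INSERT INTO dim_date VALUES (20191129, '2019-11-29'), (20191130, '2019-11-30');
        INSERT INTO fact_store_visit VALUES
            (1, 20191129), (2, 20191129), (3, 20191129), (1, 20191130);
        """
    )
    return conn


def visits_per_day(conn: sqlite3.Connection) -> dict:
    """Count events per day; the row count itself acts as the measure."""
    rows = conn.execute(
        """
        SELECT d.day, COUNT(*)
        FROM fact_store_visit AS v
        JOIN dim_date AS d ON d.date_id = v.date_id
        GROUP BY d.day
        """
    ).fetchall()
    return dict(rows)
```

Because each row stands for one event, counting visitors is a plain `COUNT(*)` grouped by whichever dimension attribute is of interest.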

What is a staging table?

喜夏-厌秋 Posted on 2019-11-29 11:49:13
Question: Are staging tables used only in data warehouse projects, or in any SSIS project? I would like to know what a staging table is. Can anyone give me some examples of how to use one and in what circumstances it is implemented? Also, what are the best practices for using it? Answer 1: Staging tables are just database tables containing your business data in some form or other. Staging is the process of preparing your business data, usually taken from some business application. For your average

What is the actual difference between Data Warehouse & Big Data?

a 夏天 Posted on 2019-11-29 07:06:48
Question: I know what a data warehouse is and what big data is, but I am confused about data warehouse vs. big data. Are they the same thing under different names, or are they different (conceptually and physically)? Answer 1: I know that this is an older thread, but there have been some developments in the last year or so. Comparing the data warehouse to Hadoop is like comparing apples to oranges. The data warehouse is a concept: clean, integrated data of high quality. I don't think the need for a data warehouse will go away

Calendar tables in PostgreSQL 9

半腔热情 Posted on 2019-11-29 05:57:27
I am building an analytics database (I have a firm understanding of the data and the business objectives and only basic-to-moderate database skills). I have come across some references to building similar warehouses which implement the concept of 'calendar tables'. This makes sense and is easily enough done. Most examples I see, however, are calendar tables that limit scope to 'day'. My data will need to be analyzed down to hour-level. Possibly minutes. My question: would an implementation of calendar tables for hour/minute-level granularity be of value in terms of space-efficiency and query
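As a rough aid to the space question raised above, the following sketch generates hour-level calendar rows in Python and counts them. It is not PostgreSQL-specific, and the function name `hourly_calendar` and the row fields (`ts`, `date`, `hour`, `iso_weekday`) are hypothetical choices, not something from the original post.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterator


def hourly_calendar(start: datetime, end: datetime) -> Iterator[Dict]:
    """Yield one calendar row per hour in the half-open range [start, end)."""
    current = start
    while current < end:
        yield {
            "ts": current,                        # start of the hour
            "date": current.date(),               # link to a day-level calendar
            "hour": current.hour,                 # 0..23
            "iso_weekday": current.isoweekday(),  # 1 = Monday .. 7 = Sunday
        }
        current += timedelta(hours=1)


# Row count for one non-leap year at hour granularity: 365 * 24 = 8760.
rows_2019 = list(hourly_calendar(datetime(2019, 1, 1), datetime(2020, 1, 1)))
```

At hour granularity a year is 365 × 24 = 8,760 rows, and at minute granularity 365 × 24 × 60 = 525,600 rows. Whether that is worthwhile for query performance depends on the workload, which the truncated question does not settle.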

Is it good practice to have foreign keys in a datawarehouse (relationships)?

陌路散爱 Posted on 2019-11-29 03:50:19
I think the question is clear enough. Some of the columns in my data warehouse table could have a relationship to a primary key. But is it good practice? The data is denormalized, so it should never be deleted again (data in a data warehouse). I hope the question is clear enough. I have no idea. But nobody is answering, so I googled and found a best-practices paper that seems to say the very helpful "it depends" :-) While foreign key constraints help data integrity, they have an associated cost on all insert, update and delete statements. Give careful attention to the use of constraints in your

Design of a data warehouse with more than one fact table

左心房为你撑大大i Posted on 2019-11-29 02:24:15
Question: I'm new to data warehousing. First, I want to point out that my copy of The Data Warehouse Toolkit is on its way to my mailbox (snail mail :P). But I'm already studying all this stuff with what I can find on the net. What I can't find on the net, however, is what to do when you seem to have more than one fact in a DW. In my case (insurance), I have refunds that occur on an irregular basis. One client can have none for 3 months and then ten in the same month. On the other hand, I have