data-warehouse | 易学教程

Star vs Snowflake schema in data warehousing?

Question: I'm currently involved in a warehouse-based intelligent transaction analysis system for banking, covering customer churn behavior, fraud detection, and CRM analysis. We use Oracle as the database, and it is entirely a data warehousing project, with data mining algorithms used for analysis. We have records for about 1000 customers of a bank. For modeling, is it better to use the star schema, the snowflake schema, or the constellation schema? I know the basic difference of star and

How Do I aggregate Data By Day and Still Respect Timezone?

We are currently using a summary table that aggregates information for our users on an hourly basis in UTC time. The problem we are having is that this table is becoming too large and slowing our system down immensely. We have done all the tuning techniques recommended for PostgreSQL and we are still experiencing slowness. Our idea was to start aggregating by day rather than by hour, but the problem is that we allow our customers to change the timezone, which recalculates the data for that day. Does anyone know of a way to store the daily summary but still respect the numbers and totals when
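To make the difficulty concrete, here is a minimal sketch of why one stored daily total cannot serve every timezone: it keeps hourly UTC buckets and rolls them up into local days on demand. The sample counts, the `daily_totals` helper, and the fixed UTC offsets are assumptions for illustration only; real timezones with daylight saving would need a timezone database rather than a fixed offset.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical hourly summary rows: (bucket start in UTC, event count).
hourly_utc = [
    (datetime(2024, 1, 1, 22, tzinfo=timezone.utc), 5),
    (datetime(2024, 1, 1, 23, tzinfo=timezone.utc), 7),
    (datetime(2024, 1, 2, 0, tzinfo=timezone.utc), 3),
    (datetime(2024, 1, 2, 1, tzinfo=timezone.utc), 4),
]


def daily_totals(rows, utc_offset_hours):
    """Roll hourly UTC buckets up into calendar days of a fixed-offset zone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    totals = defaultdict(int)
    for bucket_start, count in rows:
        # Convert the bucket start to local time, then keep only the local date.
        local_day = bucket_start.astimezone(tz).date()
        totals[local_day.isoformat()] += count
    return dict(totals)


utc_days = daily_totals(hourly_utc, 0)    # day boundaries at UTC midnight
east_days = daily_totals(hourly_utc, 2)   # assumed UTC+2 customer
west_days = daily_totals(hourly_utc, -5)  # assumed UTC-5 customer
```

The same four hourly rows produce different daily totals for each offset, which is the information a UTC-only daily table would lose. Zones with half-hour offsets would need buckets finer than one hour for this roll-up to stay exact.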

Group by vs Partition by in Oracle

I am writing a query to fetch records from an Oracle warehouse. It's a simple SELECT query with joins on a few tables, and I have a few columns to be aggregated, so I end up using GROUP BY on the rest of the columns. Say I am picking some 10 columns, of which 5 are aggregate columns, so I need to group by the other 5 columns. I can achieve the same without GROUP BY by using an OVER (PARTITION BY ...) clause on each aggregate column I want to derive. I am not sure which is better against a warehouse or in general. They are not the same. This will return 3 rows: select deptno,
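The row-count difference between the two constructs can be sketched as follows. This is not the query from the original answer: it runs against an in-memory SQLite database through Python's `sqlite3` module rather than Oracle, and the `emp` table and its rows are made up for illustration. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with a hypothetical employee table (not from the source).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 200), (3, 20, 300), (4, 30, 400), (5, 30, 500)],
)

# GROUP BY collapses the rows: one output row per department.
grouped = conn.execute(
    "SELECT deptno, SUM(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# OVER (PARTITION BY ...) keeps every input row and attaches the
# department total to each of them.
windowed = conn.execute(
    "SELECT empno, deptno, SUM(sal) OVER (PARTITION BY deptno) "
    "FROM emp ORDER BY empno"
).fetchall()

conn.close()
```

With three departments and five employees, `grouped` holds three rows and `windowed` holds five, so the two forms are interchangeable only if the windowed result is deduplicated afterwards.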

Data warehouse for AD dates

We're creating a historic archive for a world history database and we need a date lookup table that references all dates in AD. How do we go about creating the values for this table, from 1 AD to 2011, as YYYY/MM/DD? The database is MySQL. Problems: I'm using Excel to pre-populate the dates and then import them into MySQL as YYYY/MM/DD, but Excel doesn't recognize years like 0007, 0008, etc., so I can't auto-copy cells to generate dates. I have to do it manually, and it will take days to go from 1 AD to the year 2011 as YYYY/MM/DD. Leap years were introduced in 1752. If I programmatically generate dates, how do I
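As a rough sketch of the programmatic route, the snippet below generates every date from 0001/01/01 to 2011/12/31 as a zero-padded YYYY/MM/DD string. The helper names `iter_dates` and `format_ymd` are made up for illustration. One assumption matters a great deal here: Python's `datetime.date` uses the proleptic Gregorian calendar, so it applies modern leap-year rules all the way back to year 1 and will not match the Julian calendar that was historically in use for early dates.

```python
from datetime import date, timedelta


def iter_dates(start, end):
    """Yield every calendar date from start to end, inclusive."""
    current = start
    one_day = timedelta(days=1)
    while current <= end:
        yield current
        current += one_day


def format_ymd(d):
    """Format a date as zero-padded YYYY/MM/DD, e.g. 0007/03/15."""
    # Built by hand because strftime('%Y') does not zero-pad small
    # years consistently across platforms.
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}"


rows = [format_ymd(d) for d in iter_dates(date(1, 1, 1), date(2011, 12, 31))]
```

The resulting list could be written to a CSV file and bulk-loaded into the lookup table. Whether proleptic Gregorian dates are acceptable for a world history archive is a modeling decision this sketch does not settle.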

What is best practice for representing time intervals in a data warehouse?

Question: In particular, I am dealing with a Type 2 Slowly Changing Dimension and need to represent the time interval a particular record was active for, i.e. for each record I have a StartDate and an EndDate. My question is whether to use a closed ([StartDate,EndDate]) or half-open ([StartDate,EndDate)) interval to represent this, i.e. whether to include the last date in the interval or not. To take a concrete example, say record 1 was active from day 1 to day 5 and from day 6 onwards
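To illustrate the half-open option from the question, here is a minimal sketch of an as-of lookup over [StartDate, EndDate) rows. The `versions` data, the `active_version` helper, and the use of `None` for an open-ended current row are assumptions for illustration, not a recommendation taken from the source.

```python
from datetime import date

# Hypothetical Type 2 rows: (surrogate_key, start_date, end_date).
# Intervals are half-open: start is included, end is excluded.
# end_date = None marks the currently active version.
versions = [
    (1, date(2024, 1, 1), date(2024, 1, 6)),  # active on days 1 to 5
    (2, date(2024, 1, 6), None),              # active from day 6 onwards
]


def active_version(rows, as_of):
    """Return the surrogate key of the version active on the given date."""
    matches = [
        key
        for key, start, end in rows
        if start <= as_of and (end is None or as_of < end)
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one active version, got {len(matches)}")
    return matches[0]


day_five = active_version(versions, date(2024, 1, 5))
day_six = active_version(versions, date(2024, 1, 6))
```

With half-open intervals, the end of one version equals the start of the next, so checking for gaps or overlaps is a plain equality test, and the same predicate keeps working if the columns later become timestamps. With closed intervals, the equivalent check has to add one unit of the column's granularity.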

Time and date dimension in data warehouse

Question: I'm building a data warehouse. Each fact has its timestamp. I need to create reports by day, month, and quarter, but by hour too. Looking at the examples, I see that dates tend to be saved in dimension tables. (source: etl-tools.info) But I think that makes no sense for time: the dimension table would grow and grow. On the other hand, a JOIN with a date dimension table is more efficient than using date/time functions in SQL. What are your opinions/solutions? (I'm using Infobright) Answer 1: My guess

Where I can download sample database which can be used as data warehouse? [closed]

Where can I download a sample database that can be used for data warehouse creation? It shouldn't be a sample from Microsoft (Northwind etc.). EDIT: Sorry for not clarifying my question. At my university we have a class where we must create a data warehouse, and since Northwind is so popular on the net, the professor told us not to use it. We will use SQL Server 2008 for this, but using Northwind is forbidden. Whatever happened to NOT Northwind? http://www.hanselman.com/blog/CommunityCallToActionNOTNorthwind.aspx There's also SQL Data Generator from Redgate: http://www.red-gate.com

What is the difference between a database and a data warehouse?

Question: What is the difference between a database and a data warehouse? Aren't they the same thing, or at least written in the same thing (i.e. Oracle RDBMS)? Answer 1: Check out this for more information. From a previous link: Database: Used for Online Transactional Processing (OLTP) but can be used for other purposes such as Data Warehousing. This records the data from the user for history. The tables and joins are complex since they are normalized (for RDBMS). This is done to reduce redundant data and to

20 Billion Rows/Month - Hbase / Hive / Greenplum / What?

Question: I'd like to use your wisdom in picking the right solution for a data-warehouse system. Here are some details to better understand the problem. Data is organized in a star schema structure with one BIG fact and ~15 dimensions: 20B fact rows per month; 10 dimensions with hundreds of rows (somewhat hierarchical); 5 dimensions with thousands of rows; 2 dimensions with ~200K rows; 2 big dimensions with 50M-100M rows. Two typical queries run against this DB. Top members in dimq: select top X dimq, count(id)

Why primary key is (not) required on fact table in dimensional modelling?

I have heard several claims that a PK is not required on a fact table. I believe every single table should have a PK. How could a person understand a row in a fact table if there is no PK and 10+ foreign keys? Primary Key is there ... but enforcing the primary key constraint at the database level is not required. If you think about it, technically a unique key or primary key is a key that uniquely defines the characteristics of each row, and it can be composed of more than one attribute of that entity. Now in the case of a fact table, foreign keys flowing in from the other dimension tables