data-warehouse | 易学教程

How can one build the TFS cube from scratch?

Question: We are having issues with the TFS cube. I don't think it has been built since TFS was installed. The warehouse seems to be working and has new data; it is only the cube that doesn't work. We tried rebuilding it using the TFS Administration Console, but that made things worse: the data that was in there was erased and replaced by what looks like a blank database. I tried deleting the database so that I could see whether the cube was actually being built, but now when I run the rebuild it says

star schema design - one column dimensions

Question: I'm new to data warehousing, but I think my question can be answered relatively easily. I built a star schema with a dimension table 'product'. This table has a column 'PropertyName' and a column 'PropertyValue'. The dimension therefore looks a little like this:

surrogate_key | natural_key (productID) | PropertyName | PropertyValue | ...
1             | 5                       | Size         | 20            | ...
2             | 5                       | Color        | red           |
3             | 6                       | Size         | 20            |
4             | 6                       | Material     | wood          |

and so on. In my fact table I always use the surrogate keys of the dimensions. Because of the

ETL Operation - Return Primary Key

Question: I am using Talend to populate a data warehouse. My job writes customer data to a dimension table and transaction data to the fact table. The surrogate key (p_key) on the fact table is auto-incrementing. When I insert a new customer, I need my fact table to reflect the id of the related customer. As I mentioned, my p_key is auto-incrementing, so I can't just insert an arbitrary value for the p_key. Any thoughts on how I can insert a row into my dimension table and still retrieve the
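To make the situation concrete, here is a minimal sketch of the "insert into the dimension, then read back the generated key" pattern. It is not Talend: it uses Python's standard sqlite3 module purely as an illustration, and it assumes p_key is the dimension's auto-generated surrogate key. The table names (dim_customer, fact_transaction), the customer_id column, and the helper get_or_create_customer_key are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical dimension and fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    p_key       INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated surrogate key
    customer_id TEXT UNIQUE,                        -- natural key from the source
    name        TEXT
);
CREATE TABLE fact_transaction (
    transaction_id INTEGER PRIMARY KEY,
    customer_key   INTEGER REFERENCES dim_customer(p_key),
    amount         REAL
);
""")


def get_or_create_customer_key(conn, customer_id, name):
    """Return the surrogate key for a customer, inserting the row if needed."""
    row = conn.execute(
        "SELECT p_key FROM dim_customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)",
        (customer_id, name),
    )
    # lastrowid holds the key the database generated for this insert.
    return cur.lastrowid


key = get_or_create_customer_key(conn, "C-100", "Ada")
conn.execute(
    "INSERT INTO fact_transaction (customer_key, amount) VALUES (?, ?)",
    (key, 42.5),
)
# A second call finds the existing row instead of creating a duplicate.
same_key = get_or_create_customer_key(conn, "C-100", "Ada")
```

The fact row never needs a hand-picked key: it stores whatever key the database generated, looked up through the natural key.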

How to connect a fact and dimension table that are in 1-N relationship

Question: I have a Purchase fact table with some measures and dimension keys. Then there's another table: the Discount table. The Purchase fact table is in a 1-N relationship with the Discount table (for each purchase I might have bought several discounted items). The Discount table has some attributes (description, note) and some numeric values (for example, discount in $) that I would like to roll up. If I create a dimension out of this Discount table, I'll get a wrong number of purchase counts in a sum count
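The over-counting described here can be reproduced in a few lines. The sketch below uses Python's sqlite3 module with made-up data; the table and column names (fact_purchase, discount, discount_usd) are hypothetical. It only demonstrates the symptom and one way to count purchases safely in SQL, not a recommended model for any particular OLAP tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_purchase (purchase_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE discount (
    discount_id  INTEGER PRIMARY KEY,
    purchase_id  INTEGER REFERENCES fact_purchase(purchase_id),
    discount_usd REAL
);
-- Purchase 1 has two discounted items, purchase 2 has one.
INSERT INTO fact_purchase VALUES (1, 100.0), (2, 50.0);
INSERT INTO discount VALUES (10, 1, 5.0), (11, 1, 3.0), (12, 2, 2.0);
""")

join_sql = "FROM fact_purchase p JOIN discount d ON d.purchase_id = p.purchase_id"

# Joining 1-N repeats each purchase once per discount row.
naive_count, naive_amount = conn.execute(
    f"SELECT COUNT(p.purchase_id), SUM(p.amount) {join_sql}"
).fetchone()

# Counting distinct purchases avoids the fan-out; discounts still sum correctly
# because they live at the grain of the joined rows.
distinct_count, total_discount = conn.execute(
    f"SELECT COUNT(DISTINCT p.purchase_id), SUM(d.discount_usd) {join_sql}"
).fetchone()

true_amount = conn.execute("SELECT SUM(amount) FROM fact_purchase").fetchone()[0]
```

With this data the naive join reports three purchases and an inflated amount, while the distinct count reports two.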

Database design for incremental “export” to data warehouse

Question: Given a 1 TB relational database, currently in SQL Server. The data warehouse needs a "copy" of major parts of the database. The warehouse data should not be more than 24 hours old. The size of the relational database makes it impractical to do a full load every night. How should I design my relational database to support incremental loads to the warehouse? A very small portion (<0.1%) of the database changes in a single day, and it is mostly inserts. The intra-day changes are not required,
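One common way to frame this requirement is a high-water-mark extract: each nightly run pulls only rows changed since the previous run. The sketch below is an illustration under stated assumptions, not the approach from the original thread. It uses Python's sqlite3 module rather than SQL Server, and it assumes every source table carries a reliably maintained change column; the orders table, the modified_at column, and extract_changes are hypothetical names.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    amount      REAL,
    modified_at TEXT NOT NULL  -- ISO 8601 timestamp, set on every insert/update
);
INSERT INTO orders VALUES
    (1, 10.0, '2024-01-01T08:00:00'),
    (2, 20.0, '2024-01-02T09:30:00'),
    (3, 30.0, '2024-01-02T17:45:00');
""")


def extract_changes(conn, last_watermark):
    """Return rows changed after last_watermark and the new watermark."""
    rows = conn.execute(
        "SELECT order_id, amount, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_watermark,),
    ).fetchall()
    # If nothing changed, keep the old watermark so no window is skipped.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# First nightly run: everything after the previously stored watermark.
rows, watermark = extract_changes(src, "2024-01-01T23:59:59")
# Running again with the new watermark picks up nothing.
rows_again, watermark_again = extract_changes(src, watermark)
```

A timestamp column like this does not capture deletes; whether that matters depends on requirements the excerpt does not state.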

Nulls in dimension table for numeric attributes

Question: What is the best way to handle missing values in a dimension table? In the case of a textual column, it is easy to write "NA: Missing," but what should be done for numeric columns where it is important to retain the specific values? Note: I do not want a solution that uses banded values (e.g., textual columns for "0-50", "50-100", "NA: Missing"). For instance, a customer dimension may have a year-of-birth. How should missing years of birth be handled? Leave it null? Add in an arbitrary

SSAS Dimension attribute as Calculated Measure

Question: I am having some issues trying to implement an average of a dimension attribute. The basic structure is:

- Booking Header Dimension
- Fact Table (multiple rows per Booking Header entry)

On the Booking Header dimension I have a numerical attribute called Booking Window, and I want to create a calculated measure that averages this value. We are using SQL Server 2012 Standard Edition. Any help would be greatly appreciated.

Answer 1: The best approach would be to create a measure group from the
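The grain mismatch behind this question can be shown with a small example. The sketch below uses Python's sqlite3 module instead of SSAS and made-up data; dim_booking_header, fact_booking_line, and the column names are hypothetical. It shows why averaging the attribute through the fact table differs from averaging it at the header grain, which is presumably the motivation for treating the dimension table as its own source of measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_booking_header (
    booking_key    INTEGER PRIMARY KEY,
    booking_window INTEGER  -- numeric attribute to be averaged
);
CREATE TABLE fact_booking_line (
    line_id     INTEGER PRIMARY KEY,
    booking_key INTEGER REFERENCES dim_booking_header(booking_key)
);
-- Booking 1 has three fact rows, booking 2 has one.
INSERT INTO dim_booking_header VALUES (1, 10), (2, 40);
INSERT INTO fact_booking_line VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

# Averaging through the fact table weights each header by its row count.
avg_over_fact_rows = conn.execute(
    "SELECT AVG(h.booking_window) FROM fact_booking_line f "
    "JOIN dim_booking_header h ON h.booking_key = f.booking_key"
).fetchone()[0]

# Averaging at the header grain counts each booking exactly once.
avg_over_headers = conn.execute(
    "SELECT AVG(booking_window) FROM dim_booking_header"
).fetchone()[0]
```

Here the fact-grain average is pulled toward booking 1 because it has more rows, while the header-grain average treats both bookings equally.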

Time-based drilldowns in Power BI powered by Azure Data Warehouse

Question: I have designed a simple Azure Data Warehouse where I want to track the stock of my products on a periodic basis. Moreover, I want to be able to see that data grouped by months, weeks, days and hours, with the ability to drill down from top to bottom. I have defined 3 dimensions:

- DimDate
- DimTime
- DimProduct

I have also defined a fact table to track product stocks:

FactStocks
- DateKey (20160510, 20160511, etc.)
- TimeKey (0..23)
- ProductKey (Product1, Product2)
- StockValue (number, 1..9999)

My
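As a rough illustration of how the DateKey and TimeKey columns above support drilling from month to day to hour, the sketch below groups a few made-up FactStocks rows at each level in plain Python. The helper names (level_key, average_stock) are hypothetical, the week level is omitted for brevity, and using the average as the roll-up for stock levels is an assumption; the excerpt does not say which aggregation is wanted.

```python
from collections import defaultdict
from statistics import mean

# Sample rows: (DateKey, TimeKey, ProductKey, StockValue). Values are made up.
fact_stocks = [
    (20160510, 0, "Product1", 100),
    (20160510, 1, "Product1", 90),
    (20160511, 0, "Product1", 80),
    (20160601, 0, "Product1", 60),
]


def level_key(date_key, time_key, level):
    """Map a fact row's keys to the grouping key for one hierarchy level."""
    if level == "month":
        return date_key // 100          # 20160510 -> 201605
    if level == "day":
        return date_key
    if level == "hour":
        return (date_key, time_key)
    raise ValueError(f"unknown level: {level}")


def average_stock(rows, level):
    """Average StockValue per product at the requested level."""
    groups = defaultdict(list)
    for date_key, time_key, product, stock in rows:
        groups[(product, level_key(date_key, time_key, level))].append(stock)
    return {key: mean(values) for key, values in groups.items()}


by_month = average_stock(fact_stocks, "month")
by_day = average_stock(fact_stocks, "day")
by_hour = average_stock(fact_stocks, "hour")
```

Drilling down is then a matter of re-grouping the same fact rows with a finer key, which is what a date/time hierarchy in a reporting tool does behind the scenes.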

Strategies for populating a Reporting/Data Warehouse database

Question: For our reporting application, we have a process that aggregates several databases into a single 'reporting' database on a nightly basis. The schema of the reporting database is quite different from that of the separate 'production' databases that we are aggregating, so a good amount of business logic goes into how the data is aggregated. Right now this process is implemented by several stored procedures that run nightly. As we add more details to the reporting database the logic

Creating real time datawarehouse

Question: I am doing a personal project that consists of creating the full architecture of a data warehouse (DWH). As the ETL and BI analysis tool I decided to use Pentaho; it has a lot of functionality, from easy dashboard creation to full data mining processes and OLAP cubes. I have read that a data warehouse must be a relational database, and I understand this. What I don't understand is how to achieve a near-real-time or fully real-time DWH. I have read about push and pull