data-warehouse | 易学教程

Nulls in dimension table for numeric attributes

What is the best way to handle missing values in a dimension table? For a textual column it is easy to write "NA: Missing", but what should be done for numeric columns where it is important to retain the specific values? Note: I do not want a solution that uses banded values (e.g., textual columns for "0-50", "50-100", "NA: Missing"). For instance, a customer dimension may have a year of birth. How should missing years of birth be handled? Leave them null? Add an arbitrary placeholder number such as 1900? Sometimes it may be difficult to find a placeholder number. For
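
One concrete cost of the placeholder option is that it silently enters numeric aggregates, while SQL's AVG skips NULLs. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for the warehouse, and the table name and sample years are hypothetical.

```python
import sqlite3

def average_birth_year(years):
    """Return AVG(year_of_birth) over a throwaway, hypothetical customer dimension."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_customer (customer_key INTEGER, year_of_birth INTEGER)")
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?)",
        list(enumerate(years, start=1)),
    )
    (avg,) = conn.execute("SELECT AVG(year_of_birth) FROM dim_customer").fetchone()
    conn.close()
    return avg

# Hypothetical sample: the third customer's year of birth is unknown.
with_null = average_birth_year([1980, 1990, None])         # NULL is ignored by AVG
with_placeholder = average_birth_year([1980, 1990, 1900])  # 1900 is averaged in

print(with_null)         # 1985.0
print(with_placeholder)
```

With NULL the average reflects only known values; with 1900 every report that averages or bands the column has to remember to filter the placeholder out.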

Does Azure SQL Data Warehouse have a way to split strings?

Doing some research, I see that there are no good options for splitting strings in Azure SQL Data Warehouse. It doesn't have the new STRING_SPLIT() function or the OPENJSON() function. It also doesn't allow SELECT statements in user-defined functions, so you can't build your own the way many of the community's custom splitter functions do. Thus, I figured I would pose the question: does SQL Data Warehouse have ways to split strings, and what are the best options here? Use case: You have a
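
Whatever tool ends up doing the work, a splitter's job is to turn one delimited value into one row per token, usually with the token's position. A minimal sketch of that logic in Python, e.g. for splitting in the load process before the data reaches the warehouse; the helper name and the (row_id, ordinal, token) output shape are assumptions for illustration, not any product's API.

```python
def split_to_rows(row_id, text, delimiter=","):
    """Hypothetical helper: explode one delimited value into (row_id, ordinal, token) rows."""
    if text is None:
        return []  # nothing to split; mirror how a NULL input yields no rows
    return [
        (row_id, ordinal, token.strip())  # strip stray spaces around each token
        for ordinal, token in enumerate(text.split(delimiter), start=1)
    ]

rows = split_to_rows(42, "red, green,blue")
for row in rows:
    print(row)
```

Loaded into a staging table, these rows can then be joined like any other table.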

Publishing data in a data warehouse

Are there best practices or well-known methods for publishing/announcing (via metadata, etc.) what data has been loaded, verified and is currently available for reporting in a data warehouse? I've seen several in-house systems for doing this, some of them pretty fragile. Are there well-known concepts or good search terms I could look for? I'm not sure exactly what you're looking for here, but what exactly are the users waiting for? If it's for the system to be available again after a well-defined and consistent daily ETL process runs, then it's easy to send an email, re-enable your reporting
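
One metadata-based pattern, sketched here under assumptions: the ETL writes a row per load batch with a status, and reports (or a notification job) read the latest verified date per table. Table, column and status names are hypothetical, and sqlite3 stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_batch (
    batch_id     INTEGER PRIMARY KEY,
    table_name   TEXT NOT NULL,
    data_through TEXT NOT NULL,  -- last business date covered by the load (ISO format)
    status       TEXT NOT NULL CHECK (status IN ('LOADING', 'LOADED', 'VERIFIED', 'FAILED'))
);
""")
conn.executemany(
    "INSERT INTO etl_batch (table_name, data_through, status) VALUES (?, ?, ?)",
    [
        ("fact_sales", "2016-05-09", "VERIFIED"),
        ("fact_sales", "2016-05-10", "LOADED"),  # loaded but not yet verified
        ("fact_stock", "2016-05-10", "VERIFIED"),
    ],
)

# What reporting users may rely on: the latest *verified* date per table.
available = conn.execute("""
    SELECT table_name, MAX(data_through)
    FROM etl_batch
    WHERE status = 'VERIFIED'
    GROUP BY table_name
    ORDER BY table_name
""").fetchall()
print(available)  # [('fact_sales', '2016-05-09'), ('fact_stock', '2016-05-10')]
```

Because the status lives in a table rather than in an email, the same query can drive a report banner, a notification, or a gate on downstream jobs.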

One or multiple fact tables?

I am trying to build a data mart. I have a lot of dimensions and a couple of measures (facts). In business terms, every measure is connected to all dimensions. The standard approach is one big fact table with all the measures. But I have an idea: what if I have a separate fact table for each measure? What will that do to database performance, solution extensibility, etc.? EDIT: there will be a huge solution based on OLAP cubes in a really complex corporate environment. So the
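
The practical difference shows up in queries: with one fact table a report that needs both measures is a single scan, while with one table per measure the rows must be merged on the shared dimension keys first (Kimball calls this drilling across). A minimal sketch with hypothetical table names and data, using sqlite3 as a stand-in; it shows query shape only and says nothing about performance on a real engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option A: one fact table holding both measures at the same grain.
CREATE TABLE fact_sales (date_key INT, product_key INT, quantity INT, revenue REAL);
-- Option B: one fact table per measure, sharing the same dimension keys.
CREATE TABLE fact_quantity (date_key INT, product_key INT, quantity INT);
CREATE TABLE fact_revenue  (date_key INT, product_key INT, revenue REAL);
""")
rows = [(20160510, 1, 3, 30.0), (20160510, 2, 1, 15.0), (20160511, 1, 2, 20.0)]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
conn.executemany("INSERT INTO fact_quantity VALUES (?, ?, ?)", [r[:3] for r in rows])
conn.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)",
                 [(d, p, rev) for d, p, _, rev in rows])

# Option A: a single scan answers a question about both measures.
single = conn.execute("""
    SELECT date_key, SUM(quantity), SUM(revenue)
    FROM fact_sales GROUP BY date_key ORDER BY date_key
""").fetchall()

# Option B: rows from both fact tables are stacked, then aggregated
# on the shared dimension key (SUM ignores the NULL padding).
merged = conn.execute("""
    SELECT date_key, SUM(quantity), SUM(revenue)
    FROM (
        SELECT date_key, quantity, NULL AS revenue FROM fact_quantity
        UNION ALL
        SELECT date_key, NULL, revenue FROM fact_revenue
    )
    GROUP BY date_key ORDER BY date_key
""").fetchall()

print(single)
print(single == merged)  # True
```

Both designs give the same answer here; the per-measure design trades that extra merge step in every cross-measure query for the freedom to add a measure without altering an existing table.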

What are the pros and cons of loading data directly into Google BigQuery vs going through Cloud Storage first?

Also, is there anything wrong with doing transforms/joins directly within BigQuery? I'd like to minimize the number of components and steps involved in a data warehouse I'm setting up (simple transaction and inventory data for a chain of retail stores). Loading data via Cloud Storage is the fastest (and cheapest) way. Loading directly can be done from an app (using streaming inserts, which add some additional cost). As for transformations: if what you plan/need to do can be done in BigQuery, you should do it in BigQuery :) It is the best and fastest way of doing ETL. But you should

Labor Day Vs. Thanksgiving

I am creating a calendar table for my warehouse. All the date fields will reference it as a foreign key. The code shown below creates the table and populates it. I was able to figure out how to find Memorial Day (last Monday of May) and Labor Day (first Monday of September).
SET NOCOUNT ON
DROP Table dbo.Calendar
GO
Create Table dbo.Calendar (
    CalendarId Integer NOT NULL,
    DateValue Date NOT NULL,
    DayNumberOfWeek Integer NOT NULL,
    NameOfDay VarChar (10) NOT NULL,
    NameOfMonth VarChar (10) NOT
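
US Thanksgiving (the fourth Thursday of November) follows the same "n-th weekday of a month" pattern as Labor Day, and Memorial Day is the "last weekday of a month" case. The date arithmetic is the same in any language; here is a Python sketch that could generate or cross-check the holiday rows (function names are hypothetical, and this is not the asker's T-SQL).

```python
import calendar
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (calendar.MONDAY, ...) in a month, n >= 1."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first such weekday
    return first + timedelta(days=offset + 7 * (n - 1))

def last_weekday(year, month, weekday):
    """Date of the last given weekday in a month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    offset = (last_day.weekday() - weekday) % 7  # days back to that weekday
    return last_day - timedelta(days=offset)

def us_holidays(year):
    return {
        "Memorial Day": last_weekday(year, 5, calendar.MONDAY),       # last Monday of May
        "Labor Day": nth_weekday(year, 9, calendar.MONDAY, 1),        # first Monday of September
        "Thanksgiving": nth_weekday(year, 11, calendar.THURSDAY, 4),  # fourth Thursday of November
    }

print(us_holidays(2016))
```

Note that "fourth Thursday" is not the same as "last Thursday": in years where November has five Thursdays, the two rules give different dates.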

SSAS Dimension attribute as Calculated Measure

I am having some issues trying to implement an average of a dimension attribute. The basic structure is a Booking Header dimension and a fact table (multiple rows per Booking Header entry). On the Booking Header dimension I have a numeric attribute called Booking Window, and I want to create a calculated measure that averages this value. We are using SQL Server 2012 Standard Edition. Any help would be greatly appreciated. The best approach would be to create a measure group from the dimension table (in BIDS, go to the cube designer, tab "Cube Structure", right-click the cube object in the
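
The reason to aggregate at the dimension's grain is that averaging through the fact table counts each booking header once per fact row. A plain Python illustration with hypothetical data (not SSAS or MDX):

```python
# Hypothetical Booking Header dimension: booking key -> Booking Window (days).
booking_window = {"B1": 10, "B2": 40}

# Fact rows at a finer grain: several rows can point at the same booking header.
fact_rows = ["B1", "B1", "B1", "B2"]

# Averaging through the fact table counts B1's window once per fact row.
per_fact_row = sum(booking_window[k] for k in fact_rows) / len(fact_rows)

# Averaging at the dimension's own grain counts each booking header once,
# which is the grain a measure group built on the dimension table works at.
distinct_bookings = set(fact_rows)
per_booking = sum(booking_window[k] for k in distinct_bookings) / len(distinct_bookings)

print(per_fact_row)  # 17.5
print(per_booking)   # 25.0
```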

How to model process and status history in a data warehouse?

Let's say that we have D_PROCESS, D_WORKER and D_STATUS as dimensions, and a fact F_EVENT that links a process (what) with a worker (who's in charge) and the "current" status. The process status changes over time. Should we store in F_EVENT one row per process/status/worker, or one row per process/worker, with one row per status change for a given process/worker stored "somewhere else"? I'm new to data warehousing and it's hard to find best practices/tutorials on data modeling. Read The Data Warehouse Toolkit by Ralph Kimball for a good introduction to dimensional modeling. It sounds like
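
If every status change is kept as its own row (the second option in the question), the "current" status does not need separate storage: it is the latest change per process. A sketch with a hypothetical F_STATUS_CHANGE table, simplified to text status codes instead of D_STATUS keys, using sqlite3 as a stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical layout: one row per status change of a process.
CREATE TABLE F_STATUS_CHANGE (
    process_key INTEGER,  -- would reference D_PROCESS
    worker_key  INTEGER,  -- would reference D_WORKER
    status_key  TEXT,     -- simplified; would reference D_STATUS
    changed_at  TEXT      -- ISO-8601 timestamp of the change
);
INSERT INTO F_STATUS_CHANGE VALUES
    (1, 7, 'OPEN',        '2016-05-01 09:00'),
    (1, 7, 'IN_PROGRESS', '2016-05-02 10:00'),
    (1, 8, 'DONE',        '2016-05-04 16:00'),
    (2, 7, 'OPEN',        '2016-05-03 11:00');
""")

# The "current" status is the latest change per process.
current = conn.execute("""
    SELECT f.process_key, f.worker_key, f.status_key
    FROM F_STATUS_CHANGE AS f
    WHERE f.changed_at = (
        SELECT MAX(changed_at) FROM F_STATUS_CHANGE WHERE process_key = f.process_key
    )
    ORDER BY f.process_key
""").fetchall()
print(current)  # [(1, 8, 'DONE'), (2, 7, 'OPEN')]
```

Keeping the full history lets you derive the current state at any time; storing only the current state loses the history for good.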

Time-based drilldowns in Power BI powered by Azure Data Warehouse

I have designed a simple Azure Data Warehouse where I want to track the stock of my products on a periodic basis. I also want to be able to see that data grouped by month, week, day and hour, with the ability to drill down from top to bottom. I have defined 3 dimensions: DimDate, DimTime, DimProduct. I have also defined a fact table to track product stocks:
FactStocks
- DateKey (20160510, 20160511, etc.)
- TimeKey (0..23)
- ProductKey (Product1, Product2)
- StockValue (number, 1..9999)
My fact sample data is below:
20160510 20 Product1 100
20160510 20 Product2 30
20160510 21 Product1 110
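
For the drill-down, each fact row's DateKey and TimeKey map to one member per level, which is what DimDate (month, week, day) and DimTime (hour) would carry as attributes. A Python sketch over the question's sample rows; the level labels and the use of ISO weeks are assumptions.

```python
from datetime import datetime

# Sample rows from the question: (DateKey, TimeKey, ProductKey, StockValue).
fact_stocks = [
    (20160510, 20, "Product1", 100),
    (20160510, 20, "Product2", 30),
    (20160510, 21, "Product1", 110),
]

def hierarchy_levels(date_key, time_key):
    """Map a DateKey (yyyymmdd) and TimeKey (hour 0..23) to drill-down level labels."""
    moment = datetime.strptime(str(date_key), "%Y%m%d").replace(hour=time_key)
    iso_year, iso_week, _ = moment.isocalendar()  # ISO-8601 week numbering
    return {
        "month": moment.strftime("%Y-%m"),
        "week": f"{iso_year}-W{iso_week:02d}",
        "day": moment.strftime("%Y-%m-%d"),
        "hour": moment.strftime("%Y-%m-%d %H:00"),
    }

for date_key, time_key, product, stock in fact_stocks:
    print(hierarchy_levels(date_key, time_key), product, stock)
```

Weeks can straddle month boundaries, so month > week > day is not a strict hierarchy; that is worth keeping in mind when building the drill-down path.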

Datawarehouse Tutorial [closed]

My boss has discovered a new magazine that mentioned data warehousing, so I am in search of a good tutorial or book on data warehousing. I will also accept recommendations on ways to stop my boss reading. There are two primary authors on data warehousing: Bill Inmon, who mostly writes about large enterprise data warehouses, and Ralph Kimball, who mostly writes about smaller, departmental data warehouses. It's a good idea to get familiar with the ideas of both. Data warehousing is a mature and complex field, one that you're unlikely to be very successful with unless you've got a lot of