data-warehouse

Handling nulls in a data warehouse

雨燕双飞 submitted on 2019-12-05 07:59:21
I'd like your input on best practices for handling null or empty data values in data warehousing with SSIS/SSAS. I have several fact and dimension tables that contain null values in various rows. Specifics:
1) What is the best way to handle null date/time values? Should I create a 'default' row in my time or date dimensions and have SSIS point to that default row whenever it finds a null?
2) What is the best way to handle null/empty values inside dimension data? For example, I have some rows in an 'Accounts' dimension that have empty (not NULL) values in the Account
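
A common way to handle point 1 is an "Unknown" member: the date dimension gets one extra row with a reserved surrogate key, and the ETL maps every null date to that key so fact rows never carry a null foreign key. Empty dimension attributes can be normalised to a single placeholder label in the same pass. The sketch below is a minimal Python illustration of that mapping logic, not SSIS code; the key -1, the YYYYMMDD key format and the "Unknown" label are assumed conventions, not anything taken from the question.

```python
from datetime import date
from typing import Optional

# Assumed convention: the date dimension holds one extra "Unknown" row with this key.
UNKNOWN_DATE_KEY = -1


def date_key(d: Optional[date]) -> int:
    """Map a (possibly null) date to a date-dimension surrogate key."""
    if d is None:
        return UNKNOWN_DATE_KEY  # null date -> point at the "Unknown" row
    return d.year * 10000 + d.month * 100 + d.day  # assumed YYYYMMDD smart key


def clean_attribute(value: Optional[str], placeholder: str = "Unknown") -> str:
    """Replace NULL or empty/blank dimension text with one consistent label."""
    if value is None or not value.strip():
        return placeholder
    return value.strip()  # also trims stray whitespace (a design choice)


print(date_key(None), date_key(date(2019, 12, 5)))
print(clean_attribute(""), clean_attribute(None), clean_attribute(" ACME "))
```

Treating NULL and empty strings the same way means reports and cube browsers show one "Unknown" bucket instead of two visually identical blank members.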

Adding/Combining Standard Deviations

杀马特。学长 韩版系。学妹 submitted on 2019-12-05 06:04:08
Question: Short version: Can StdDevs be added/combined? I.e., if StdDev(11,14,16,17) = X and StdDev(21,34,43,12) = Y, can we calculate StdDev(11,14,16,17,21,34,43,12) from X and Y?
Long version: I am designing a star schema. The schema has a fact_table (grain = transaction) that stores the response_time of each individual transaction. The schema also has an aggregate_table (grain = day) that stores response_time_sum per day. In my report I need to calculate standard deviations of the response time for a given
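
X and Y alone are not enough: the combined standard deviation also depends on how many values each group has and on the group means. If each group instead stores three additive measures (row count, sum, and sum of squares), those roll up like any other SUM, and the sample standard deviation of the combined set is sqrt((Σx² - (Σx)²/n) / (n - 1)). For the aggregate table that would mean adding columns such as response_time_count and response_time_sumsq (hypothetical names) next to response_time_sum. The Python sketch below illustrates the arithmetic:

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Moments:
    """Additive summary of a group of values: count, sum, sum of squares."""
    n: int
    s1: float
    s2: float

    @staticmethod
    def of(values: Iterable[float]) -> "Moments":
        vals = list(values)
        return Moments(len(vals), float(sum(vals)), float(sum(v * v for v in vals)))

    def __add__(self, other: "Moments") -> "Moments":
        # Every field is a plain sum, so groups (e.g. days) combine by addition.
        return Moments(self.n + other.n, self.s1 + other.s1, self.s2 + other.s2)

    def stdev_sample(self) -> float:
        """Sample standard deviation (n - 1 denominator); needs n >= 2."""
        var = (self.s2 - self.s1 ** 2 / self.n) / (self.n - 1)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


a = Moments.of([11, 14, 16, 17])
b = Moments.of([21, 34, 43, 12])
print((a + b).stdev_sample())  # sqrt(132), about 11.489: the stdev of all eight values
```

The sum-of-squares form can lose precision when values are large relative to their spread; if that matters, merging (count, mean, sum of squared deviations) per group is numerically safer.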

4-5-4 National Retail Federation calendar: CSV download or function to create it

ⅰ亾dé卋堺 submitted on 2019-12-05 00:58:13
Question: I've been googling all over the place and haven't found this. The retail client I'm working for uses the NRF retail calendar (NRF site, Calendars page). I'm wondering if anyone has ever created a lookup/dimension table with these values. Thanks,
Answer 1: You can find a Perl module on CPAN that can generate a Retail 4-5-4 calendar for any year (http://metacpan.org/pod/DateTime::Fiscal::Retail454); it was written specifically for this problem.
Answer 2: An algorithmic option I've used in the past was (I'm
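
As a Python alternative to the Perl module, the sketch below builds one row per day for a fiscal year. It assumes the convention commonly used for the NRF 4-5-4 calendar: the fiscal year ends on the Saturday nearest January 31, months run 4-5-4 weeks within each quarter, and in a 53-week year the extra week is added to the last month. The fiscal year is labelled by the calendar year in which it starts. Verify these rules and the 53-week years against the calendars published on the NRF site before loading a dimension; the function names are hypothetical.

```python
import csv
import sys
from datetime import date, timedelta

WEEKS_PER_MONTH = [4, 5, 4] * 4  # 12 fiscal months, 52 weeks


def fiscal_year_end(year: int) -> date:
    """Saturday nearest January 31 of year + 1 (assumed NRF rule)."""
    jan31 = date(year + 1, 1, 31)
    ahead = (5 - jan31.weekday()) % 7  # days to the next Saturday (Mon=0 ... Sat=5)
    return jan31 + timedelta(days=ahead if ahead <= 3 else ahead - 7)


def retail_454_calendar(year: int) -> list:
    """One dict per day of the fiscal year that starts in `year`."""
    start = fiscal_year_end(year - 1) + timedelta(days=1)  # always a Sunday
    n_weeks = ((fiscal_year_end(year) - start).days + 1) // 7  # 52 or 53
    weeks = list(WEEKS_PER_MONTH)
    if n_weeks == 53:
        weeks[-1] += 1  # assumed: the 53rd week goes into the last month
    rows, day, week_no = [], start, 0
    for month, n in enumerate(weeks, start=1):
        for _ in range(n):
            week_no += 1
            for _ in range(7):
                rows.append({"date": day.isoformat(), "fiscal_year": year,
                             "fiscal_quarter": (month - 1) // 3 + 1,
                             "fiscal_month": month, "fiscal_week": week_no})
                day += timedelta(days=1)
    return rows


if __name__ == "__main__":
    # Write one fiscal year as CSV to stdout, e.g. for a bulk load.
    rows = retail_454_calendar(2019)
    writer = csv.DictWriter(sys.stdout, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

Calling retail_454_calendar once per year in a loop and concatenating the rows gives a date dimension covering any range of fiscal years.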

Oracle aggregation function to allocate amount

江枫思渺然 submitted on 2019-12-04 19:43:10
Question: Suppose I have two tables, T1 and T2, as follows.

T1:
bag_id | bag_type | capacity
-------|----------|---------
1      | A        | 500
2      | A        | 300
3      | A        | 100
4      | B        | 200
5      | B        | 100

T2:
item_type | item_amount
----------|------------
A         | 850
B         | 300

Each record in table T1 represents a bag and its capacity; here I have 5 bags. I want to write SQL that allocates the items in table T2 into the bags of the same type, i.e. the result should look like this:

bag_id | bag_type | capacity | allocated_amount
-------|----------|----------|-----------------
1      | A        | 500
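
The question is cut off before the expected result, so the rule here is an assumption: bags of each type are filled greedily in bag_id order until that type's item_amount runs out. Under that rule, each bag gets min(capacity, max(0, item_amount - capacity of earlier bags of the same type)), where the capacity of earlier bags is a running total per bag_type. In Oracle, that running total can come from the analytic SUM(capacity) OVER (PARTITION BY bag_type ORDER BY bag_id) minus the current bag's capacity. The Python sketch below only illustrates the arithmetic on the sample data; the allocate function is hypothetical.

```python
def allocate(bags, amounts):
    """Greedy allocation, assumed rule: fill bags of each type in bag_id order.

    bags: list of (bag_id, bag_type, capacity); amounts: {item_type: item_amount}.
    Returns (bag_id, bag_type, capacity, allocated_amount) tuples.
    """
    capacity_before = {}  # running capacity of earlier bags, per bag_type
    result = []
    for bag_id, bag_type, capacity in sorted(bags, key=lambda b: (b[1], b[0])):
        before = capacity_before.get(bag_type, 0)
        remaining = amounts.get(bag_type, 0) - before
        result.append((bag_id, bag_type, capacity, max(0, min(capacity, remaining))))
        capacity_before[bag_type] = before + capacity
    return result


T1 = [(1, "A", 500), (2, "A", 300), (3, "A", 100), (4, "B", 200), (5, "B", 100)]
T2 = {"A": 850, "B": 300}
for row in allocate(T1, T2):
    print(row)  # bag 3 receives the remaining 50 units of type A
```

Because each bag's share depends only on the running total of the bags before it, the same logic maps onto a single query with a window function rather than a row-by-row loop.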

MDX ignoring Excel filter

牧云@^-^@ submitted on 2019-12-04 15:33:58
I'm just starting to get my head around MDX and I'm having trouble with a calculated member. I'm using the following MDX:

IIF(
  ISEMPTY((Axis(1).Item(0).Item(0).Dimension.CurrentMember, [Measures].[Qty])),
  NULL,
  ([Product].[Product Code].CurrentMember.Parent, [Measures].[Qty])
)

What I'm trying to do is get the total quantity of the group of products displayed in a cube. I then divide each product's quantity by that total to get a "percent of total" measure. The above MDX correctly returns the total quantity of the products displayed in any dimension. However, when a user in Excel changes

What are the types of dimension tables in star schema design? [closed]

為{幸葍}努か submitted on 2019-12-04 15:15:15
When reading about star schema design, I have seen many people use various names for different types of dimension tables. Please list the names and a short description of each type, plus any alias names. I have come across these types of dimension tables so far:
Regular dimension: standard star dimension.
Time dimension: a special case of the standard star dimension.
Parent-child dimension: used to model hierarchical structures, e.g. a BOM (bill of materials).
Snowflake dimension: can also be used to model hierarchical structures.
Degenerate dimensions: when the dimension attribute is
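
To make the parent-child case concrete: a parent-child dimension stores only a pointer from each member to its parent, and the levels are derived by walking those pointers, whereas a flattened or snowflaked design stores the levels explicitly. The Python sketch below does that walk for a small, made-up BOM and returns each member's root-to-member path, whose length is the member's level; the data and function name are illustrative only.

```python
from typing import Dict, List, Optional


def hierarchy_paths(parent_of: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Walk a parent-child table (member -> parent, None for a root) into paths."""
    def path(member: str) -> List[str]:
        chain, seen = [], set()
        current: Optional[str] = member
        while current is not None:
            if current in seen:
                raise ValueError(f"cycle detected at {current!r}")
            seen.add(current)
            chain.append(current)
            current = parent_of[current]
        return chain[::-1]  # root first
    return {m: path(m) for m in parent_of}


bom = {"bike": None, "frame": "bike", "wheel": "bike", "spoke": "wheel"}
for member, p in hierarchy_paths(bom).items():
    print(len(p), " > ".join(p))  # level, then the flattened path
```

The cycle check matters because a parent-child table, unlike fixed level columns, can accidentally contain loops.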

Informatica writes rejected rows into a bad file, how to avoid that?

吃可爱长大的小学妹 submitted on 2019-12-04 14:43:57
I have developed an Informatica PowerCenter 9.1 ETL job that uses a Lookup and an Update Strategy transformation to detect whether the target table already contains the incoming rows from the source. I have set the following condition on the Update Strategy transformation: IIF(ISNULL(target_table_surrogate_id), DD_INSERT, DD_REJECT). Now, when an incoming row is already in the target table, the row is rejected, and Informatica writes these rejected rows into a .bad file. How can I prevent this? Is there a way to specify that rejected rows are not written into a .bad file? Or should I use e.g. a Router instead of an Update Strategy transformation to

Data warehouse for AD dates

五迷三道 submitted on 2019-12-04 14:34:38
Question: We're creating a historic archive for a world history database, and we need a date lookup table that references every date AD. How should we go about generating the values for this table, from 1 AD to 2011, in YYYY/MM/DD format? The database is MySQL. Problems: I'm using Excel to pre-populate the dates and then importing them into MySQL as YYYY/MM/DD, but Excel doesn't recognize years like 0007, 0008, etc., so I can't auto-fill cells to generate the dates. I have to do it manually, and this will take days to go from 1 AD to year
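
Generating the rows in a script avoids Excel's date limits. Python's datetime.date supports years 1 through 9999, so the sketch below can emit every day from 0001-01-01 to an assumed end of 2011-12-31 as a CSV for import, formatting years with four digits explicitly. Two caveats: datetime uses the proleptic Gregorian calendar, so dates before the Gregorian reform will not match Julian dates as recorded in historical sources; and the MySQL documentation gives '1000-01-01' as the lower bound of the supported DATE range, so storing the text form or an integer key alongside (or instead of) a DATE column may be safer. The column layout and function names are hypothetical.

```python
import csv
from datetime import date, timedelta
from typing import Iterator


def ad_dates(first: date = date(1, 1, 1), last: date = date(2011, 12, 31)) -> Iterator[date]:
    """Yield every calendar day from first to last inclusive (proleptic Gregorian)."""
    d = first
    while d <= last:
        yield d
        d += timedelta(days=1)


def format_ymd(d: date) -> str:
    # Zero-pad explicitly: strftime("%Y") may not pad years below 1000 on every platform.
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}"


def write_date_csv(path: str) -> None:
    """Write one row per day: integer key, YYYY/MM/DD text, and the date parts."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date_key", "date_text", "year", "month", "day"])
        for d in ad_dates():
            # toordinal() gives a compact integer surrogate key (0001-01-01 -> 1).
            writer.writerow([d.toordinal(), format_ymd(d), d.year, d.month, d.day])
```

Calling write_date_csv("ad_dates.csv") produces a file of roughly 735,000 rows that MySQL's LOAD DATA INFILE can import in one statement.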

Slowly changing dimensions: SCD1 and SCD2 implementation in Hive

不问归期 submitted on 2019-12-04 12:37:22
I am looking for an SCD1 and SCD2 implementation in Hive (1.2.1). I am aware of the workaround for loading SCD1 and SCD2 tables in Hive versions prior to 0.14. Here is the link describing that workaround approach: http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
Now that Hive supports ACID operations, I want to know whether there is a better or more direct way of loading them.
Answer: As HDFS is immutable storage, it could be argued that versioning data and keeping history (SCD2) should be the default behaviour for loading dimensions. You can create a View in your Hadoop SQL query engine
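
Whatever the engine, an SCD2 load comes down to the same steps: for each incoming business key, keep the current row if nothing changed; otherwise close the current row (set its end date and clear its current flag) and append a new current version; and insert rows for keys seen for the first time. The Python sketch below shows only that logic on in-memory rows; the DimRow fields (valid_from, valid_to, is_current) are assumed column names, and in Hive the same steps would have to be expressed in HiveQL. An SCD1 load is the simpler case: overwrite the attribute in place instead of adding a version.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    natural_key: str
    attribute: str
    valid_from: date
    valid_to: Optional[date]  # None while the version is still open
    is_current: bool


def apply_scd2(dim: List[DimRow], incoming: Dict[str, str], load_date: date) -> List[DimRow]:
    """Return a new dimension snapshot after an SCD2 load of incoming key -> attribute."""
    current = {r.natural_key: r for r in dim if r.is_current}
    out = [r for r in dim if not r.is_current]  # closed history is never touched
    for key, row in current.items():
        new_value = incoming.get(key)
        if new_value is None or new_value == row.attribute:
            out.append(row)  # unchanged, or not in this load: keep current version
        else:
            out.append(replace(row, valid_to=load_date, is_current=False))  # expire
            out.append(DimRow(key, new_value, load_date, None, True))  # new version
    for key, value in incoming.items():
        if key not in current:
            out.append(DimRow(key, value, load_date, None, True))  # first appearance
    return out


dim = [DimRow("C1", "Oslo", date(2019, 1, 1), None, True)]
for row in apply_scd2(dim, {"C1": "Bergen", "C2": "Paris"}, date(2019, 12, 4)):
    print(row)
```

Returning a new snapshot rather than updating rows in place mirrors the immutable-storage argument in the answer: each load can be written out as a fresh table or partition.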

Where I can download sample database which can be used as data warehouse? [closed]

坚强是说给别人听的谎言 submitted on 2019-12-04 09:39:44
Question (closed as off-topic 4 years ago): Where can I download a sample database that can be used to create a data warehouse? It shouldn't be a sample from Microsoft (Northwind, etc.).
EDIT: Sorry for not clarifying my question. At my university we have a class where we must create a data warehouse, and since Northwind is so popular on the net, the professor