data-warehouse | 易学教程

Identifying parent records for many transactions

Question: This is related to a question I asked previously, for which lag/lead was suggested. However, the data I'm working with is more complex than I first thought, so I need a more robust solution. This screenshot shows an issue I need to tackle: within a single serial number, a shipment event defines a new reference window. So records 2, 3 and 4 relate to record 1, record 6 relates to record 5, and so forth. I need to mark the records for which the BillToId doesn't match the parent shipment. I'm trying to understand if
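
The windowing rule described in this question can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (`serial`, `seq`, `event`, `bill_to_id`) and the event labels `SHIPMENT`, `RETURN` and `REPAIR` are hypothetical, since the original screenshot is not available. Each non-shipment record is attached to the most recent shipment of the same serial number and flagged when its BillToId differs.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def flag_bill_to_mismatches(records: List[Record]) -> List[Record]:
    """Attach each record to the latest preceding shipment of the same
    serial number and flag records whose bill_to_id differs from it."""
    latest_shipment: Dict[str, Record] = {}  # serial -> current parent shipment
    flagged: List[Record] = []
    for rec in sorted(records, key=lambda r: (r["serial"], r["seq"])):
        if rec["event"] == "SHIPMENT":
            # A shipment opens a new reference window for its serial number.
            latest_shipment[rec["serial"]] = rec
            flagged.append({**rec, "parent_seq": None, "mismatch": False})
            continue
        parent = latest_shipment.get(rec["serial"])
        flagged.append({
            **rec,
            "parent_seq": parent["seq"] if parent else None,
            "mismatch": parent is not None
            and rec["bill_to_id"] != parent["bill_to_id"],
        })
    return flagged


# Hypothetical sample: records 2-4 belong to shipment 1, record 6 to shipment 5.
rows = [
    {"serial": "SN1", "seq": 1, "event": "SHIPMENT", "bill_to_id": 100},
    {"serial": "SN1", "seq": 2, "event": "RETURN", "bill_to_id": 100},
    {"serial": "SN1", "seq": 3, "event": "REPAIR", "bill_to_id": 200},
    {"serial": "SN1", "seq": 4, "event": "RETURN", "bill_to_id": 100},
    {"serial": "SN1", "seq": 5, "event": "SHIPMENT", "bill_to_id": 200},
    {"serial": "SN1", "seq": 6, "event": "REPAIR", "bill_to_id": 200},
]
result = flag_bill_to_mismatches(rows)
mismatched = [r["seq"] for r in result if r["mismatch"]]
print(mismatched)
```

In this sample only record 3 is flagged, because its BillToId differs from that of shipment 1. In SQL the same idea is usually expressed by carrying the last shipment row forward within each serial number, but the exact syntax depends on the database in use.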

Java/.NET Developer moving towards Data Warehouse

Question: I understand the concept of a Data Warehouse after reading questions like this: What is a data warehouse? I am familiar with OLAP and MDX (MDX to a limited extent). I have a .NET application that connects to about fifteen different databases to search for and manage information, i.e. it is a Java application that connects to fifteen databases that are Oracle/SQL based. I believe a Data Warehouse would meet my needs. I have two questions about Data Warehouses: Do you copy

Oracle Warehouse Builder Design center can't start

Question: I'm using Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 64-bit. I want to use Warehouse Builder, which is included with it, but I can't open the Design Center. It gives me the following error: application could not start correctly (0xc0000018). Could anyone help me with this? I can't find a solution. Answer 1: I finally found a solution to this problem. When you download the 64-bit Oracle Database, Warehouse Builder is included with it, but the Design Center works only with

One to one relationship in data warehouse

Question: Simple scenario: I'd like to create a data warehouse that holds information about "issues" (cost, working time, etc.). An issue also has a status, which might change over time. So I'm creating a fact table called issueRealization describing each issue. My question is: should I create an "issue" dimension, which would give me a one-to-one relationship between the dimension and the fact table? Or should I divide the Issue dimension into smaller dimensions, such as status? Answer 1: Issue status tracking is a good case to use an

Thoughts on dimension measures for BI

Question: I am working with a consultant who recommends creating a measure dimension and then adding the measure dimension key to our fact table. I can see how this can make adding new measures easier, since it only requires adding rows instead of physically creating columns in the fact table. I can also see how this adds work to the ETL process, adds another join to the star schema, requires one generic column in the fact table to hold all measure data, etc. I'm interested in how others have dealt with this situation. We
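
The trade-off described in this question can be made concrete with a small Python sketch. All names here (`measure_dim`, `wide_fact`, `date_key`, the `revenue` and `cost` measures) are hypothetical and chosen only for illustration. The sketch turns a "wide" fact table with one column per measure into a "tall" one with a measure key and a single generic value column, then shows the extra lookup needed to read it back.

```python
from collections import defaultdict

# Hypothetical measure dimension: surrogate key -> measure name.
measure_dim = {1: "revenue", 2: "cost"}

# Hypothetical wide fact table: one physical column per measure.
wide_fact = [
    {"date_key": 20240101, "revenue": 100.0, "cost": 60.0},
    {"date_key": 20240102, "revenue": 150.0, "cost": 90.0},
]


def to_tall(wide_rows, measure_dim):
    """Unpivot wide rows into (date_key, measure_key, value) rows."""
    key_by_name = {name: key for key, name in measure_dim.items()}
    tall = []
    for row in wide_rows:
        for name, key in key_by_name.items():
            if name in row:
                tall.append({
                    "date_key": row["date_key"],
                    "measure_key": key,
                    "value": row[name],  # single generic value column
                })
    return tall


def totals_by_measure(tall_rows, measure_dim):
    """Sum values per measure name; the dict lookup stands in for the extra join."""
    totals = defaultdict(float)
    for row in tall_rows:
        totals[measure_dim[row["measure_key"]]] += row["value"]
    return dict(totals)


tall_fact = to_tall(wide_fact, measure_dim)
totals = totals_by_measure(tall_fact, measure_dim)
print(len(tall_fact), totals)
```

Adding a new measure to the tall layout only requires a new entry in `measure_dim` plus new rows, but the row count grows with the number of measures and every query has to resolve the measure key first, which matches the costs listed in the question.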

Need help understanding alternatives to scd in SSIS

Question: I am working on a data warehouse project that will involve integrating data from multiple source systems. I have set up an SSIS package that populates the customer dimension and uses the slowly changing dimension tool to keep track of updates to the customer. I'm running into some issues. Take this example: Source system A might have a record that looks like this: First Name, Last Name, Zipcode Jane, Doe, 14222 Source system B might have a record for the same client that looks like this:

Data vault model: what are hubs good for?

Question: I was just reading about Data Vault modeling and, as far as I understand it, the hub contains only keys (and the record source). So I was wondering why I should create those hub tables, only to store the record source? Wouldn't it be enough to have only Satellites and Links? Btw: I'm looking for simple MySQL tables in a data vault form to download and play with. Answer 1: The hub is where the passive integration of multiple sources is applied. You would have a column for data source and record
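
As a rough illustration of the passive integration mentioned in the answer, the following Python sketch loads one hub from two source systems. It is a sketch under assumptions, not a reference implementation: the names `load_hub`, `customer_no`, `CRM` and `BILLING` are hypothetical, and the key normalization (trimming and upper-casing) is a choice made here for the example. A business key is inserted only the first time it is seen, together with its load date and record source.

```python
from datetime import date


def load_hub(hub, source_rows, record_source, load_date):
    """Insert business keys not yet in the hub; existing keys are left untouched.

    Returns the number of newly inserted keys.
    """
    inserted = 0
    for row in source_rows:
        # Assumption: sources agree on the key after simple normalization.
        business_key = row["customer_no"].strip().upper()
        if business_key not in hub:
            hub[business_key] = {
                "load_date": load_date,
                "record_source": record_source,
            }
            inserted += 1
    return inserted


hub_customer = {}  # business key -> hub metadata
n_crm = load_hub(
    hub_customer,
    [{"customer_no": "c-001"}, {"customer_no": "C-002"}],
    "CRM",
    date(2024, 1, 1),
)
n_billing = load_hub(
    hub_customer,
    [{"customer_no": "C-002"}, {"customer_no": "C-003"}],
    "BILLING",
    date(2024, 1, 2),
)
print(n_crm, n_billing, sorted(hub_customer))
```

Both systems know customer C-002, yet the hub holds it once, with the source that delivered it first. Satellites and links from either system can then attach to that single key, which is one reason a hub is more than a place to store the record source.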

Fact Table with Different Update Schedules

Question: I have two sets of data with the same level of granularity, for example invoice number. Most of the data required is updated daily as we recognize the revenue for previous invoices. However, some of this data is fed through a separate costing system once a month and is then fed to the data warehouse with additional information. Should I create one fact table that contains both sets of data, and then run an update on the fact table once a month when the other data is imported, or should I

Many-To-Many dimensional model

Question: Folks, I have a dimension table called DIM_FILE which holds information about the files we receive from customers. Each file has detail records which constitute my FACT table, CUST_DETAIL. In the main process, a file goes through several stages and each stage tags a status to it. Long story short, I have a many-to-many relationship. Any ideas on star schema dimensional modeling for this? A customer record belongs to only a single file, and a file can have multiple statuses. FACT ---- CustID FileID

Inmon data marts vs Kimball data marts

Question: Is the only difference between Kimball and Inmon the enterprise layer (EDW)? I was googling around and found out that Inmon also creates data marts using the EDW. So does that mean both kinds of data marts are similar in structure for a given business process and source systems? Once the data marts are available under both approaches, do they give the same performance? Correct me if I am wrong: the data warehouse is created first and then the dimensional model is created on top of it for