data-warehouse | 易学教程

Informatica writes rejected rows into a bad file, how to avoid that?

Question: I have developed an Informatica PowerCenter 9.1 ETL job which uses a lookup and an update transform to detect whether the target table already has the incoming rows from the source. I have set this condition on the update transform: IIF(ISNULL(target_table_surrogate_id), DD_INSERT, DD_REJECT). Now, when an incoming row is already in the target table, the row is rejected, and Informatica writes these rejected rows into a .bad file. How can I prevent this? Is there a way to determine that the rejected rows
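The routing described in the question can be modelled outside Informatica. The following is a minimal Python sketch, not Informatica code: the function names (`lookup_surrogate_id`, `flag_row`, `route_rows`, `drop_existing`) and the dictionary standing in for the target table are hypothetical, and the strings `DD_INSERT` and `DD_REJECT` only mirror the constants in the expression above.

```python
from typing import Dict, List, Optional, Tuple

# Plain strings that mirror the constants named in the question's expression.
DD_INSERT = "DD_INSERT"
DD_REJECT = "DD_REJECT"


def lookup_surrogate_id(target_index: Dict[str, int], business_key: str) -> Optional[int]:
    """Stand-in for the lookup: return the target surrogate id, or None if absent."""
    return target_index.get(business_key)


def flag_row(target_table_surrogate_id: Optional[int]) -> str:
    """Model of IIF(ISNULL(target_table_surrogate_id), DD_INSERT, DD_REJECT)."""
    return DD_INSERT if target_table_surrogate_id is None else DD_REJECT


def route_rows(source_keys: List[str], target_index: Dict[str, int]) -> List[Tuple[str, str]]:
    """Flag every incoming key the way the condition in the question would."""
    return [(key, flag_row(lookup_surrogate_id(target_index, key))) for key in source_keys]


def drop_existing(source_keys: List[str], target_index: Dict[str, int]) -> List[str]:
    """Hypothetical filter step run before flagging: keep only keys missing from the target."""
    return [key for key in source_keys if lookup_surrogate_id(target_index, key) is None]
```

In this model every key that already exists in the target is flagged `DD_REJECT`, which corresponds to the rows the question reports landing in the .bad file. Passing the keys through `drop_existing` first leaves nothing to flag for rejection; whether an equivalent filter step fits the actual mapping is an assumption.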

PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

Question: Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi-real-time basis and feed it into a data warehouse. (Someone is bound to ask what semi-real-time means; the answer is as frequently as I reasonably can, but I will be pragmatic. As a benchmark, let's say we are hoping for every 15 minutes.) How much data? At peak times we are talking approx. 80-100k rows per minute hitting the OLTP side; off-peak this will drop significantly to

What should I have in mind when building OLAP solution from scratch?

Question: I'm working for a company running a software product based on an MS SQL database server, and over the years I have developed 20-30 quite advanced reports in PHP, taking data directly from the database. This has been very successful, and people are happy with it. But it has some drawbacks: new changes can be quite development-intensive; the user can't experiment much with the data, because it is locked to a hard-coded view; and it can be slow for big reports. I am considering gradually going to a

Benefits of using Staging Database while designing Data Warehouse

Question: I am in the process of designing a data warehouse architecture. While exploring various options for extracting data from production and loading it into the data warehouse, I came across many articles which mainly suggested the following two approaches:
1. Production DB ----> Data Warehouse (Star Schema) ----> OLAP Cube
2. Production DB ----> Staging Database ----> Data Warehouse (Star Schema) ----> OLAP Cube
I am still not sure which one is the better approach in terms of performance and reducing processing load on
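To make the second flow (with a staging database) concrete, here is a minimal Python sketch using the standard `sqlite3` module. It is an illustration under stated assumptions, not a recommendation: one in-memory SQLite database stands in for what would be three separate databases, and every table and function name (`prod_orders`, `stg_orders`, `dim_customer`, `fact_orders`, `extract_to_staging`, `load_star_schema`) is hypothetical.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create toy production, staging, and star-schema tables in one in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE prod_orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
        CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount REAL);
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
            customer TEXT UNIQUE
        );
        CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, customer_key INTEGER, amount REAL);
        INSERT INTO prod_orders VALUES (1, 'acme', 10.0), (2, 'acme', 5.5), (3, 'globex', 7.0);
        """
    )
    return conn


def extract_to_staging(conn: sqlite3.Connection) -> None:
    """Plain copy with no joins or lookups: the only step that reads production."""
    conn.execute("DELETE FROM stg_orders")
    conn.execute(
        "INSERT INTO stg_orders SELECT order_id, customer, amount FROM prod_orders"
    )


def load_star_schema(conn: sqlite3.Connection) -> None:
    """Build the dimension and fact tables from staging only, never from production."""
    conn.execute(
        "INSERT OR IGNORE INTO dim_customer (customer) "
        "SELECT DISTINCT customer FROM stg_orders"
    )
    conn.execute(
        "INSERT OR REPLACE INTO fact_orders (order_id, customer_key, amount) "
        "SELECT s.order_id, d.customer_key, s.amount "
        "FROM stg_orders s JOIN dim_customer d ON d.customer = s.customer"
    )
```

Calling `extract_to_staging(conn)` followed by `load_star_schema(conn)` on the connection from `build_demo()` runs the whole flow. In this sketch the production table is read once by a simple copy, and the key lookups and joins run against the staging copy. It does not measure anything, so it says nothing about which approach performs better in a real environment.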

NoSql and Data-Warehouse

Question: What are the relations between NoSql and Data-Warehouse technologies/theories? What concepts do they share? What are the basic differences between them? How do you think each could benefit from or be enriched by the other? I think your ideas should be helpful for the future of both technologies. UPDATE: Some useful links: Integrating NoSQL in the Data Warehouse NoSQL and Data Warehousing Are You Ready for Big Data? 2nd UPDATE: MongoDB, BI and Non-Relational Databases Answer 1: Data Warehouses have very

Move SQL Server Database data to SAP BW

Question: I have read a few articles about moving data out of SAP BW and into SQL Server. I can't find any articles on moving data from SQL Server to SAP BW. Is it even possible, and if so, what would be the best way to handle this? Answer 1: After searching on this topic, I found many links addressing this issue. In this answer I will try to summarize them all and provide the links that can help you achieve your goal. There are many ways to import data from SQL Server into SAP BW: (1) SAP BW DB Connect

Using a DATE field as primary key of a date dimension with MySQL

Question: I want to handle a date dimension in a MySQL data warehouse. (I'm a newbie in the DW world.) I searched with Google and saw a lot of date dimension table structures, in most of which the primary key is a simple UNSIGNED INTEGER. Why not use a DATE field as the primary key, since with MySQL it is 3 bytes vs. 4 bytes for INTEGER? Ex: CREATE TABLE dimDate (id INTEGER UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT, date DATE NOT NULL, dayOfWeek ... vs. CREATE TABLE dimDate (date DATE NOT NULL PRIMARY KEY,
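The two layouts being compared can be sketched by generating the rows each table would hold. This is a minimal Python illustration, not MySQL DDL: the function names are hypothetical, the sequential `id` merely imitates an auto-increment column, and filling `dayOfWeek` with ISO weekday numbers (Monday = 1) is an assumption, since the question does not say how that column is populated.

```python
from datetime import date, timedelta
from typing import Dict, Iterator, List


def each_day(start: date, end: date) -> Iterator[date]:
    """Yield every calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def rows_with_integer_key(start: date, end: date) -> List[Dict[str, object]]:
    """Layout 1: a surrogate integer id is the key; the date is an ordinary column."""
    return [
        {"id": i, "date": d.isoformat(), "dayOfWeek": d.isoweekday()}
        for i, d in enumerate(each_day(start, end), start=1)
    ]


def rows_with_date_key(start: date, end: date) -> List[Dict[str, object]]:
    """Layout 2: the date itself is the key, so no separate id column exists."""
    return [
        {"date": d.isoformat(), "dayOfWeek": d.isoweekday()}
        for d in each_day(start, end)
    ]
```

With the second layout, a fact row that stores the key already carries the calendar date, whereas with the first layout the fact row stores an `id` that has to be resolved through the dimension to recover the date. The sketch only shows the shape of the data; it does not address storage size or index behaviour in MySQL.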

Is it possible to partially refresh a materialized view in Oracle?

Question: I have a very complex Oracle view based on other materialized views, regular views, and some tables (I can't "fast refresh" it). Most of the time, existing records in this view are based on a date and are "stable", with new record sets having new dates. Occasionally, I receive back-dates. I know what those are and how to deal with them if I were maintaining a table, but I would like to keep this a "view". A complete refresh would take around 30 minutes, but it only takes 25 seconds for

In a star schema, are foreign key constraints between facts and dimensions necessary?

Question: I'm getting my first exposure to data warehousing, and I'm wondering whether it is necessary to have foreign key constraints between facts and dimensions. Are there any major downsides to not having them? I'm currently working with a relational star schema. In traditional applications I'm used to having them, but I started to wonder if they were needed in this case. I'm currently working in a SQL Server 2005 environment. UPDATE: For those interested, I came across a poll asking the same question. Answer 1: