data-warehouse

PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

早过忘川 posted on 2019-12-02 23:09:37
Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi-real-time basis and feed it into a data warehouse. (Someone is bound to ask what semi-real-time means: as frequently as I reasonably can, but I will be pragmatic. As a benchmark, let's say we are hoping for every 15 minutes.) How much data? At peak times we are talking approximately 80-100k rows per minute hitting the OLTP side; off-peak this drops significantly, to 15-20k. The most frequently updated rows are ~64 bytes each but there are various tables etc so the data
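To put the figures above in perspective, here is a rough back-of-the-envelope sketch of what one 15-minute batch might contain. It assumes (hypothetically) that every row hitting the OLTP side must be extracted and that every row is ~64 bytes; the question itself notes that other tables differ, so treat the byte counts as a floor rather than a forecast. The function name `batch_estimate` is made up for illustration.

```python
# Rough batch-size estimate using only the numbers quoted in the question.
# Assumption: all rows are ~64 bytes, which the question says is only true
# of the most frequently updated rows.
ROW_BYTES = 64
BATCH_MINUTES = 15

def batch_estimate(rows_per_min, minutes=BATCH_MINUTES, row_bytes=ROW_BYTES):
    """Return (rows, bytes) accumulated over one extraction window."""
    rows = rows_per_min * minutes
    return rows, rows * row_bytes

scenarios = {
    "peak (low)": 80_000,
    "peak (high)": 100_000,
    "off-peak (low)": 15_000,
    "off-peak (high)": 20_000,
}

for label, rate in scenarios.items():
    rows, size = batch_estimate(rate)
    print(f"{label}: {rows:,} rows, about {size / 1_000_000:.1f} MB per batch")
```

Under these assumptions, a peak batch is on the order of 1.2 to 1.5 million rows.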

What is best practice for representing time intervals in a data warehouse?

白昼怎懂夜的黑 posted on 2019-12-02 21:07:22
In particular I am dealing with a Type 2 Slowly Changing Dimension and need to represent the time interval a particular record was active for, i.e. for each record I have a StartDate and an EndDate. My question is whether to use a closed ([StartDate, EndDate]) or half-open ([StartDate, EndDate)) interval to represent this, i.e. whether to include the last date in the interval or not. To take a concrete example, say record 1 was active from day 1 to day 5 and from day 6 onwards record 2 became active. Do I make the EndDate for record 1 equal to 5 or 6? Recently I have come around to
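To make the two options concrete, here is a minimal sketch of the half-open variant applied to the day 1 to 5 / day 6 example above. The `Version` type, the helper names, and the January 2020 dates are all hypothetical stand-ins; an open-ended current record is modelled as `end=None`, which is one possible convention among several.

```python
from datetime import date
from typing import NamedTuple, Optional

class Version(NamedTuple):
    key: int
    start: date           # inclusive
    end: Optional[date]   # exclusive; None means "still current"

def active_on(versions, day):
    """Return the version whose half-open interval [start, end) contains day."""
    for v in versions:
        if v.start <= day and (v.end is None or day < v.end):
            return v
    return None

def is_contiguous(versions):
    """With half-open intervals, 'no gaps, no overlaps' is a plain equality:
    each version's end must equal the next version's start."""
    ordered = sorted(versions, key=lambda v: v.start)
    return all(a.end == b.start for a, b in zip(ordered, ordered[1:]))

history = [
    Version(1, date(2020, 1, 1), date(2020, 1, 6)),  # active on days 1..5
    Version(2, date(2020, 1, 6), None),              # active from day 6 on
]

print(active_on(history, date(2020, 1, 5)).key)
print(active_on(history, date(2020, 1, 6)).key)
print(is_contiguous(history))
```

With a closed interval, record 1 would instead carry an EndDate of day 5, and the contiguity check would need to add one unit of the column's granularity (one day here) before comparing.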

Good place to start learning data warehousing? [closed]

岁酱吖の posted on 2019-12-02 19:41:48
I am interested in learning more about data warehousing. I see terms like "dimension", "snowflake schema" and "star schema" thrown about. Where would one start in learning about this stuff? Are there good books or Internet resources? ETL is in this space too, right?
Stephen Denne: Wikipedia's resources on Data Warehousing are good. Reading any of Ralph Kimball's books, such as "The Data Warehouse Toolkit: The Complete

What should I have in mind when building OLAP solution from scratch?

一曲冷凌霜 posted on 2019-12-02 19:34:11
I'm working for a company running a software product based on an MS SQL database server, and over the years I have developed 20-30 quite advanced reports in PHP, taking data directly from the database. This has been very successful, and people are happy with it. But it has some drawbacks:
- New changes can be quite development-intensive
- The user can't experiment much with the data; it is locked to a hard-coded view
- It can be slow for big reports
I am considering gradually moving to an OLAP-based approach, which can be queried from Excel or some web-based service. But I would like to do

Benefits of using Staging Database while designing Data Warehouse

流过昼夜 posted on 2019-12-02 18:23:16
I am in the process of designing a data warehouse architecture. While exploring various options for extracting data from production and loading it into the data warehouse, I came across many articles which mainly suggested the following two approaches:
1. Production DB ----> Data Warehouse (Star Schema) ----> OLAP Cube
2. Production DB ----> Staging Database ----> Data Warehouse (Star Schema) ----> OLAP Cube
I am still not sure which one is the better approach in terms of performance and reducing processing load on the production database. Which approach do you find better when designing a data warehouse? Below points are

Time and date dimension in data warehouse

江枫思渺然 posted on 2019-12-02 18:21:57
I'm building a data warehouse. Each fact has its timestamp. I need to create reports by day, month and quarter, but by hour too. Looking at the examples, I see that dates tend to be saved in dimension tables. (source: etl-tools.info) But I think that it makes no sense for time: the dimension table would grow and grow. On the other hand, a JOIN with a date dimension table is more efficient than using date/time functions in SQL. What are your opinions/solutions? (I'm using Infobright.)
My guess is that it depends on your reporting requirement. If you need something like WHERE "Hour" = 10
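One commonly used way to address the growth concern raised in the question is to keep the calendar date and the time of day in two separate dimensions, so that neither grows with the fact table. The sketch below is an illustration under that assumption, not the answerer's stated recommendation; the column names, the `YYYYMMDD` integer key, and the hour-level grain are all hypothetical choices.

```python
from datetime import date, datetime, timedelta

def build_date_dim(start, end):
    """One row per calendar day between start and end (both inclusive)."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "date": day,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows

def build_hour_dim():
    """One row per hour of the day; this table never grows."""
    return [{"hour_key": h, "hour": h} for h in range(24)]

def fact_keys(ts):
    """Split a fact timestamp into (date_key, hour_key) foreign keys."""
    return ts.year * 10000 + ts.month * 100 + ts.day, ts.hour

date_dim = build_date_dim(date(2019, 1, 1), date(2019, 12, 31))
hour_dim = build_hour_dim()

print(len(date_dim), len(hour_dim))
print(fact_keys(datetime(2019, 12, 2, 18, 21, 57)))
```

Under this layout a year of data adds 365 or 366 rows to the date dimension and nothing to the hour dimension, and a filter such as "Hour" = 10 becomes a join on a small integer key rather than a date/time function call per fact row.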

What are the open source tools and techniques to build a complete data warehouse platform? [closed]

不羁的心 posted on 2019-12-02 15:40:10
I'm looking for open source tools, possibly free or with a free trial version, to set up a complete data warehouse stack. I know about a few, like Pentaho's open source Mondrian server, but couldn't find any Google result on how to set up a complete platform. I'm not sure whether these components are compatible with each other. Could someone please list them along with their position in the chain?
Pascal Thivent: The Open Source Data Warehousing does a great job at identifying OSS components that could be used to build a Data Warehouse stack: Infrastructure (servers, OS, databases), Integration Management (ETL

NoSql and Data-Warehouse

半腔热情 posted on 2019-12-02 14:10:21
What are the relations between NoSQL and data warehouse technologies/theories? What concepts do they share? What are the basic differences between them? How do you think each could benefit from or enrich the other? I think your ideas should be helpful for the future of both technologies.
UPDATE: Some useful links: Integrating NoSQL in the Data Warehouse NoSQL and Data Warehousing Are You Ready for Big Data?
2nd UPDATE: MongoDB, BI and Non-Relational Databases
Cade Roux: Data Warehouses have very little in common with NoSQL - the main similarity is that any two data warehouses can have very

What is the difference between a database and a data warehouse?

淺唱寂寞╮ posted on 2019-12-02 13:49:19
What is the difference between a database and a data warehouse? Aren't they the same thing, or at least written in the same thing (i.e. Oracle RDBMS)?
TheCloudlessSky: Check out this for more information. From a previous link:
Database: Used for Online Transactional Processing (OLTP) but can be used for other purposes such as Data Warehousing. This records the data from the user for history. The tables and joins are complex since they are normalized (for RDBMS). This is done to reduce redundant data and to save storage space. Entity-Relationship modeling techniques are used for RDBMS database

How to pivot data using Informatica when you have variable amount of pivot rows?

冷暖自知 posted on 2019-12-02 06:45:59
Question: Based on my earlier questions, how can I pivot data using Informatica PowerCenter Designer when I have a variable number of addresses in my data? I would like to pivot, e.g., four addresses from my data. This is the structure of the source data file:
+---------+--------------+-----------------+
| ADDR_ID | NAME         | ADDRESS         |
+---------+--------------+-----------------+
| 1       | John Smith   | JohnsAddress1   |
| 1       | John Smith   | JohnsAddress2   |
| 1       | John Smith   | JohnsAddress3   |
| 2       | Adrian Smith |