data-warehouse

Fact table with information that is regularly updatable in source system

孤者浪人 submitted on 2019-12-10 20:03:27
Question: I'm building a dimensional data warehouse and learning how to model the various business processes from my source system in it. I'm currently modelling a "Bid" (bid for work) from our source system as a fact table that contains information such as bid amount, projected revenue, sales employee, bid status (active, pending, rejected, etc.), and so on. The problem is that the bid (or almost any other process I'm trying to model) can go through various states and have its

Group by vs Partition by in Oracle

岁酱吖の submitted on 2019-12-09 11:47:46
Question: I am writing a query to fetch records from an Oracle warehouse. It's a simple SELECT query with joins on a few tables, and I have a few columns to aggregate, so I end up using GROUP BY on the rest of the columns. Say I am picking some 10 columns, 5 of which are aggregates; I then need to group by the other 5 columns. I can achieve the same result without GROUP BY by using an OVER (PARTITION BY) clause on each aggregate column I want to derive. I am not sure which is
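As a rough illustration of the difference the question asks about, the sketch below runs both forms against an in-memory SQLite database (used here only so the example is runnable; the question concerns Oracle, and SQLite 3.25 or later is assumed for window functions). The `sales` table and its rows are invented for the example.

```python
import sqlite3

# Hypothetical sales table, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", "a", 10), ("east", "b", 20), ("west", "a", 5)],
)

# GROUP BY collapses the input to one row per region.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# OVER (PARTITION BY ...) keeps every input row and repeats the total on each.
windowed = conn.execute(
    "SELECT region, product, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, product"
).fetchall()

print(grouped)
print(windowed)
```

The two forms return the same totals but not the same row count: the windowed query still has one row per input row, so it only matches the GROUP BY output after de-duplication.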

How Do I aggregate Data By Day and Still Respect Timezone?

最后都变了- submitted on 2019-12-09 09:25:24
Question: We are currently using a summary table that aggregates information for our users on an hourly basis in UTC. The problem is that this table is becoming too large and is slowing our system down immensely. We have applied all the tuning techniques recommended for PostgreSQL and are still experiencing slowness. Our idea was to start aggregating by day rather than by hour, but we allow our customers to change their timezone, which recalculates the data for that
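To make the trade-off concrete, here is a minimal sketch of why hourly UTC buckets can be regrouped into local days while pre-aggregated UTC days cannot. It assumes a fixed whole-hour UTC offset and ignores daylight saving time; `daily_totals` and the sample data are hypothetical and not taken from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_totals(hourly_utc, utc_offset_hours):
    """Roll hourly UTC buckets up into local calendar days for a fixed offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    totals = defaultdict(int)
    for bucket_start, value in hourly_utc.items():
        local_day = bucket_start.astimezone(tz).date()
        totals[local_day] += value
    return dict(totals)


# Hypothetical data: one event counted in each of 48 consecutive UTC hours.
start = datetime(2019, 12, 1, tzinfo=timezone.utc)
hourly = {start + timedelta(hours=h): 1 for h in range(48)}

utc_days = daily_totals(hourly, 0)    # two full days of 24 events each
minus5_days = daily_totals(hourly, -5)  # same events spread over three local days

print(utc_days)
print(minus5_days)
```

With the same 48 hourly rows, the UTC view yields two days of 24 events, while the UTC-5 view splits them 5, 24, and 19 across three local dates. A table that stored only the two UTC daily totals could not reproduce that split.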

Why is a primary key (not) required on a fact table in dimensional modelling?

China☆狼群 submitted on 2019-12-09 04:26:20
Question: I have heard a few references saying that a PK is not required on a fact table. I believe every single table should have a PK. How could a person understand a row in a fact table if there is no PK and 10+ foreign keys?
Answer 1: The primary key is there, but enforcing the primary key constraint at the database level is not required. If you think about it, technically a unique key or primary key is a key that uniquely defines the characteristics of each row. And it can be composed of more than one attribute of
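One way to read the answer's point is that the composite key exists logically even when no constraint enforces it, and the load process can verify it instead. The sketch below checks that idea on a handful of invented fact rows; the column layout and the `duplicate_keys` helper are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical fact rows: (date_key, product_key, store_key, quantity).
fact_rows = [
    (20191201, 1, 10, 3),
    (20191201, 2, 10, 1),
    (20191202, 1, 10, 3),
]


def duplicate_keys(rows, key_width):
    """Return composite keys (the first key_width columns) seen more than once."""
    counts = Counter(row[:key_width] for row in rows)
    return [key for key, n in counts.items() if n > 1]


# The three foreign keys together identify each row, so nothing is reported.
dupes = duplicate_keys(fact_rows, 3)

# A bad load that repeats an existing key combination is caught by the check.
dupes_after_bad_load = duplicate_keys(fact_rows + [(20191201, 1, 10, 7)], 3)

print(dupes)
print(dupes_after_bad_load)
```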

NoSQL for filesystem storage organization and replication?

余生长醉 submitted on 2019-12-08 18:14:31
We've been discussing design of a data warehouse strategy within our group for meeting testing, reproducibility, and data syncing requirements. One of the suggested ideas is to adapt a NoSQL approach using an existing tool rather than try to re-implement a whole lot of the same on a file system. I don't know if a NoSQL approach is even the best approach to what we're trying to accomplish but perhaps if I describe what we need/want you all can help. Most of our files are large, 50+ Gig in size, held in a proprietary, third-party format. We need to be able to access each file by a name/date

How to connect a fact and dimension table that are in 1-N relationship

大兔子大兔子 submitted on 2019-12-08 15:17:36
I have a Purchase FactTable with some measures and dimension keys. Then there's another table: Discount Table. Purchase FactTable is in a 1-N relationship with Discount Table (for each purchase I might have bought several discounted items). Discount Table has some attributes (description, note) and some numeric values (for example, discount in $) that I would like to roll up. If I create a dimension out of this Discount Table, I'll get a wrong number of purchases in a count (inflated, with one row for every discounted item). If I create a separate fact out of this Discount Table,
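The inflated count described above can be reproduced in a few lines. The sketch below uses an in-memory SQLite database with invented `purchase` and `discount` tables, and shows one possible workaround (rolling discounts up to purchase grain before joining). It is an illustration of the fan-out effect, not necessarily the modelling choice the thread settled on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows: purchase 1 has two discounted items.
conn.executescript("""
CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE discount (purchase_id INTEGER, discount_amount INTEGER);
INSERT INTO purchase VALUES (1, 100), (2, 50);
INSERT INTO discount VALUES (1, 5), (1, 10), (2, 3);
""")

# Joining directly repeats purchase 1 once per discounted item.
inflated = conn.execute(
    "SELECT COUNT(p.purchase_id), SUM(p.amount) "
    "FROM purchase p JOIN discount d ON d.purchase_id = p.purchase_id"
).fetchone()

# Rolling discounts up to purchase grain first keeps one row per purchase.
correct = conn.execute(
    "SELECT COUNT(p.purchase_id), SUM(p.amount), SUM(d.total_discount) "
    "FROM purchase p JOIN ("
    "  SELECT purchase_id, SUM(discount_amount) AS total_discount"
    "  FROM discount GROUP BY purchase_id"
    ") d ON d.purchase_id = p.purchase_id"
).fetchone()

print(inflated)
print(correct)
```

The direct join reports three purchases worth 250, while the pre-aggregated join reports the true two purchases worth 150 with a total discount of 18.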

In Pentaho, how do I pass a text file that contains all the connection parameter definitions to the job?

偶尔善良 submitted on 2019-12-08 06:22:39
Question: I am using a JDBC connection and passing parameters such as ${sample_db_connection}. These parameters are defined on the server in a text file, for example sample_db_connection=localhost. I want to pass the text file to the job step so that whenever the job runs and encounters this parameter, it automatically takes the value defined in the text file.
Answer 1: You need to create a KTR file using "Property Input" as the input step and a "Modified Java Script" step to define the key value mapping
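The key/value mapping idea can be shown outside Pentaho. The following sketch is plain Python and does not use any Pentaho API: it parses `key=value` lines and substitutes `${...}` placeholders, which is the behaviour the question wants from the job. The `load_properties` helper and the JDBC URL shape are made up for the example.

```python
from string import Template


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Contents of the properties file as described in the question.
file_text = "sample_db_connection=localhost\n"
props = load_properties(file_text)

# The URL below is a hypothetical example of a string containing the parameter.
url = Template("jdbc:postgresql://${sample_db_connection}:5432/dw").substitute(props)

print(url)
```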

NoSQL for filesystem storage organization and replication?

喜欢而已 submitted on 2019-12-08 04:18:40
Question: We've been discussing design of a data warehouse strategy within our group for meeting testing, reproducibility, and data syncing requirements. One of the suggested ideas is to adapt a NoSQL approach using an existing tool rather than try to re-implement a whole lot of the same on a file system. I don't know if a NoSQL approach is even the best approach to what we're trying to accomplish but perhaps if I describe what we need/want you all can help. Most of our files are large, 50+ Gig in size

How can I get the total run time of a query in Redshift, with a query?

 ̄綄美尐妖づ submitted on 2019-12-08 02:56:01
Question: I'm in the process of benchmarking some queries in Redshift so that I can say something intelligent about changes I've made to a table, such as adding encodings and running a vacuum. I can query the stl_query table with a LIKE clause to find the queries I'm interested in, so I have the query ID, but tables/views like stv_query_summary are much too granular and I'm not sure how to generate the summary I need! The GUI dashboard shows the metrics I'm interested in, but the format is

How to handle Bridge table in Star Schema

点点圈 submitted on 2019-12-08 00:31:25
Question: I am trying to build a star schema from an E/R diagram (OLTP system) that seems to contain a bridge table. Order is an obvious fact table and Product a dimension table. I can't see how I can keep the bridge table if the model needs to be a star schema. How would you tackle this relationship if I need to keep information about Channel in the model?
Answer 1: It depends on how you plan to use the model. If you only need to answer product and channel questions about existing orders, then you can