data-warehouse

MERGE - conditional “WHEN MATCHED THEN UPDATE”

梦想的初衷 submitted on 2021-02-17 21:14:10
Question: The highlights in the image below show the logic I want to implement. I realize the syntax is incorrect. Is there a way to conditionally update a record in a MERGE statement only if the value of one of its columns in the target table is NULL and the corresponding value in the source table is not NULL? How would you suggest rewriting this? MERGE dbo.input_311 AS [t] USING dbo.input_311_staging AS [s] ON ([t].[unique key] = [s].[unique key]) WHEN NOT MATCHED BY TARGET THEN INSERT(t.
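The usual fix is to add an extra predicate to the WHEN MATCHED clause itself. A minimal T-SQL sketch, assuming a placeholder column [some_column] (the original column list was truncated, so the name is illustrative):

```sql
MERGE dbo.input_311 AS [t]
USING dbo.input_311_staging AS [s]
    ON ([t].[unique key] = [s].[unique key])
-- Only update when the target value is missing and the source has one.
WHEN MATCHED AND [t].[some_column] IS NULL
             AND [s].[some_column] IS NOT NULL THEN
    UPDATE SET [t].[some_column] = [s].[some_column]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([unique key], [some_column])
    VALUES ([s].[unique key], [s].[some_column]);
```

SQL Server allows an AND condition on each WHEN MATCHED branch, so no subquery or separate UPDATE pass is needed.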

Data Warehousing - Star Schema vs Flat Table

我们两清 submitted on 2021-02-17 18:58:45
Question: I'm trying to design a Data Warehouse as a single store of commonly required data for finance systems, project-scheduling systems, and a myriad of scientific systems, i.e. many different data marts. I have been reading up on Data Warehousing and popular methods such as star schemas and Kimball methods, etc., but one question I cannot find an answer to is: why is it better to design your DW data mart as a star schema rather than a single flat table? Surely having no joins between facts and
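The trade-off can be seen in a small sketch; all table and column names below are illustrative, not from the question:

```sql
-- Star schema: a narrow fact table keyed to small dimension tables.
SELECT d.calendar_year, p.category, SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_date    d ON d.date_key    = f.date_key
JOIN   dim_product p ON p.product_key = f.product_key
GROUP BY d.calendar_year, p.category;

-- Flat table: the same query needs no joins, but every descriptive
-- attribute is repeated on every fact row, inflating storage and
-- making attribute changes (e.g. renaming a category) full-table updates.
SELECT calendar_year, category, SUM(sales_amount) AS total_sales
FROM   flat_sales
GROUP BY calendar_year, category;
```

The flat query is simpler, but the star keeps descriptive attributes in one place per dimension, which is what makes SCD handling and conformed dimensions practical.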

SCD 1 dimension without surrogate key

旧城冷巷雨未停 submitted on 2021-02-11 14:49:12
Question: This reference from the Kimball Group states that all dimensions should have surrogate keys, except some very predictable ones like the date dimension. I have exactly the same case as described on the SCD Type 1 Wiki page: technically, the surrogate key is not necessary, since the row will be unique by the natural key (Supplier_Code). Data are loaded from the operational system without a surrogate key, while I calculate the surrogate key in ETL based on a single, unique xxx_code column. SCD Type 1, full load. Are
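For comparison, an SCD Type 1 load keyed directly on the natural key can be sketched as a plain overwrite upsert; the dimension and attribute names are illustrative, borrowed from the Supplier_Code example in the question:

```sql
-- SCD Type 1 overwrite keyed on the natural key, no surrogate involved.
MERGE dim_supplier AS t
USING stg_supplier AS s
    ON t.supplier_code = s.supplier_code   -- natural key as primary key
WHEN MATCHED THEN
    UPDATE SET t.supplier_name  = s.supplier_name,
               t.supplier_state = s.supplier_state
WHEN NOT MATCHED BY TARGET THEN
    INSERT (supplier_code, supplier_name, supplier_state)
    VALUES (s.supplier_code, s.supplier_name, s.supplier_state);
```

This works as long as the natural key is stable and unique; the surrogate key mainly buys insulation against source-system key changes and smaller join keys in facts.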

GDPR: encryption at-rest instead of data lookup tables [closed]

情到浓时终转凉″ submitted on 2021-02-11 14:44:58
Question: (Closed: this question does not meet Stack Overflow guidelines and is not accepting answers.) Encryption at rest means storing data inside your storage/database in encrypted form. During processing you need to decrypt the data every time, calculate something, and then encrypt everything back (encryption is managed by the storage layer). Does encryption at-rest resolve
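The lookup-table alternative named in the title is usually pseudonymization: PII is isolated in one table and everything else references an opaque key. A minimal sketch, with illustrative names:

```sql
-- Pseudonymization via a lookup table: PII lives only here, and the
-- rest of the warehouse references the opaque person_key.
-- Honoring an erasure request = deleting (or scrambling) one row here;
-- the remaining fact rows become unlinkable to a person.
CREATE TABLE person_lookup (
    person_key BIGINT PRIMARY KEY,   -- surrogate used everywhere else
    full_name  VARCHAR(200),
    email      VARCHAR(200)
);

CREATE TABLE fact_orders (
    order_id    BIGINT,
    person_key  BIGINT,              -- no PII, just the pseudonymous key
    order_total DECIMAL(12,2)
);
```

Encryption at rest, by contrast, protects against stolen disks and backups but does nothing once the storage layer transparently decrypts for queries, so the two mechanisms address different threats.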

bigquery aggregate for daily basis

为君一笑 submitted on 2021-02-10 17:30:36
Question: I have a table in BigQuery (data warehouse): and I would like to have this result: Here is the explanation of how the calculation should work: 2017-10-01 = $100 is obvious, because there is only one row. 2017-10-02 = $400 is the sum of the first and third rows. Why? Because the second and third rows have the same invoice, so we only use the latest update. 2017-10-04 = $800 is the sum of rows 1, 3, and 4. Why? Because we take only one row per invoice per day: row 1 (T001), row 3 (T002), row 4 (T003
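The input and expected-output tables were shown as images, so the schema below (invoice_id, amount, updated_at in a table `project.dataset.invoices`) is an assumption, and this is only one plausible reading of the rules: keep each invoice's latest update, count it on that date, and report a running total by day. A hedged BigQuery standard SQL sketch:

```sql
WITH latest AS (
  SELECT invoice_id,
         DATE(updated_at) AS d,
         amount,
         -- rn = 1 marks the most recent update of each invoice
         ROW_NUMBER() OVER (PARTITION BY invoice_id
                            ORDER BY updated_at DESC) AS rn
  FROM `project.dataset.invoices`
)
SELECT d,
       -- daily sum of surviving rows, accumulated across days
       SUM(SUM(amount)) OVER (ORDER BY d) AS running_total
FROM   latest
WHERE  rn = 1
GROUP BY d
ORDER BY d;
```

The deduplication step is what makes the second row drop out in favor of the third, matching the $400 explanation above.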

Data warehouse schema: is it OK to directly link fact tables in DWH?

半世苍凉 submitted on 2021-01-27 16:45:05
Question: Is it OK to directly link fact tables in a DWH? As I understand it, in a galaxy schema fact tables are not linked; they just share common dimension tables. But what if a DWH schema links them directly? Answer 1: IMO they shouldn't, even if they can. Fact tables are usually huge, with potentially many billions of rows, and hold measures at a certain grain. Linking two or more fact tables may require joining several multi-billion-row tables, which would be too expensive. If you need to
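The usual alternative to a direct fact-to-fact join is drill-across: aggregate each fact to a shared grain first, then join the small result sets on the conformed dimension key. A sketch with illustrative names:

```sql
-- Drill-across: each fact is aggregated to the shared (daily) grain
-- before joining, so no multi-billion-row fact-to-fact join occurs.
WITH sales_by_day AS (
  SELECT date_key, SUM(sales_amount) AS sales
  FROM   fact_sales
  GROUP BY date_key
),
shipments_by_day AS (
  SELECT date_key, SUM(shipped_qty) AS shipped
  FROM   fact_shipments
  GROUP BY date_key
)
SELECT d.calendar_date, s.sales, sh.shipped
FROM   dim_date d
JOIN   sales_by_day     s  ON s.date_key  = d.date_key
JOIN   shipments_by_day sh ON sh.date_key = d.date_key;
```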

Managing surrogate keys in a data warehouse

▼魔方 西西 submitted on 2020-12-25 04:57:20
Question: I want to build a data warehouse, and I want to use surrogate keys as primary keys for my fact tables. But the problem is that in my case fact tables need to be updated. The first question is: how do I find the corresponding auto-generated surrogate key for a natural key from the source system? I have seen some answers mentioning lookup tables that store the correspondence between natural and surrogate keys, but I didn't understand how exactly they are implemented. Where should this table be stored:
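A common implementation is a key map maintained in the warehouse's ETL/staging schema and joined to during every load. A minimal sketch, with illustrative names:

```sql
-- Key map: one row per business key ever seen from the source.
CREATE TABLE key_map_customer (
    natural_key   VARCHAR(50) PRIMARY KEY,  -- business key from source
    surrogate_key BIGINT NOT NULL           -- warehouse-generated key
);

-- During the fact load, resolve surrogate keys with a simple join;
-- rows whose natural key is missing from the map are flagged for
-- late-arriving-dimension handling instead of being silently dropped.
INSERT INTO fact_orders (customer_sk, order_total)
SELECT km.surrogate_key, s.order_total
FROM   stg_orders s
JOIN   key_map_customer km ON km.natural_key = s.customer_id;
```

Keeping the map inside the warehouse (not the source system) is the usual choice, since the surrogate keys only have meaning there.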

collecting annual aggregated data for later quick access

流过昼夜 submitted on 2020-12-15 20:04:56
Question: I have a number of SQL queries which take a year as a parameter and generate various annual reports for the given year. Those queries are quite cumbersome and take a considerable amount of time to execute (20-40 minutes). In order to give my users the ability to view annual reports whenever they need to, I am considering pre-executing these queries and storing the results for later use. One solution would be to schedule execution of these queries and insert the results into some temp tables. But
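A persistent summary table, refreshed by a scheduled job, is the standard alternative to ad-hoc temp tables. A sketch under assumed names (the actual report queries are not shown):

```sql
-- Durable cache of precomputed annual figures; the year and metric
-- name form the key so each refresh replaces exactly one slice.
CREATE TABLE annual_report_cache (
    report_year  INT,
    metric_name  VARCHAR(100),
    metric_value DECIMAL(18,2),
    refreshed_at TIMESTAMP,
    PRIMARY KEY (report_year, metric_name)
);

-- Scheduled refresh: rerun the slow query for one year and swap it in.
DELETE FROM annual_report_cache WHERE report_year = 2020;
INSERT INTO annual_report_cache (report_year, metric_name, metric_value, refreshed_at)
SELECT 2020, 'total_revenue', SUM(f.sales_amount), CURRENT_TIMESTAMP
FROM   fact_sales f
JOIN   dim_date d ON d.date_key = f.date_key
WHERE  d.calendar_year = 2020;
```

User-facing reports then read from annual_report_cache in milliseconds; past years rarely change, so only the current year needs frequent refreshes.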
