data-warehouse

Redshift Performance of Flat Tables Vs Dimension and Facts

天大地大妈咪最大 posted on 2019-12-18 12:40:09
Question: I am trying to create a dimensional model on flat OLTP tables (not in 3NF). Some people think dimensional model tables are not required because most of the data for the report is present in a single table. But that table contains more than we need, around 300 columns. Should I still separate the flat table into dimensions and facts, or just use the flat tables directly in the reports?
Answer 1: When creating tables purely for reporting purposes (as is typical in a Data Warehouse), it is

Database choice for large data volume?

↘锁芯ラ posted on 2019-12-18 10:05:28
Question: I'm about to start a new project which should have a rather large database. The number of tables will not be large (<15); the majority of the data (99%) will be contained in one big table, which is almost insert/read only (no updates). The estimated amount of data in that one table is going to grow at 500,000 records a day, and we should keep at least 1 year of them to be able to do various reports. There needs to be a (read-only) replicated database as a backup/failover, and maybe for offloading

Why NULL values are mapped as 0 in Fact tables?

非 Y 不嫁゛ posted on 2019-12-18 08:23:15
Question: What is the reason that, in measure fields in fact tables (dimensionally modeled data warehouses), NULL values are usually mapped as 0?
Answer 1: Although you've already accepted another answer, I would say that using NULL is actually a better choice, for a couple of reasons. The first reason is that aggregates return the 'correct' answer (i.e. the one that users tend to expect) when NULL is present but give the 'wrong' answer when you use zero. Consider the results from AVG() in these two queries:
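The excerpt ends before its queries are shown. As a minimal sketch of the same point (not the answer's own queries), the following uses Python's built-in `sqlite3` module with two hypothetical tables, `fact_null` and `fact_zero`, that differ only in how a missing measure is stored. The table names and values are assumptions chosen for illustration.

```python
import sqlite3

# In-memory database; both tables and their values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_null (amount REAL)")
conn.execute("CREATE TABLE fact_zero (amount REAL)")

# Same two known measures; the third is missing (NULL) in one table
# and mapped to 0 in the other.
conn.executemany("INSERT INTO fact_null VALUES (?)", [(10.0,), (20.0,), (None,)])
conn.executemany("INSERT INTO fact_zero VALUES (?)", [(10.0,), (20.0,), (0.0,)])

# AVG() skips NULL rows, so only the two known values are averaged.
avg_null = conn.execute("SELECT AVG(amount) FROM fact_null").fetchone()[0]
# With 0 stored instead, the missing measure is counted as a real value.
avg_zero = conn.execute("SELECT AVG(amount) FROM fact_zero").fetchone()[0]

print(avg_null)  # 15.0
print(avg_zero)  # 10.0
```

With NULL the average is taken over the two known rows; with 0 the divisor becomes three and the result drops.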

Star schema, normalized dimensions, denormalized hierarchy level keys

我只是一个虾纸丫 posted on 2019-12-17 23:43:41
Question: Given the following star schema tables.

Fact, two dimensions, two measures:

#    geog_abb  time_date   amount  value
#1:  AL        2013-03-26  55.57   9113.3898
#2:  CO        2011-06-28  19.25   9846.6468
#3:  MI        2012-05-15  94.87   4762.5398
#4:  SC        2013-01-22  29.84   649.7681
#5:  ND        2014-12-03  37.05   6419.0224

Geography dimension, single hierarchy, 3 levels in hierarchy:

#    geog_abb  geog_name  geog_division_name  geog_region_name
#1:  AK        Alaska     Pacific             West
#2:  AL        Alabama    East South Central  South
#3:  AR        Arkansas   West South

Schema evolution in parquet format

落花浮王杯 posted on 2019-12-17 17:49:06
Question: Currently we are using the Avro data format in production. Among Avro's several strengths, we know that it is good at schema evolution. Now we are evaluating the Parquet format because of its efficiency when reading random columns. So before moving forward, our concern is still schema evolution. Does anyone know if schema evolution is possible in Parquet? If yes, how is it possible; if no, why not? Some resources claim that it is possible but that columns can only be added at the end. What does this

Calendar table for Data Warehouse

≯℡__Kan透↙ posted on 2019-12-17 04:34:08
Question: For my data warehouse, I am creating a calendar table as follows:

SET NOCOUNT ON
DROP Table dbo.Calendar
GO
Create Table dbo.Calendar
(
    CalendarId Integer NOT NULL,
    DateValue Date NOT NULL,
    DayNumberOfWeek Integer NOT NULL,
    NameOfDay VarChar (10) NOT NULL,
    NameOfMonth VarChar (10) NOT NULL,
    WeekOfYear Integer NOT NULL,
    JulianDay Integer NOT NULL,
    USAIsBankHoliday Bit NOT NULL,
    USADayName VarChar (100) NULL,
)
ALTER TABLE dbo.Calendar ADD CONSTRAINT DF_Calendar_USAIsBankHoliday DEFAULT 0 FOR
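As a rough illustration of how rows for such a table might be derived, here is a minimal Python sketch that computes the date-based columns for a range of days. It is not the asker's population script. The numbering conventions are assumptions: ISO weekday numbers (Monday = 1), ISO week numbers, and "JulianDay" read as day of year. The holiday columns are left out because they need external data.

```python
from datetime import date, timedelta


def calendar_rows(start, end):
    """Yield one dict per day from start to end (inclusive)."""
    day = start
    calendar_id = 1
    while day <= end:
        yield {
            "CalendarId": calendar_id,
            "DateValue": day.isoformat(),
            # Assumption: ISO numbering, Monday = 1 .. Sunday = 7.
            "DayNumberOfWeek": day.isoweekday(),
            # Day and month names follow the process locale (English by default).
            "NameOfDay": day.strftime("%A"),
            "NameOfMonth": day.strftime("%B"),
            # Assumption: ISO 8601 week number.
            "WeekOfYear": day.isocalendar()[1],
            # Assumption: "JulianDay" means day of year (1..366).
            "JulianDay": day.timetuple().tm_yday,
        }
        day += timedelta(days=1)
        calendar_id += 1


rows = list(calendar_rows(date(2019, 12, 1), date(2019, 12, 31)))
print(rows[0])
```

The resulting dicts could then be bulk-inserted into the warehouse table with whatever database driver is in use.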

Relationship Between Two Dimensions in SSAS

 ̄綄美尐妖づ posted on 2019-12-14 04:05:45
Question: I am developing an SSAS database and have snowflaked dimensions to which it has links. For example, I have a customer dimension table, a distributor dimension table, and a territory dimension table, and the first two each have a relationship to the latter. The relationships can therefore be illustrated as follows:

Retailer <-- Territory
Distributor <-- Territory

In a specific cube in the database, I have measures to which all three dimensions mentioned above have relationships. As far

SSIS - ETL - Transfer tables/databases from many servers?

淺唱寂寞╮ posted on 2019-12-13 17:26:43
Question: I have 6-7 (almost) identical databases. I want to copy the data from some of the tables on EACH of these servers into the corresponding table on ONE server. That is, multiple sources and one destination server. All the servers have different IPs. How do I do this task? Would a for loop be appropriate for this? If yes, what would be a good way to do it? I might perform a bit of transformation; I'm not sure as of now. To be safe, I want to use SSIS.
Answer 1: Here is an overview of how you can set up a

How can I fetch (via GET) all JIRA issues? Do I go to the Search node?

走远了吗. posted on 2019-12-13 13:38:04
Question: It looks like /api/2/project easily returns all projects in a JIRA instance in JSON format. I'd like to do the same for issues, but this does not appear to exist. Is /api/2/search the standard way to do a mass dump like this? And what is the best way to regularly update this to a database? Would I do something like search (update date > [last entry in database]) and then go through the pagination? Surely I can't be the first person attempting this, though I see no similar guide anywhere
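The "search, then walk the pagination" idea from the question can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference client. `fetch_page` is a hypothetical callable standing in for an HTTP GET against the search endpoint, and the response is assumed to carry an `issues` list and a `total` count, with paging driven by a start offset and a page size. `fake_fetch_page` and the JQL string are illustrative only.

```python
def fetch_all_issues(fetch_page, jql, page_size=50):
    """Collect every issue matching jql by walking offset-based pages.

    fetch_page(jql, start_at, max_results) is a hypothetical callable that
    returns a dict with "issues" (list) and "total" (int).
    """
    issues = []
    start_at = 0
    while True:
        page = fetch_page(jql, start_at, page_size)
        batch = page["issues"]
        issues.extend(batch)
        start_at += len(batch)
        # Stop on an empty page or once every matching issue has been seen.
        if not batch or start_at >= page["total"]:
            break
    return issues


def fake_fetch_page(jql, start_at, max_results):
    """Stand-in for the real HTTP call; serves 7 made-up issues in pages."""
    data = [{"key": f"DEMO-{i}"} for i in range(1, 8)]
    return {"total": len(data), "issues": data[start_at:start_at + max_results]}


# Illustrative incremental filter: only issues updated since the last sync.
jql = 'updated >= "2019-12-01 00:00" ORDER BY updated ASC'
all_issues = fetch_all_issues(fake_fetch_page, jql, page_size=3)
print(len(all_issues))  # 7
```

For regular updates, the timestamp in the filter would come from the most recent update time already stored in the database, so each run fetches only what changed.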

How to get Task id, Feature id, Complete hrs by accessing TFS Warehouse by date in VS 2017?

时光怂恿深爱的人放手 posted on 2019-12-13 10:33:02
Question: How can I get the task ID, feature ID, and completed hours by date with a SQL Server query? Let's say task 123 was created in a sprint that starts on 1st July (1.7.2018) and ends on 10th July (10.7.2018). On 1-7-2018, task 123 has an effort of 5 hrs, completed hours of 0, and remaining hours of 5. On 5th July, effort is 5 hrs, completed is 2 hrs, and remaining is 3 hrs. On 10th July, effort is 5 hrs, completed is 4 hrs, and remaining is 1 hr. So how can I find task id