data-warehouse

Good place to start learning data warehousing? [closed]

旧时模样 posted on 2019-12-04 08:07:57
Question: I am interested in learning more about data warehousing. I see terms like "dimension", "snowflake schema" and "star schema" thrown about. Where would one start in learning about this stuff? Are there good books or Internet resources? ETL is in this space too, right?

Answer 1: Wikipedia's resources on Data Warehousing

Labor Day Vs. Thanksgiving

跟風遠走 posted on 2019-12-04 07:50:30
I am creating a calendar table for my warehouse. I will use this as a foreign key for all the date fields. The code shown below creates the table and populates it. I was able to figure out how to find Memorial Day (last Monday of May) and Labor Day (first Monday of September).

    SET NOCOUNT ON
    DROP Table dbo.Calendar
    GO
    Create Table dbo.Calendar (
        CalendarId Integer NOT NULL,
        DateValue Date NOT NULL,
        DayNumberOfWeek Integer NOT NULL,
        NameOfDay VarChar (10) NOT NULL,
        NameOfMonth VarChar (10) NOT NULL,
        WeekOfYear Integer NOT NULL,
        JulianDay Integer NOT NULL,
        USAIsBankHoliday Bit NOT NULL,
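
The "n-th weekday of a month" rules mentioned above can be sketched outside the database as follows. This is a minimal Python illustration, not the poster's T-SQL. The helper names (nth_weekday, last_weekday, us_holidays) are hypothetical, and the Thanksgiving rule (fourth Thursday of November) is the standard US definition rather than something stated in the excerpt.

```python
from datetime import date, timedelta

# date.weekday() numbering: Monday=0 ... Sunday=6
MONDAY, THURSDAY = 0, 3


def nth_weekday(year, month, weekday, n):
    """Return the n-th occurrence (n=1 is the first) of a weekday in a month."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first matching weekday.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_weekday(year, month, weekday):
    """Return the last occurrence of a weekday in a month."""
    # Find the last day of the month by stepping back from the next month's 1st.
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    last = next_first - timedelta(days=1)
    # Days to step back from the last day to reach the matching weekday.
    offset = (last.weekday() - weekday) % 7
    return last - timedelta(days=offset)


def us_holidays(year):
    """Hypothetical helper: the three floating holidays discussed here."""
    return {
        "Memorial Day": last_weekday(year, 5, MONDAY),      # last Monday of May
        "Labor Day": nth_weekday(year, 9, MONDAY, 1),       # first Monday of September
        "Thanksgiving": nth_weekday(year, 11, THURSDAY, 4), # fourth Thursday of November
    }
```

The same offset arithmetic should translate to a calendar table: compute each date once per year and set the holiday flag on the matching row.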

What are the open source tools and techniques to build a complete data warehouse platform? [closed]

为君一笑 posted on 2019-12-04 07:39:45
Question: I'm looking for open source tools, ideally free or with a free trial version, to set up a complete data warehouse stack. I know of a few, such as Pentaho's open source Mondrian server, but couldn't find any Google results on setting up a complete platform. I'm also not sure whether these components are compatible with each other.

Handling multiple fact tables in Qlikview

99封情书 posted on 2019-12-04 01:24:20
Question: I have a PostgreSQL database containing various education data, such as school-level test scores and enrollment figures. I need to separate enrollment from test scores because the data is at different grains. Even though enrollment is at a different granularity from the test-score data, many of the dimensions are the same. For example, I have:

| Test Scores Fact |
|-------------|-----------|----------|-----------

Strategies for populating a Reporting/Data Warehouse database

核能气质少年 posted on 2019-12-03 21:43:36
For our reporting application, we have a process that aggregates several databases into a single 'reporting' database on a nightly basis. The schema of the reporting database is quite different from that of the separate 'production' databases that we are aggregating, so there is a good amount of business logic that goes into how the data is aggregated. Right now this process is implemented by several stored procedures that run nightly. As we add more details to the reporting database, the logic in the stored procedures keeps growing more fragile and unmanageable. What are some other strategies

Adding/Combining Standard Deviations

可紊 posted on 2019-12-03 21:37:05
Short Version: Can StdDevs be added/combined? I.e., if StdDev(11,14,16,17)=X and StdDev(21,34,43,12)=Y, can we calculate StdDev(11,14,16,17,21,34,43,12) from X & Y?

Long Version: I am designing a star schema. The schema has a fact_table (grain=transaction) which stores individual transaction response_time. The schema also has an aggregate_table (grain=day) which stores the response_time_sum per day. In my report I need to calculate standard deviations of the response time for a given time dimension, say day, week, month, etc. How can I calculate the StandardDeviation using the aggregate_table
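
As a general statistical point, X and Y alone are not enough to recover the combined standard deviation; the group sizes and means are needed as well. One common workaround is to keep the count, the sum, and the sum of squares for each group, since those three values add across groups. The sketch below illustrates this in Python under that assumption. The function names are hypothetical, and the sum-of-squares column is not part of the aggregate_table described above, which stores only response_time_sum.

```python
import math


def summarize(values):
    """Additive aggregates for one group: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(*summaries):
    """Merge per-group aggregates by adding them component-wise."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    return (n, total, total_sq)


def population_stddev(summary):
    """Population standard deviation from (count, sum, sum of squares)."""
    n, total, total_sq = summary
    mean = total / n
    # Var = E[x^2] - (E[x])^2; clamp tiny negative results from rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return math.sqrt(variance)


# The two groups from the question, summarized separately and then merged.
group_a = summarize([11, 14, 16, 17])
group_b = summarize([21, 34, 43, 12])
combined = population_stddev(combine(group_a, group_b))
```

This computes the population standard deviation; for the sample version, multiply the variance by n / (n - 1) before taking the square root. The E[x^2] - (E[x])^2 form can lose precision when values are large relative to their spread, so it is a sketch rather than a numerically robust implementation.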

Microsoft Azure Data warehouse and SqlAlchemy

徘徊边缘 posted on 2019-12-03 20:16:07
I am trying to use Python's SQLAlchemy library to connect to Microsoft Azure Data Warehouse, and I am receiving the following error:

    (pyodbc.Error) ('HY000', '[HY000] [Microsoft][ODBC SQL Server Driver][SQL Server]Client driver version is not supported. (46722) (SQLDriverConnect); [HY000] [Microsoft][ODBC SQL Server Driver][SQL Server]Client driver version is not supported. (46722)')

My code for the Windows connection:

    import sqlalchemy
    user_name = 'userName'
    password = 'password'
    uri = 'sqlServerName'
    db_name = 'SQLDBName'
    db_prefix = 'mssql+pyodbc://'
    db_driver = '{SQL Server}'
    connection_string
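
The error text names the legacy "SQL Server" ODBC driver, so one thing that may be worth trying is a newer driver. The sketch below only builds a SQLAlchemy URL string using the odbc_connect form of the mssql+pyodbc dialect. It assumes that "ODBC Driver 17 for SQL Server" is installed on the machine, and the function name and placeholder credentials are hypothetical. The excerpt does not confirm that this resolves the error.

```python
from urllib.parse import quote_plus


def build_mssql_url(server, database, user, password,
                    driver="ODBC Driver 17 for SQL Server"):
    """Build a mssql+pyodbc URL from a raw ODBC connection string.

    The driver name is an assumption; use whichever newer ODBC driver
    is actually installed.
    """
    odbc = (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        f"UID={user};"
        f"PWD={password}"
    )
    # The ODBC string must be URL-encoded before it goes into the query part.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)


# Placeholder values, mirroring the names used in the question.
url = build_mssql_url("sqlServerName", "SQLDBName", "userName", "password")
```

The resulting string would then be passed to sqlalchemy.create_engine(url).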

Calendar tables in PostgreSQL 9

北慕城南 posted on 2019-12-03 18:01:51
Question: I am building an analytics database (I have a firm understanding of the data and the business objectives and only basic-to-moderate database skills). I have come across some references to building similar warehouses which implement the concept of 'calendar tables'. This makes sense and is easily enough done. Most examples I see, however, are calendar tables that limit scope to 'day'. My data will need to be analyzed down to hour level, possibly minutes. My question: would an implementation of

How to create history fact table?

扶醉桌前 posted on 2019-12-03 17:09:13
Question: I have some entities in my Data Warehouse:

Person - with attributes personId, dateFrom, dateTo, and others that can change, e.g. last name, birth date and so on - slowly changing dimension
Document - documentId, number, type
Address - addressId, city, street, house, flat

The relation between Person and Document is One-To-Many, and between Person and Address is Many-To-Many. My goal is to create a history fact table that can answer the following questions: What persons with what documents

4-5-4 National Retail Federation Calendar CSV download or function to create

半城伤御伤魂 posted on 2019-12-03 16:32:39
I've been googling all over the place and haven't found this. The retail client I'm working for uses the NRF retail calendar (NRF site, Calendars). I'm wondering if anyone has ever created a lookup/dimension table with these values. Thanks, Jim

You can find a Perl module that can generate a Retail 4-5-4 calendar for any year on CPAN: http://metacpan.org/pod/DateTime::Fiscal::Retail454 It was written specifically for this problem.

An algorithmic option I've used in the past was (I'm paraphrasing as I did it in Excel):

From the date, figure out the weeknum (in the range 1 to 53)
From the weeknum,
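
The answer is cut off after the week-number step, so the following is only a guess at what a week-to-month mapping could look like, not the answerer's method. It is a minimal Python sketch of the 4-5-4 pattern: each quarter has months of 4, 5, and 4 weeks. The function name is hypothetical. Folding week 53 into the last month is an assumption, and deriving the retail week number from a date (which depends on where the fiscal year starts) is left out.

```python
# Weeks per retail month: the 4-5-4 pattern repeated for four quarters (52 weeks).
WEEKS_PER_MONTH = [4, 5, 4] * 4


def retail_month(weeknum):
    """Map a 1-based retail week number to (retail month, week within month)."""
    if not 1 <= weeknum <= 53:
        raise ValueError("weeknum must be in the range 1 to 53")
    if weeknum == 53:
        # Assumption: in a 53-week year the extra week joins the last month.
        return (12, 5)
    remaining = weeknum
    for month, weeks in enumerate(WEEKS_PER_MONTH, start=1):
        if remaining <= weeks:
            return (month, remaining)
        remaining -= weeks
```

For a dimension table, this function could be evaluated once per week of each fiscal year and the result stored alongside the calendar dates.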