data-warehouse

How to pivot row data using Informatica?

痞子三分冷 · Submitted on 2019-12-11 11:03:50
Question: How can I pivot row data using Informatica PowerCenter Designer? Say I have a source file called address.txt:

+---------+--------------+-----------------+
| ADDR_ID | NAME         | ADDRESS         |
+---------+--------------+-----------------+
| 1       | John Smith   | JohnsAddress1   |
| 1       | John Smith   | JohnsAddress2   |
| 2       | Adrian Smith | AdriansAddress1 |
| 2       | Adrian Smith | AdriansAddress2 |
+---------+--------------+-----------------+

I would like to pivot this data like this:

+---------+--------------+-----
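The excerpt is cut off before the desired output, but the intent is a row-to-column pivot (in PowerCenter this is typically done with an Expression plus Aggregator, or a Normalizer in reverse). Outside the tool, the same pivot can be sketched in plain Python; the `ADDRESS1`/`ADDRESS2` output column names are an assumption about what the truncated target table looked like:

```python
from collections import OrderedDict

def pivot_addresses(rows):
    """Pivot (addr_id, name, address) rows so each id gets one
    output row with ADDRESS1, ADDRESS2, ... columns."""
    grouped = OrderedDict()
    for addr_id, name, address in rows:
        grouped.setdefault((addr_id, name), []).append(address)
    pivoted = []
    for (addr_id, name), addresses in grouped.items():
        row = {"ADDR_ID": addr_id, "NAME": name}
        for i, addr in enumerate(addresses, start=1):
            row[f"ADDRESS{i}"] = addr  # one column per address occurrence
        pivoted.append(row)
    return pivoted

rows = [
    (1, "John Smith", "JohnsAddress1"),
    (1, "John Smith", "JohnsAddress2"),
    (2, "Adrian Smith", "AdriansAddress1"),
    (2, "Adrian Smith", "AdriansAddress2"),
]
result = pivot_addresses(rows)
```

This assumes the number of addresses per ADDR_ID is small and bounded, which is also what a fixed-column pivot target in PowerCenter requires.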

filter length of time [duplicate]

风流意气都作罢 · Submitted on 2019-12-11 09:09:50
Question: This question already has answers here: Calculate time difference (only working hours) in minutes between two dates (5 answers). Closed 5 years ago. I need to calculate the time that falls between 8AM and 10PM; the rest I don't need. Right now I do this in Excel, and I want to automate the process. I have the following table, events:

start_date         | end_date | duration_REAL | duration_08AM_a_10PM
-------------------------------------------------------------------------------
08:00AM 20-05-2014 |
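The windowing logic the question asks for (count only the minutes between 8AM and 10PM, possibly across several days) can be sketched in Python; `minutes_in_window` and its default bounds are illustrative names, not anything from the original post:

```python
from datetime import datetime, time, timedelta

def minutes_in_window(start, end, win_start=time(8, 0), win_end=time(22, 0)):
    """Minutes of [start, end] that fall between win_start and win_end,
    summed over each calendar day the interval spans."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        # Intersect the interval with this day's 08:00-22:00 window.
        lo = max(start, datetime.combine(day, win_start))
        hi = min(end, datetime.combine(day, win_end))
        if hi > lo:
            total += hi - lo
        day += timedelta(days=1)
    return total.total_seconds() / 60

# Event running 07:00-23:00: only 08:00-22:00 counts -> 840 minutes.
dur = minutes_in_window(datetime(2014, 5, 20, 7, 0),
                        datetime(2014, 5, 20, 23, 0))
# Overnight event 21:00 -> 09:00 next day: one hour each day -> 120 minutes.
overnight = minutes_in_window(datetime(2014, 5, 20, 21, 0),
                              datetime(2014, 5, 21, 9, 0))
```

The same clipping (`GREATEST`/`LEAST` against the window bounds per day) translates directly into a SQL expression for the `duration_08AM_a_10PM` column.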

In SQL Server CDC with SSIS, which data should be stored for windowing (LSN or Date)?

时光毁灭记忆、已成空白 · Submitted on 2019-12-11 07:49:37
Question: I have implemented delta detection while loading a data warehouse from transaction systems, using an identity column or date-time column in the source transaction tables. When data needs to be extracted the next time, the maximum date-time value extracted last time is used in the filter of the extraction query to identify new or changed records. This was good enough except when there were multiple transactions at the same millisecond. But now we have Change Data Capture (CDC) with SQL Server 2008 and it
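For context, the watermark approach the question describes can be sketched as follows; `extract_increment` is a hypothetical helper, not the SQL Server CDC API. It also shows why the author hit trouble: with a strictly-greater-than filter, the watermark must be unique and monotonically increasing (like an LSN), because a non-unique timestamp watermark can skip rows that share the last extracted value:

```python
def extract_increment(rows, last_watermark, key=lambda r: r["lsn"]):
    """Pull only rows whose watermark is strictly greater than the one
    saved after the previous load; return the batch and the new watermark
    to persist for the next run."""
    batch = [r for r in rows if key(r) > last_watermark]
    new_watermark = max((key(r) for r in batch), default=last_watermark)
    return batch, new_watermark

# Hypothetical change rows keyed by an LSN-like increasing value.
source = [
    {"lsn": 100, "id": 1},
    {"lsn": 101, "id": 2},
    {"lsn": 102, "id": 3},
]
batch, wm = extract_increment(source, last_watermark=100)
```

With CDC the natural choice is to persist the LSN range boundary rather than a date, since LSNs are what the change table is ordered and queried by; dates would reintroduce the same-timestamp ambiguity.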

Using Solr to Query HBase

╄→尐↘猪︶ㄣ · Submitted on 2019-12-11 04:38:52
Question: I have a data warehousing problem, needing to query over a large dataset. For the sake of this example, let's say a typical state would have 30 million users with activity stats for each. Ideally I could buy a data warehousing tool (Vertica, Infobright, etc.) but that's not in the cards or the budget. Right now I'm considering using Solr to query HBase. While I believe HBase could scale up to the needs, I worry about Solr. It's optimized as a search engine, i.e. the first pages of results

Should we separate the ssis packages between several projects in our Solution?

我与影子孤独终老i · Submitted on 2019-12-11 04:24:39
Question: I use SSIS 2012. I have created three schemas in my data warehouse (STG, TRSF, DW). The STG schema is for staging tables. All my source files are CSV files. I transfer the data from my source to a table in the STG schema, with a separate package for each table (for example, if I have 20 CSV files, I will have 20 packages and populate 20 tables in the STG schema). After that, I transfer the STG schema to the TRSF schema. During that process I apply my business logic. I do a lookup for

SQL Datawarehousing, need help populating my DIMENSION using TSQL SELECT or a better alternative?

半世苍凉 · Submitted on 2019-12-11 03:54:19
Question: I have a table in my SQL Server where I "stage" my data warehouse extract from our ERP system. From this staging table (table name: DBO.DWUSD_LIVE), I build my dimensions and load my fact data. An example DIMENSION table is called "SHIPTO"; this dimension has the following columns: "shipto_id", "shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city". Right now I have an SSIS package that does a SELECT DISTINCT across the above columns to retrieve the "unique" data, then
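The SELECT DISTINCT plus key-assignment step the author describes can be sketched in Python; the column names come from the question, but the surrogate-key logic is an illustrative stand-in for what an IDENTITY column or SSIS lookup would normally do:

```python
def build_dimension(staging_rows, natural_key_cols):
    """Deduplicate staged rows on the natural key and assign a
    surrogate key (shipto_id) to each distinct dimension member."""
    seen = {}
    dimension = []
    next_key = 1
    for row in staging_rows:
        nk = tuple(row[c] for c in natural_key_cols)
        if nk not in seen:  # first occurrence wins, duplicates are dropped
            seen[nk] = next_key
            dimension.append({"shipto_id": next_key, **row})
            next_key += 1
    return dimension

# Hypothetical staged rows extracted from DBO.DWUSD_LIVE.
stg = [
    {"shipto": "A1", "shipto name": "Acme", "shipto city": "Oslo"},
    {"shipto": "A1", "shipto name": "Acme", "shipto city": "Oslo"},
    {"shipto": "B2", "shipto name": "Beta", "shipto city": "Bergen"},
]
dim = build_dimension(stg, ["shipto"])
```

In T-SQL the common alternative to a DISTINCT-then-insert package is a single MERGE (or INSERT ... WHERE NOT EXISTS) from staging into the dimension, which also handles incremental loads.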

Extract SQL Azure Federated Database to Data Warehouse with SSIS

荒凉一梦 · Submitted on 2019-12-11 02:26:34
Question: I am trying to transfer our production data to a data warehouse for reporting purposes. I've tried following the "Importing to Federations" section from SSIS for Azure and Hybrid Data Movement, but I need to move data from my federations to the data warehouse. I've also found a good resource at SQL Server Central, but I still can't seem to bring up the federated tables in the data flow wizards. Nor can I add a Use FedDB statement in a SQL command in the ODBC (connection type needed for a

Handling a Many-to-Many Dimension when all dimensional values have 100% importance

北战南征 · Submitted on 2019-12-11 01:32:22
Question: I'll at least try to keep this succinct. Let's suppose we're tracking the balances of accounts over time, so our fact table will have columns such as:

Account Balance Fact Table
  (FK) AccountID
  (FK) DateID
  ...
  Balance
  ...

Obviously you have an Account Dimension Table and a Date Dimension Table, so now we can easily filter on Accounts or Dates (or date ranges, etc.). But here's the kicker: Accounts can belong to Groups -- any number of Groups at a given Date. Groups are simply logical
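The standard answer to this shape of problem is a bridge table between Group and Account. A minimal sketch, assuming an unweighted bridge (every membership counts at 100%, as the title says, so group totals intentionally double-count accounts shared between groups); all table and column names here are illustrative:

```python
def balances_by_group(fact_rows, bridge):
    """Roll account balances up to groups through a bridge table.
    bridge: (group_id, account_id, date_id) membership rows; each
    membership contributes the full balance (no weighting factor)."""
    membership = {}
    for group_id, account_id, date_id in bridge:
        membership.setdefault((account_id, date_id), []).append(group_id)
    totals = {}
    for fact in fact_rows:
        key = (fact["account_id"], fact["date_id"])
        for group_id in membership.get(key, []):
            gkey = (group_id, fact["date_id"])
            totals[gkey] = totals.get(gkey, 0) + fact["balance"]
    return totals

facts = [
    {"account_id": 1, "date_id": 20191210, "balance": 100},
    {"account_id": 2, "date_id": 20191210, "balance": 50},
]
bridge = [
    ("G1", 1, 20191210),
    ("G2", 1, 20191210),  # account 1 belongs to two groups on this date
    ("G1", 2, 20191210),
]
group_totals = balances_by_group(facts, bridge)
```

Note the deliberate consequence: summing all group totals exceeds the grand total of balances whenever memberships overlap, which is the usual caveat with unweighted many-to-many bridge tables.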

Data Warehouse - Slowly Changing Dimensions with Many to Many Relationships

[亡魂溺海] · Submitted on 2019-12-10 22:32:34
Question: As an example, let's say I have a fact table with two dimensions and one measure:

FactMoney table
  ProjectKey int
  PersonKey int
  CashAmount money

The two dimensions are defined like this:

DimProject (a type 0 dimension, i.e. static)
  ProjectKey int
  ProjectName varchar(50)

DimPerson (a type 2 slowly changing dimension)
  PersonKey int
  PersonNaturalKey int
  PersonName varchar(50)
  EffectiveStartDate datetime
  EffectiveEndDate datetime
  IsCurrent bit

Pretty straightforward so far. Now I'll introduce a
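For reference, loading a fact against a type 2 dimension means resolving the natural key plus the transaction date to the surrogate key whose effective window covers that date. A minimal sketch using the column names from the question (the sample rows are invented):

```python
from datetime import datetime

def lookup_person_key(dim_person, natural_key, as_of):
    """Return the surrogate PersonKey whose [EffectiveStartDate,
    EffectiveEndDate) window covers the transaction date -- the
    standard SCD type 2 fact-load lookup."""
    for row in dim_person:
        if (row["PersonNaturalKey"] == natural_key
                and row["EffectiveStartDate"] <= as_of < row["EffectiveEndDate"]):
            return row["PersonKey"]
    return None  # no version effective at that date

dim_person = [
    {"PersonKey": 1, "PersonNaturalKey": 7, "PersonName": "Jane Doe",
     "EffectiveStartDate": datetime(2010, 1, 1),
     "EffectiveEndDate": datetime(2012, 6, 1), "IsCurrent": False},
    {"PersonKey": 2, "PersonNaturalKey": 7, "PersonName": "Jane Smith",
     "EffectiveStartDate": datetime(2012, 6, 1),
     "EffectiveEndDate": datetime(9999, 12, 31), "IsCurrent": True},
]
key = lookup_person_key(dim_person, 7, datetime(2011, 3, 15))
```

Because FactMoney stores PersonKey (not PersonNaturalKey), each fact row stays pinned to the dimension version that was effective when it was loaded; querying "as of today" instead requires joining through PersonNaturalKey to the IsCurrent row.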

How to get back aggregate values across 2 dimensions using Python Cubes?

偶尔善良 · Submitted on 2019-12-10 21:19:08
Question: Situation: using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5. These are my data tables, shown here in text form (the original post also included them as images):

Store table
------------------------------
| id  | code | address       |
|-----|------|---------------|
| 1   | S1   | Kings Row     |
| 2   | S2   | Queens Street |
| 3   | S3   | Jacks Place   |
| 4   | S4   | Diamonds Alley|
| 5   | S5   | Hearts Road   |
------------------------------

Product table
------------------------------
| id  | code | name          |
|-----|------|---------------|
| 1   | P1
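The excerpt is cut off before the fact table and the Cubes query itself, but the underlying operation (aggregate a measure across two dimensions, i.e. a two-way drilldown) can be sketched without Cubes at all; the `sales` rows and `amount` measure below are assumptions, not from the post:

```python
def aggregate(fact_rows, dims):
    """Group fact rows by the given dimension attributes and sum the
    'amount' measure -- one cell per (store, product) combination,
    which is what a drilldown across two dimensions returns."""
    cells = {}
    for row in fact_rows:
        key = tuple(row[d] for d in dims)
        cells[key] = cells.get(key, 0) + row["amount"]
    return cells

# Hypothetical sales facts referencing the Store and Product tables above.
sales = [
    {"store": "S1", "product": "P1", "amount": 10},
    {"store": "S1", "product": "P2", "amount": 5},
    {"store": "S2", "product": "P1", "amount": 7},
    {"store": "S1", "product": "P1", "amount": 3},
]
by_store_product = aggregate(sales, ["store", "product"])
```

In Cubes the equivalent is passing both dimension names in the `drilldown` argument of the browser's aggregate call, so each returned cell carries one member from each dimension plus the aggregated measures.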