data-warehouse

How to pivot row data using Informatica?

痞子三分冷 · Submitted on 2019-12-11 11:03:50
Question: How can I pivot row data using Informatica PowerCenter Designer? Say I have a source file called address.txt:

+---------+--------------+-----------------+
| ADDR_ID | NAME         | ADDRESS         |
+---------+--------------+-----------------+
| 1       | John Smith   | JohnsAddress1   |
| 1       | John Smith   | JohnsAddress2   |
| 2       | Adrian Smith | AdriansAddress1 |
| 2       | Adrian Smith | AdriansAddress2 |
+---------+--------------+-----------------+

I would like to pivot this data like this:

+---------+--------------+-----
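The excerpt is cut off before the desired output, but the intent is a row-to-column pivot (in PowerCenter this is typically done with an Expression plus Aggregator, or a Normalizer in reverse). Outside the tool, the same pivot can be sketched in plain Python; the `ADDRESS1`/`ADDRESS2` output column names are an assumption about what the truncated target table looked like:

```python
from collections import OrderedDict

def pivot_addresses(rows):
    """Pivot (addr_id, name, address) rows so each id gets one
    output row with ADDRESS1, ADDRESS2, ... columns."""
    grouped = OrderedDict()
    for addr_id, name, address in rows:
        grouped.setdefault((addr_id, name), []).append(address)
    pivoted = []
    for (addr_id, name), addresses in grouped.items():
        row = {"ADDR_ID": addr_id, "NAME": name}
        for i, addr in enumerate(addresses, start=1):
            row[f"ADDRESS{i}"] = addr  # one column per address occurrence
        pivoted.append(row)
    return pivoted

rows = [
    (1, "John Smith", "JohnsAddress1"),
    (1, "John Smith", "JohnsAddress2"),
    (2, "Adrian Smith", "AdriansAddress1"),
    (2, "Adrian Smith", "AdriansAddress2"),
]
result = pivot_addresses(rows)
```

This assumes the number of addresses per ADDR_ID is small and bounded, which is also what a fixed-column pivot target in PowerCenter requires.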

filter length of time [duplicate]

风流意气都作罢 · Submitted on 2019-12-11 09:09:50
Question: This question already has answers here: Calculate time difference (only working hours) in minutes between two dates (5 answers). Closed 5 years ago. I need to calculate the time that falls between 8AM and 10PM; the rest I don't need. Right now I do this in Excel, and I want to automate the process. I have the following table, events:

start_date         | end_date | duration_REAL | duration_08AM_a_10PM
-------------------------------------------------------------------------------
08:00AM 20-05-2014 |
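The windowing logic the question asks for (count only the minutes between 8AM and 10PM, possibly across several days) can be sketched in Python; `minutes_in_window` and its default bounds are illustrative names, not anything from the original post:

```python
from datetime import datetime, time, timedelta

def minutes_in_window(start, end, win_start=time(8, 0), win_end=time(22, 0)):
    """Minutes of [start, end] that fall between win_start and win_end,
    summed over each calendar day the interval spans."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        # Intersect the interval with this day's 08:00-22:00 window.
        lo = max(start, datetime.combine(day, win_start))
        hi = min(end, datetime.combine(day, win_end))
        if hi > lo:
            total += hi - lo
        day += timedelta(days=1)
    return total.total_seconds() / 60

# Event running 07:00-23:00: only 08:00-22:00 counts -> 840 minutes.
dur = minutes_in_window(datetime(2014, 5, 20, 7, 0),
                        datetime(2014, 5, 20, 23, 0))
# Overnight event 21:00 -> 09:00 next day: one hour each day -> 120 minutes.
overnight = minutes_in_window(datetime(2014, 5, 20, 21, 0),
                              datetime(2014, 5, 21, 9, 0))
```

The same clipping (`GREATEST`/`LEAST` against the window bounds per day) translates directly into a SQL expression for the `duration_08AM_a_10PM` column.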

In SQL Server CDC with SSIS, which data should be stored for windowing (LSN or Date)?

时光毁灭记忆、已成空白 · Submitted on 2019-12-11 07:49:37
Question: I have implemented delta detection while loading a data warehouse from transaction systems, using an identity column or date-time column in the source transaction tables. When data needs to be extracted the next time, the maximum date-time value extracted last time is used in the filter of the extraction query to identify new or changed records. This was good enough except when there were multiple transactions at the same millisecond. But now we have Change Data Capture (CDC) with SQL Server 2008 and it
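For context, the watermark approach the question describes can be sketched as follows; `extract_increment` is a hypothetical helper, not the SQL Server CDC API. It also shows why the author hit trouble: with a strictly-greater-than filter, the watermark must be unique and monotonically increasing (like an LSN), because a non-unique timestamp watermark can skip rows that share the last extracted value:

```python
def extract_increment(rows, last_watermark, key=lambda r: r["lsn"]):
    """Pull only rows whose watermark is strictly greater than the one
    saved after the previous load; return the batch and the new watermark
    to persist for the next run."""
    batch = [r for r in rows if key(r) > last_watermark]
    new_watermark = max((key(r) for r in batch), default=last_watermark)
    return batch, new_watermark

# Hypothetical change rows keyed by an LSN-like increasing value.
source = [
    {"lsn": 100, "id": 1},
    {"lsn": 101, "id": 2},
    {"lsn": 102, "id": 3},
]
batch, wm = extract_increment(source, last_watermark=100)
```

With CDC the natural choice is to persist the LSN range boundary rather than a date, since LSNs are what the change table is ordered and queried by; dates would reintroduce the same-timestamp ambiguity.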

Using Solr to Query HBase

╄→尐↘猪︶ㄣ · Submitted on 2019-12-11 04:38:52
Question: I have a data warehousing problem, needing to query over a large dataset. For the sake of this example, let's say a typical state would have 30 million users with activity stats for each. Ideally I could buy a data warehousing tool (Vertica, Infobright, etc.) but that's not in the cards or the budget. Right now I'm considering using Solr to query HBase. While I believe HBase could scale up to the needs, I worry about Solr. It's optimized as a search engine, i.e. the first pages of results

Should we separate the ssis packages between several projects in our Solution?

我与影子孤独终老i · Submitted on 2019-12-11 04:24:39
Question: I use SSIS 2012. I have created three schemas in my data warehouse (STG, TRSF, DW). The STG schema is for staging tables. All my source files are CSV files. I transfer the data from my source to a table in the STG schema, with a separate package for each table (for example, if I have 20 CSV files, I will have 20 packages and populate 20 tables in the STG schema). After that, I transfer the STG schema to the TRSF schema. During that process I apply my business logic. I do a lookup for

SQL Datawarehousing, need help populating my DIMENSION using TSQL SELECT or a better alternative?

半世苍凉 · Submitted on 2019-12-11 03:54:19
Question: I have a table in my SQL Server where I "stage" my data warehouse extract from our ERP system. From this staging table (table name: DBO.DWUSD_LIVE), I build my dimensions and load my fact data. An example DIMENSION table is called "SHIPTO"; this dimension has the following columns: "shipto_id", "shipto", "salpha", "ssalpha", "shipto address", "shipto name", "shipto city". Right now I have an SSIS package that does a SELECT DISTINCT across the above columns to retrieve the "unique" data, then
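The SELECT DISTINCT plus key-assignment step the author describes can be sketched in Python; the column names come from the question, but the surrogate-key logic is an illustrative stand-in for what an IDENTITY column or SSIS lookup would normally do:

```python
def build_dimension(staging_rows, natural_key_cols):
    """Deduplicate staged rows on the natural key and assign a
    surrogate key (shipto_id) to each distinct dimension member."""
    seen = {}
    dimension = []
    next_key = 1
    for row in staging_rows:
        nk = tuple(row[c] for c in natural_key_cols)
        if nk not in seen:  # first occurrence wins, duplicates are dropped
            seen[nk] = next_key
            dimension.append({"shipto_id": next_key, **row})
            next_key += 1
    return dimension

# Hypothetical staged rows extracted from DBO.DWUSD_LIVE.
stg = [
    {"shipto": "A1", "shipto name": "Acme", "shipto city": "Oslo"},
    {"shipto": "A1", "shipto name": "Acme", "shipto city": "Oslo"},
    {"shipto": "B2", "shipto name": "Beta", "shipto city": "Bergen"},
]
dim = build_dimension(stg, ["shipto"])
```

In T-SQL the common alternative to a DISTINCT-then-insert package is a single MERGE (or INSERT ... WHERE NOT EXISTS) from staging into the dimension, which also handles incremental loads.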

Extract SQL Azure Federated Database to Data Warehouse with SSIS

荒凉一梦 · Submitted on 2019-12-11 02:26:34
Question: I am trying to transfer our production data to a data warehouse for reporting purposes. I've tried following the "Importing to Federations" section from SSIS for Azure and Hybrid Data Movement, but I need to move data from my federations to the data warehouse. I've also found a good resource at SQL Server Central, but I still can't seem to bring up the federated tables in the data flow wizards. Nor can I add a Use FedDB statement in a SQL command in the ODBC (connection type needed for a

Handling a Many-to-Many Dimension when all dimensional values have 100% importance

北战南征 · Submitted on 2019-12-11 01:32:22
Question: I'll at least try to keep this succinct. Let's suppose we're tracking the balances of accounts over time, so our fact table will have columns such as:

Account Balance Fact Table
  (FK) AccountID
  (FK) DateID
  ...
  Balance
  ...

Obviously you have an Account Dimension Table and a Date Dimension Table, so now we can easily filter on Accounts or Dates (or date ranges, etc.). But here's the kicker: Accounts can belong to Groups -- any number of Groups at a given Date. Groups are simply logical
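The standard answer to this shape of problem is a bridge table between Group and Account. A minimal sketch, assuming an unweighted bridge (every membership counts at 100%, as the title says, so group totals intentionally double-count accounts shared between groups); all table and column names here are illustrative:

```python
def balances_by_group(fact_rows, bridge):
    """Roll account balances up to groups through a bridge table.
    bridge: (group_id, account_id, date_id) membership rows; each
    membership contributes the full balance (no weighting factor)."""
    membership = {}
    for group_id, account_id, date_id in bridge:
        membership.setdefault((account_id, date_id), []).append(group_id)
    totals = {}
    for fact in fact_rows:
        key = (fact["account_id"], fact["date_id"])
        for group_id in membership.get(key, []):
            gkey = (group_id, fact["date_id"])
            totals[gkey] = totals.get(gkey, 0) + fact["balance"]
    return totals

facts = [
    {"account_id": 1, "date_id": 20191210, "balance": 100},
    {"account_id": 2, "date_id": 20191210, "balance": 50},
]
bridge = [
    ("G1", 1, 20191210),
    ("G2", 1, 20191210),  # account 1 belongs to two groups on this date
    ("G1", 2, 20191210),
]
group_totals = balances_by_group(facts, bridge)
```

Note the deliberate consequence: summing all group totals exceeds the grand total of balances whenever memberships overlap, which is the usual caveat with unweighted many-to-many bridge tables.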

Data Warehouse - Slowly Changing Dimensions with Many to Many Relationships

[亡魂溺海] · Submitted on 2019-12-10 22:32:34
Question: As an example, let's say I have a fact table with two dimensions and one measure:

FactMoney table
  ProjectKey int
  PersonKey int
  CashAmount money

The two dimensions are defined like this:

DimProject (a type 0 dimension, i.e. static)
  ProjectKey int
  ProjectName varchar(50)

DimPerson (a type 2 slowly changing dimension)
  PersonKey int
  PersonNaturalKey int
  PersonName varchar(50)
  EffectiveStartDate datetime
  EffectiveEndDate datetime
  IsCurrent bit

Pretty straightforward so far. Now I'll introduce a
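For reference, loading a fact against a type 2 dimension means resolving the natural key plus the transaction date to the surrogate key whose effective window covers that date. A minimal sketch using the column names from the question (the sample rows are invented):

```python
from datetime import datetime

def lookup_person_key(dim_person, natural_key, as_of):
    """Return the surrogate PersonKey whose [EffectiveStartDate,
    EffectiveEndDate) window covers the transaction date -- the
    standard SCD type 2 fact-load lookup."""
    for row in dim_person:
        if (row["PersonNaturalKey"] == natural_key
                and row["EffectiveStartDate"] <= as_of < row["EffectiveEndDate"]):
            return row["PersonKey"]
    return None  # no version effective at that date

dim_person = [
    {"PersonKey": 1, "PersonNaturalKey": 7, "PersonName": "Jane Doe",
     "EffectiveStartDate": datetime(2010, 1, 1),
     "EffectiveEndDate": datetime(2012, 6, 1), "IsCurrent": False},
    {"PersonKey": 2, "PersonNaturalKey": 7, "PersonName": "Jane Smith",
     "EffectiveStartDate": datetime(2012, 6, 1),
     "EffectiveEndDate": datetime(9999, 12, 31), "IsCurrent": True},
]
key = lookup_person_key(dim_person, 7, datetime(2011, 3, 15))
```

Because FactMoney stores PersonKey (not PersonNaturalKey), each fact row stays pinned to the dimension version that was effective when it was loaded; querying "as of today" instead requires joining through PersonNaturalKey to the IsCurrent row.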

How to get back aggregate values across 2 dimensions using Python Cubes?

偶尔善良 · Submitted on 2019-12-10 21:19:08
Question: Situation: using Python 3, Django 1.9, Cubes 1.1, and Postgres 9.5. These are my data tables, shown here in text form (the original post also included them as images):

Store table
------------------------------
| id  | code | address       |
|-----|------|---------------|
| 1   | S1   | Kings Row     |
| 2   | S2   | Queens Street |
| 3   | S3   | Jacks Place   |
| 4   | S4   | Diamonds Alley|
| 5   | S5   | Hearts Road   |
------------------------------

Product table
------------------------------
| id  | code | name          |
|-----|------|---------------|
| 1   | P1
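The excerpt is cut off before the fact table and the Cubes query itself, but the underlying operation (aggregate a measure across two dimensions, i.e. a two-way drilldown) can be sketched without Cubes at all; the `sales` rows and `amount` measure below are assumptions, not from the post:

```python
def aggregate(fact_rows, dims):
    """Group fact rows by the given dimension attributes and sum the
    'amount' measure -- one cell per (store, product) combination,
    which is what a drilldown across two dimensions returns."""
    cells = {}
    for row in fact_rows:
        key = tuple(row[d] for d in dims)
        cells[key] = cells.get(key, 0) + row["amount"]
    return cells

# Hypothetical sales facts referencing the Store and Product tables above.
sales = [
    {"store": "S1", "product": "P1", "amount": 10},
    {"store": "S1", "product": "P2", "amount": 5},
    {"store": "S2", "product": "P1", "amount": 7},
    {"store": "S1", "product": "P1", "amount": 3},
]
by_store_product = aggregate(sales, ["store", "product"])
```

In Cubes the equivalent is passing both dimension names in the `drilldown` argument of the browser's aggregate call, so each returned cell carries one member from each dimension plus the aggregated measures.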