Data warehouse for user data - design Q

喜欢而已 提交于 2019-11-29 15:38:32

问题


How to best store user data vs date/time dimension? Usecase is I am trying to store user actions per day, per hour. Such as number of Shares, likes, friends etc. I have a time table and a date table. For time it is easy - i have each row = user_id and colunms = 1 to 24 for each hour of the day. But problem is for dates. If i give each day = 1 colunm then i will have 365 colunms a year. I cannot archive the data way either because analytic needs past data too. What are the other strategies?


回答1:


dimDate : 1 row per date
dimTime : 1 row per minute

At the beginning you have to state the "grain" of the fact table and then stick to it.

If the grain is one day, then TimeKey always points to the key of "23:59".

If the grain is one hour, then TimeKey points to entries of "HH:59".

If the grain is one minute, then TimeKey points to the respective "HH:MM"

If the grain is 15 minutes, then TimeKey points to the respective "HH:14", "HH:29", "HH:44", "HH:59"

And so on...

-- How many new friends did specific user gain
-- in first three months of years 2008, 2009 and 2010
-- between hours 3 and 5 in the morning
-- by day of week
-- not counting holidays ?

select
      DayOfWeek
    , sum(NewFriends) as FriendCount
from factUserAction as f
join dbo.dimUser    as u on u.UserKey = f.UserKey
join dbo.dimDate    as d on d.DateKey = f.DateKey
join dbo.dimTime    as t on t.TimeKey = f.TimeKey
where CalendarYear between 2008 and 2010
  and MonthNumberInYear between 1 and 3
  and t.Hour between 3 and 5
  and d.IsHoliday = 'no'
  and UserEmail = 'john_doe@gmail.com' 
group by DayOfWeek
order by DayOfWeek ;



回答2:


You would store the date in the dimension, and then add computed fields such as day_of_year.

On the designs I've worked on, we've never had more granular time slices than day, but I can't see why one couldn't have a time dimension based on date-hour, as the grain?

user_activity_facts(
   time_key references time_dimension(time_key)
  ,user_key references user_dimension(user_key)
  ,measure1
  ,measure2
  ,measure3
  ,primary key(time_key, user_key)
)
partition by range(time_key)(
   ...
)


来源:https://stackoverflow.com/questions/4948354/data-warehouse-for-user-data-design-q

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!