Thursday, August 16, 2012

Introduction to Dimensions

What is a Dimension Table?

A dimension table provides the description behind the analytic numbers.  It describes the who, what, when, where and why behind the facts. Dimensions are normally broken down into groups (tables) and they contain several attributes (columns).   Unlike a fact table the dimension table is not normalized.  Generally, dimension tables have many columns but a limited amount of rows. 
Dimension tables normally provide two purposes in a data warehouse, it can be used to filter queries and to select data.


Several data warehouses include the following dimension tables products, employees, customers, time, and location.  Lets say your business requirement is to provide an  time tracking data warehouse.  You would to want to implement a employee dimension table that included at least the following attributes (columns) first name, last name, status, start date, hire date, end date, department name title, salary etc….   we could actually go on and on and this is okay as long as the attributes support your business requirements.

image image


What are dimension hierarchies?

Dimension hierarchies provide a way to define a relationship between multiple attributes within a dimension. They are commonly defined as a structure to provide drill up and drill down capabilities.  You can have multiple hierarchies within a dimension. For example, in a time dimension you might want to have two separate hierarchies, one for fiscal year and another for calendar year.  These two dimension could contain the following attributes year, quarter, month, day.
image

Best Practices for designing a Dimension table

  • Use a unique identifier integer column that is auto incremental as your primary key.  This is commonly known as a surrogate key.
  • Use the source’s primary key as an alternating key.
  • Any additional attributes (columns) that describe the business entity

Slowly Changing Dimensions

Now that we know how to build a dimension we need to consider how the data is stored.  Yes, occasionally the attribute data changes.  For example, lets pretend you are a die hard Pirates fan and built a Data Warehouse to do analysis on your favorite players.  How would you handle promotions like Andrew McCutchen getting called up from AAA Indianapolis to the Pirates?
The following are options to handle slowly changing dimensions:
  1. Do nothing (type 0).  This approach is highly not recommended.  This cannot guarantee history preservation and includes the least control over managing changed attributes.
  2. Overwrite old data with new data (type 1)
    This completely overwrites the previous attribute value.  This could be ideal if you had a spelling error or the historical value of this column is not crucial. This is the easiest method to implement but it can be hard to manage.
  3. Create multiple records with different key values (type 2)
    This provides you with the ability to have an unlimited amount of history.  It is usually implemented by adding a start and end date column.  If you have to make changes to past records this could require you to also update fact records.
  4. Create a new column for previous values (type 3)
    This allows only a fixed amount of history to be retained.  If you create two extra columns for the team attribute you can only store three team values (the current value and last two previous values)
Source: http://johnsterrett.com/2009/11/01/introduction-to-dimensions/

No comments:

Post a Comment