Slowly Changing Dimension – SCD in Data Warehousing


Slowly Changing Dimension
Dimension that changed overtime called slowly Changing Dimensions. For Example, People may change their address, sometimes they may change their names, one product price changes over the time these are few examples of slowly changing Dimensions.
This is a common issue facing data warehousing practitioners. This section explains the problem, and describes the three ways of handling this problem with examples. Type 1, Type 2, Type 3
Type 1: The new record replaces the original record. No trace of the old record exists. The new information simply overwrites the original information. In other words, no history is kept.
Example: Mr.XX is customer of Sam’s Club. Recently he has changed his address from NY to NJ. And now he is buying grocery from Sam’s Club. So now in Sam’s club membership card the address of Mr. XX has been changed. So the information in Sam’s club database has been overwrites. There is no record of old address. So In Sam’s club database Mr.XX is the customer from NJ. They do not have any records of Mr.XX while their stay was in NY.
Advantage:
This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages:
All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the business would not be able to know that Mr.XX lived in NY before.
Best time to Use Type 1
Slowly changing dimension Type 1 should be used when it is not necessary for the data warehouse to keep track of historical changes.
Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
In this case even though Mr. XX has changed his address from NY to NJ. There is one line added in Mr.XX’s details under his Membership Number. In short Under 2 different address there is just one person.
Advantages:
This allows us to accurately keep all historical information.
Disadvantages:
This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
Best Time to Use Type 2
Slowly changing dimension type 2 should be used when it is necessary for the data warehouse to track historical changes.
Type 3The original record is modified to reflect the change.
In this case, Mr. XX’s previous address has been modified. And from the date Mr. XX has changed his address it reflects in the database of Sam’s Club from the same date.
Advantages:
This does not increase the size of the table, since new information is updated.
This allows us to keep some part of history.
Disadvantages:
In this case If Mr.XX will again change his address then the most previous information will not be in the database. Type 3 will not be able to keep all history where an attribute is changed more than once
Best time to Use Type 3
Slowly changing dimension type 3 should only be used when it is necessary for the data ware house to track historical changes, and when such changes will only occur for a finite number of time.