Data Deduplication in OBIEE

Consider the following business scenario that is usually encountered in compensation management.
A company needs to set up an incentive management system for their sales reps and their associated managers based on the amount of sales they bring in from the zip codes they are assigned.
However, the grain of the sales data in the fact table SALES_F is at the zip code level and not at the employee level.
Image-0000
Now let us look at  how the  sales reps and their managers are assigned to different zip codes in the EMPLOYEE_ALIGN table.

Image-0001

You see that a particular zipcode can have many sales reps who may be assigned to the same or different managers.
In short, there is no way of determining how much sale  a particular employee has brought in at a particular zip code as the company simply does not have the data at that granularity. In such a situation, businesses adopt different  rules  to give incentives to the employees. Usually they give credit for the entire sales in a particular zipcode to all the sales reps assigned to that zipcode.
For eg:
Let’s say two sales reps EMP1 and EMP2 are assigned to zipcode 9000, and their manager is MGR1. If the total sales in zipcode 9000 was $2000, then according to the business rule, the incentive for EMP1 will be calculated based on sale amount  of $2000 and for EMP2 on sale amount $2000, and MGR1’s incentive will be based on the sale amount $2000.
Joining the SALES_F table and the EMPLOYEE_ALGIN table on the zipcode column therefore will not provide us with the desired results as there is a many to many relation between the EMPLOYEE_ALIGN and the SALES_F table.
Let’ see how we can implement the incentive management according to the business rules.
The key to implementing this is to avoid all many to many joins between your fact and dimensions. If we use the zipcode as the key to join the fact and dimension, there is no way we can avoid many to many relationship.
Basically, to implement the incentive management the total Sale_amount for each level in the employee hierarchy has to come from an aggregate table at that level.
Let us see how we go about creating the aggregate table for each level:
Suppose there are 4 levels in the employee hierarchy:
RegionalMgr—>AreaMgr—>SalesMgr—->SalesRep
we have to make 4 aggregate tables, one at each level of the hierarchy.
Now to make the aggregate table for the Sales_Mgr level, join the employee_align table to  the SALES_F table on the zipcode.
The SALES_MGR fact table will look like this:
Image-0004
Similarly create aggregate tables for all other levels in the employee hierarchy.
Also create a new Employee_hier table as shown below which gives the hierarchy relationship between various levels in the employee hierarchy.
Image-0006
Now that all the database aggregate tables are ready, import these into the physical layer of the RPD along with the employee hierarchy table, and join all the fact tables to the Employee_hierarchy table using LOGIN column as the as the join key. This way we will eliminate all many to many joins between the FACT and dimension.
Image-0007
Drag the Aggregate table at the lowest level, i.e the AGG_EMPLOYEE_F table and the dimension table EMPLOYEE_HIER table into the business layer of the RPD.  Now add the other aggregate tables as sources for the fact table and set the corresponding logical level.
Image-0008
Also create the Employee Dimension and define the logical levels keys.

Image-0009

Now drag the tables into the presentation layer.
Now let us create the following request in answers:
Image-0015
When you look at the sale amount of the regional manager you can see that there is no duplicate data and the numbers are in agreement with the total sale across all the zip codes.
When we look at the data for individual sales reps  we can see that each rep is assigned the sale amount as per the business rule (entire sale amount for a particular zipcode to all sales reps assigned to that zipcode). Also, the grand total is equal to the total sale amount  across all zip codes. (it’s not the sum of sale amount assigned to each sales rep).
Image-0014
Lets look at the physical query in the query log. We’ll see that there are  two queries being generated, one accessing the aggregate table for the sales rep to show the sale amount of the sales reps , and the second query accessing the Aggregate table for regional manager to  display the Grand Total for Regional manager.
Image-0013