ORACLE BUSINESS INTELLIGENCE ENTERPRISE EDITION 100: Dimension

Showing posts with label Dimension. Show all posts

Degenerate Dimension in OBIEE

A degenerate dimension (DD) acts as a dimension key in the fact table, however does not join to a corresponding dimension table because all its interesting attributes have already been placed in other analytic dimensions.

Sometimes people want to refer to degenerate dimensions as textual acts, however they're not facts since the fact table's primary key often consists of the DD combined with one or more additional dimension foreign keys.
Degenerate dimensions commonly occur when the fact table's grain is a single transaction.

How to Implement in OBIEE :

Refer: http://obieetalk.com/oracle-bi-ee-101341-%E2%80%93-modeling-degenerate-dimensions-%E2%80%93-fact-attributes

Reference:http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT46AnotherLook.pdf

Oracle BI EE 10.1.3.4.1 – Modeling Degenerate dimensions – Fact attributes

Degenerate dimensions are quite common when we do a Data Warehouse implementation. They are nothing but a set of dimensions that can be classified as Fact attributes and are hence maintained in the fact table itself instead of separate dimension tables. One of the key advantages of using Degenerate dimensions is the fact that there is no need for doing a separate Surrogate lookup while creating the ETL mappings. Generally degenerate dimensions do not alter the grain of a fact table and hence are maintained within the Fact table itself. In some cases, they act as a reference between the data warehouse and the transactional system.

For example, Order Numbers, Invoice Number, Transaction Dates, Sale Dates, Credit-Debit Indicator etc can all be classified as degenerate dimensions. The interesting aspect of degenerate dimensions is, they can get a little bit tricky while modeling in BI EE. Let me take this through an example. Lets assume that we have 2 fact tables SALES and COST (the SH schema in Oracle). Sales fact table has a degenerate dimension called SALE_ID(which acts as a reference to a SALE made in the transactional system). How do we model this? I will cover the method that is normally used and describe the pros & cons of this approach. After that i will cover 2 separate methods that can prove useful while modeling degenerate dimensions.

The traditional method of modeling degenerate dimensions is to include them in the Logical Fact table itself as shown below without any aggregation.

From my experience modeling degenerate dimensions this way can be useful only if you have the following requirements

1. Have just a single fact table and all your reports go against only those measures in that single fact table.
2. Requirement to create logical calculations (not db calculations) out of these degenerate dimensions. For example, FILTER calculations on a measure require the logical column to be present.

But even when you have the requirements listed above, care must be taken while exposing this attribute to the presentation layer. One of the major drawbacks of the above technique is the fact that when we model it this way, BI Server cannot enforce conforming dimension joins.

Non-Conforming Degenerate Dimensions:

For example, lets create a report as shown below.

As you see, when we bring in the degenerate dimension in to the report, the UNIT_COST measure which is from the other fact table becomes NULL. One of the main reasons why this happens is, SALE_ID does not have an aggregation associated with it. For conforming dimension based joins to happen, we need dimensional attributes that can be pushed inside the GROUP BY clause of a SQL query. When we do not have an aggregate specified for a column belonging to a fact table, BI Server does not know where to push this column in the SQL query. Hence it basically nulls out the UNIT_COST measure thereby negating the conforming dimension based join. In this case, degenerate dimension SALE_ID does not conform to the COST fact. How do we resolve this?

To resolve this, there are some fundamental questions that needs to be answered. The basic question will be, when we expose degenerate dimensions to end users, we need to know what needs to happen to the measures from other fact tables. In the above example, UNIT_COST has no relation whatsoever with the degenerate dimension. In hindsight, the null in the above report seems correct as it does not have relation to the SALE_ID. But since we have modeled it as a fact attribute, we would expect the UNIT_COST measure to be calculated for each channel and then repeat itself (like a Level Based measure) for every SALE_ID as shown below.

This is done by modeling the degenerate dimension as a separate dimension in BI EE as shown below. Remember, from a BI EE perspective one physical table can act as dimension or fact or both depending on how you model it. The model is shown below

The new dimension Sale that we modeled is still not related to the COST measures. To enable that we need to assign the COST measures to the total level of the Sale dimension.

This will enable reporting on degenerate dimensions even on non-conforming facts. This technique will even more be useful when you have the same degenerate dimension in 2 different fact tables. Lets look at that next.

Conforming Degenerate Dimensions:

Lets assume that the fact tables FACT_COST and FACT_SALES have a degenerate dimension called BOUGHT_BY & PAID_BY respectively. Both these attributes can have only 2 values (Credit-Card or Cash). If we model them as degenerate dimensions using the traditional method, the biggest issue we will face will be the inability to use them for reporting across the fact tables. If we model it as a dimension, we can effectively use a single column for 2 differing perspectives depending on how we look at the data.

We start with basically creating a new dimension in the BMM called Payment as shown below.

The payment logical column will be mapped to the BOUGHT_BY degenerate dimension column of the COSTS table and PAID_BY degenerate dimension column of the SALES table.

In the reporting layer, one can use this single attribute to report on 2 different fact tables even though the attribute maps to 2 different columns in the physical source.

Treating degenerate dimensions as normal dimensions in BI EE can prove more beneficial than using it as a fact attribute. Especially when you have users using BI EE as an ad-hoc analysis tool(when you do not want your model to generate wrong numbers/errors).

Oracle BI EE 10.1.3.4.1 – OLTP Reporting – Switching Tables based on chosen dimensions – Use Case 1

Though BI EE is well known for reporting on Stars, Facts and Dimensions, one of the lesser known or lesser advertised features is its extensive transactional or OLTP modeling capabilities. Steve Devine (our guest author last week) documents his experience well here. Time and again, i find that BI Server’s transactional reporting capabilities is not marketed well. For example, i was going through a classic transactional reporting requirement on Orders and somehow i thought it would make sense to blog about it as it quite clearly demonstrates the advantages of the BI Server in doing transactional reporting. To make it simple for demonstration purposes, i would use the Order Entry schema for illustration.

Consider the data model diagram of OE schema shown below

As you see its a very simple Orders and Order Items transactional schema. Whenever an order is made, an entry is made into the ORDERS table and then the order items go into the ORDER_ITEMS table. Lets consider 2 different use cases.

1. ORDERS table contains the Order ID & Order Quantity. ORDER_ITEMS table contains Order Quantity per product, the unit price per ordered product and the the Order ID

2. ORDERS table contains the Order Quantity and ORDER_ITEMS table contains just the unit price of individual products sold.

Use Case 1: So, in the first case we have Order Quantity coming from 2 different tables. One is from the ORDERS level table which gives the value at an Order ID level. The second is from the ORDER_ITEMS table which gives the value at the Product level. So whenever an end user creates a report containing any one of the Customer, Promotion dimensions and the Order Quantity measure the ORDERS table quantity measure would have to be used. But as soon as Product dimension is chosen the quantity measure should automatically switch to the ORDER_ITEMS table.

To model this, we would have to first identify the dimensions. In our case, it would be Product, Customer and Promotion. Now create a Fact logical table. This fact table would contain 2 measures. One is Order Quantity and Unit Price. Set the aggregation of both these measures to sum. Remember that Order Quantity measure needs to switch the tables depending on the dimension used in analysis. In effect we have measures at 2 grains here.

1. Orders grain – A logical fact table source would have to be created to cater to this grain. This table source will contain just the ORDERS table. It will contain mapping only to the Order Quantity measure.

2. Order Items grain – Another logical table source will be created to cater to reporting at this grain. This will contain ORDER_ITEMS table inner joined with ORDERS table. The mapping to both the measures would be from the ORDER_ITEMS table.

The ordering of the fact logical table sources is important. This will always ensure that whenever just customer, Promotion and Order quantity is chosen, it will go against the ORDERS table alone. But as soon as the Products dimension is chosen the Order Quantity will automatically switch to the ORDER_ITEMS table.

So for the reports below the queries would be as shown

select T6753.GENDER as c1,
     sum(T6775.ORDER_TOTAL) as c2
from
     CUSTOMERS T6753,
     ORDERS T6775
where  ( T6753.CUSTOMER_ID = T6775.CUSTOMER_ID )
group by T6753.GENDER
order by c1

select T6753.GENDER as c1,
     T6798.PRODUCT_STATUS as c2,
     sum(T6785.QUANTITY) as c3
from
     PRODUCT_INFORMATION T6798,
     CUSTOMERS T6753,
     ORDERS T6775,
     ORDER_ITEMS T6785
where  ( T6753.CUSTOMER_ID = T6775.CUSTOMER_ID
and T6775.ORDER_ID = T6785.ORDER_ID
and T6785.PRODUCT_ID = T6798.PRODUCT_ID )
group by T6753.GENDER, T6798.PRODUCT_STATUS
order by c1, c2

This is a very commonly used method wherein we are forcing a fact table selection based on the dimension that is getting chosen. I will follow this up with a solution for use case 2.

As a side note, if you have any such modeling questions especially when you are dealing with OLTP tables, feel free to leave them as comments here. I will try to cover them as blog posts in future.

Oracle BI EE 10.1.3.4.1 – OLTP Reporting – Conforming Dimensions and Multi-Grain Fact measures

In the last blog entry here, i had shown a very simple transactional reporting use case wherein i made the BI Server to switch tables based on the measures chosen. Basically switching of tables would result in hitting a specific table to get the desired measure. But in that case we were reporting on measures of the same grain. Today we shall see the use case 2 wherein we have 2 measures at different grains. This is again a classical OLTP reporting use case as in most cases we would typically encounter measures at different grains.

For example, lets take the same Order Entry schema. We have 2 tables ORDERS and ORDER_ITEMS. ORDERS has the Order Quantity measure sold against every order. ORDER_ITEMS has the Order Sale measure for every product sold as part of an order.

ORDER_TOTAL (or Order Quantity) measure is at a higher grain than the UNIT_PRICE (or Sale) measure. A reporting end user would typically be interested in reports containing both the measures like the 2 reports shown below.

As you see, the first report is at the Orders grain requesting ORDER_TOTAL and UNIT_PRICE measure. The second is at ORDER_ITEMS or Products grain requesting both ORDER_TOTAL and UNIT PRICE measure. Both the reports are very valid but require completely different queries.

For example, for the first report we cannot join on ORDER_ID between ORDERS and ORDER_ITEMS as that will bump up the ORDER_TOTAL value completely. But for the second one we can join on the ORDER_ID as we can afford to have the same ORDER_TOTAL value to repeat itself for every product that was bought as part of the order (some people prefer to leave these as nulls which again can be easily done). In normal cases, this would be tricky to achieve as we need to generate completely different queries depending on the grain of the report. But BI Server makes it easier with conforming dimensions and level based measures.

The idea is to model different granular measures as different fact logical tables in the Business Model Layer. In our case, we need 2 different facts, one for ORDER_TOTAL and the other for UNIT_PRICE as shown below

Ensure that the lower granular table is inner joined with the higher granular table for the logical fact table having lower granular measures (ORDER_ITEMS would be inner joined with ORDERS in the logical table source of Facts Sale Grain logical table)

Remember we need to make the BI Server to produce 2 different queries when we are at the Orders grain. In order to do that, make the ORDER_TOTAL measure to be a Total Level Based measure on the Products dimension.

Now, if you generate the first report, you would notice that BI Servers fires 2 different queries as shown below

select D1.c2 as c1,
     D1.c3 as c2,
     D1.c1 as c3
from
     (select D1.c1 as c1,
               D1.c2 as c2,
               D1.c3 as c3
          from
               (select sum(T6775.ORDER_TOTAL) as c1,
                         T6775.ORDER_ID as c2,
                         T6775.ORDER_DATE as c3,
                         ROW_NUMBER() OVER (PARTITION BY T6775.ORDER_ID ORDER BY T6775.ORDER_ID ASC) as c4
                    from
                         ORDERS T6775
                    group by T6775.ORDER_DATE, T6775.ORDER_ID
               ) D1
          where  ( D1.c4 = 1 )
     ) D1
order by c1

select D2.c2 as c1,
     D2.c3 as c2,
     D2.c1 as c3
from
     (select D1.c1 as c1,
               D1.c2 as c2,
               D1.c3 as c3
          from
               (select sum(T6785.UNIT_PRICE) as c1,
                         T6775.ORDER_ID as c2,
                         T6775.ORDER_DATE as c3,
                         ROW_NUMBER() OVER (PARTITION BY T6775.ORDER_ID ORDER BY T6775.ORDER_ID ASC) as c4
                    from
                         ORDERS T6775,
                         ORDER_ITEMS T6785
                    where  ( T6775.ORDER_ID = T6785.ORDER_ID )
                    group by T6775.ORDER_DATE, T6775.ORDER_ID
               ) D1
          where  ( D1.c4 = 1 )
     ) D2
order by c1

After producing both the queries above, BI Server will make an in-memory stitch join to produce the desired results (based on conforming dimensions). Same would be case for the 2nd report as well. But the queries generated would be different.

select D1.c3 as c1,
     D1.c4 as c2,
     D1.c2 as c3,
     D1.c1 as c4
from
     (select D1.c1 as c1,
               D1.c2 as c2,
               D1.c3 as c3,
               D1.c4 as c4
          from
               (select sum(T6785.UNIT_PRICE) as c1,
                         T6798.PRODUCT_NAME as c2,
                         T6775.ORDER_ID as c3,
                         T6775.ORDER_DATE as c4,
                         ROW_NUMBER() OVER (PARTITION BY T6775.ORDER_ID, T6798.PRODUCT_NAME ORDER BY T6775.ORDER_ID ASC, T6798.PRODUCT_NAME ASC) as c5
                    from
                         PRODUCT_INFORMATION T6798,
                         ORDERS T6775,
                         ORDER_ITEMS T6785
                    where  ( T6775.ORDER_ID = T6785.ORDER_ID and T6785.PRODUCT_ID = T6798.PRODUCT_ID )
                    group by T6775.ORDER_DATE, T6775.ORDER_ID, T6798.PRODUCT_NAME
               ) D1
          where  ( D1.c5 = 1 )
     ) D1
order by c1

select D2.c2 as c1,
     D2.c3 as c2,
     D2.c1 as c3
from
     (select D1.c1 as c1,
               D1.c2 as c2,
               D1.c3 as c3
          from
               (select sum(T6775.ORDER_TOTAL) as c1,
                         T6775.ORDER_ID as c2,
                         T6775.ORDER_DATE as c3,
                         ROW_NUMBER() OVER (PARTITION BY T6775.ORDER_ID ORDER BY T6775.ORDER_ID ASC) as c4
                    from
                         ORDERS T6775
                    group by T6775.ORDER_DATE, T6775.ORDER_ID
               ) D1
          where  ( D1.c4 = 1 )
     ) D2
order by c1

There are also a couple of other approaches that we can use to model such measures. One is done in Answers (which i generally do not recommend since that forces an end user to know them) and then the other is by using a count column in a logical table source. Both can be valuable in some cases. I will try to cover them as well in the coming weeks.

Oracle BI EE 10.1.3.4.1 – Sub-Totals during Drills – Conforming Dimensions

I saw a peculiar requirement being discussed in a pentaho forum(i was actually browsing for some unique Pentaho features and stumbled upon the requirement) this week wherein end users wanted to get the sub-totals automatically while drilling on a dimension. In normal cases, i would have left it as a unique requirement and would not have tried to replicate in BI EE. But when i thought about this requirement further, it somehow seemed like a very common requirement to me. Whenever i look at my phone or internet bills, i always look for sub-totals. Without them a reporting tool would be incomplete. And BI EE does offer some good sub-total features out of the box. But what is not available is an automated Sub-Totaling feature after drilling down on a dimension in a dashboard. Many might argue that when we do the drill we are actually drilling from a Total to its individual values. But again sometimes end users would like to have this feature (atleast i would love to have this in BI EE).

There are couple of options to enable this. Both of them require some good amount of work (for a small feature) in the repository. I would recommend to use either of the solutions only for a dimension or 2 with not more than 4 to 5 levels. It is possible to use this for more than 3 dimensions as well but the repository can grow in size pretty quickly. We shall be seeing a technique that leverages the Conforming dimensions concept of BI EE. Conforming dimensions can be explained easily by the diagram below

As you see, whenever 2 different dimensions form 2 logical table sources for the same dimension and we have 2 different fact tables, then the dimension and fact combination would produce a union of the dimension members. The basic premise behind conforming dimensions is the fact that there is no linkage between the dimensions nor the fact themselves. That is Dim1 should be related to Fact1 alone and Dim2 should be related to Fact2 alone. If there is any relation between Dim1 and Fact2 or vice-versa, BI Server would not treat them as conforming dimensions.

With that background lets move on to the actual requirement. Producing dynamic sub-totals. The idea is very straight-forward. We make every member that we drill on to be a member in the column that comes up after drill (in addition to the child members of the parent member that we drilled on). For example, consider the report below

As you see, while drilling on Channel total member, we get the children of that member in addition to the member itself which simulates a sub-total. If we have n levels in a dimension when we model it we would have to model it as having n+1 levels. For example, in our example the Channel Dimension has 3 levels. Channel Total –> Channel Class –> Channel Desc. So, when we model in our repository, we need to basically have 4 levels as shown below.

According to the conforming dimension concept explained above, in order to ensure that BI EE fires different queries for every fact table, we need to create 3 aliases for the Channels dimension and 3 aliases for the fact table.

Include all the 3 channel aliases as logical table sources for the channel dimension. Now, ensure that mappings of the sources follow this table.

Also create 3 separate fact tables with the same aggregation for the amount measure. Each fact table will be sourced by a Fact alias

Now, for normal reports that do not require this capability, just pull in the dimension and one of the fact alias tables. Wherever this sub-total drill is required one would have to ensure that all the fact measures from all the fact tables are chosen. This will ensure that conforming dimension join kicks in as shown below.

From here on, its just a matter of combining all the facts into a single fact using the formula below

IFNULL(SALES.AMOUNT_SOLD,IFNULL("SALES1".AMOUNT_SOLD,0)+IFNULL("SALES2".AMOUNT_SOLD,0))

The position of the sub-total can be easily controlled by adding one more sort order column in all the Logical table sources.

Degenerate Dimension in OBIEE

Oracle BI EE 10.1.3.4.1 – Modeling Degenerate dimensions – Fact attributes

Oracle BI EE 10.1.3.4.1 – OLTP Reporting – Switching Tables based on chosen dimensions – Use Case 1

Oracle BI EE 10.1.3.4.1 – OLTP Reporting – Conforming Dimensions and Multi-Grain Fact measures

Oracle BI EE 10.1.3.4.1 – Sub-Totals during Drills – Conforming Dimensions

Subscribe To