Resolving Fan Traps and Circular Joins using OBIEE

If you’re looking to migrate data from a Discoverer system to Oracle Business Intelligence Enterprise Edition, one of the first things you’ll probably end up thinking about is how your (not necessarily star schema-based) End User Layer will translated to a (star schema mandatory) Common Enterprise Information Model. In particular, Discoverer allows tables to have more than one join route between them, and it also allows circular joins so that, for example, an employee can belong to a department, which belongs to a region, which has an employee as the regional manager.
Circularjoin1-2
If you try and bring this arrangement in to the Physical Layer of your OBIEE semantic model, the administration tool will initially allow this and your repository verifies as being correct.
Circularjoins3
However if you then drag and drop the tables into the logical business model layer, all three lookup tables are shown as dimension tables, as you would expect,
Circularjoins4
and the BI Administrator tool shows an error message when you try and verify the repository, saying that circular joins aren’t supported.
Circularjoins5
In reality, you don’t want three dimension tables (departments, regions, employees) joining to the fact table (appointments), you just want one, so you’re going to have to use the mapping ability in the business model and mapping layer to join these three lookup tables together under one logical table source. For the employee table though, rather than join it through this circular join, you need to create an alias for it in the physical layer (in my case I call it CIRC_JOIN_REGEMP) like this:
Circularjoin6
and join the region table to this rather than the employees table.
Circularjoin7
Then you can create a single logical dimension table within your logical business model, and map the logical table source to the join between the four lookup tables, including your new alias for the employee table.
Circularjoins8
When you add the new columns into the logical table and then run a report, the figures come up as expected, with the regional manager details displayed correctly, the alias having resolved the circular join.
Circjoin11
Another issue that often comes up with Discoverer is “Fan Traps”. Fan traps are an oddity of SQL where, if you have say a dimension table full of account details, and two facts that join to it, say with sales figures in one and budget figures in the other, logically if you query the two together you should get a set of results like this, taken from the Oracle document “Oracle Discoverer’s Fan Trap Resolution – Correct Results Every Time”
Fantrap1
Fantrapexpectedres
The problem is though, that if you issue the obvious query in SQL, the results come back wrong, because SQL does a cartesian join between the two fact tables and over-counts the figures.
SQL> select a.acc_name,
2  sum(sales),
3  sum(budget)
4  from fan_trap_accounts a, fan_trap_sales s, fan_trap_budget b
5  where a.acc_id = s.acc_id
6  and   a.acc_id = b.acc_id
7  group by a.acc_name
8  order by a.acc_name
9  /

ACC_NAME             SUM(SALES) SUM(BUDGET)
-------------------- ---------- -----------
ACCOUNT1                    900        1050
ACCOUNT2                    130         200
ACCOUNT3                    600         750
ACCOUNT4                    600         600
Now Discoverer handles this quite well, and automatically rewrites the query internally to use two in-line views, one to join the sales and accounts tables, and one to join the budget and accounts tables, and then join the results together with the correct aggregation applied. So how does OBIEE handle this? Does it ignore the potential error and give you the wrong figures, or does it even disallow this sort of schema design in the first place? Let’s take a look.
The BI Administrator tool doesn’t have a problem with this sort of arrangement in the physical layer:
Fantrap2
The logical business model is fine as well, with the sales and budget tables being displayed as fact tables, and the accounts table as a dimension, as you would expect.
Fantrap3
In fact, when you run a query, using the same selection as our SQL statement, the figures do in fact come out correct.
Fantrap4-1
So what’s happening? Well, if you take a look at the query log and see how the logical, and physical, SQL queries are put together, you can see that OBIEE is doing what Discoverer does, but within the BI Server rather than writing a single SQL statement that uses inline views. The BI Server effectively creates two inline views that in turn lead to two physical SQL statements being sent to Oracle, the BI Server then joins them together and returns the data back as a single result set.
-------------------- Execution plan:

RqBreakFilter <<2750>>[1] [for database 0:0,0]
    RqList <<2641>> [for database 0:0,0]
        case  when D903.c1 is not null then D903.c1 when D903.c2 is not null then D903.c2 end  as c1 GB [for database 3023:5710,44],
        D903.c3 as c2 GB [for database 3023:5710,44],
        D903.c4 as c3 GB [for database 3023:5710,44]
    Child Nodes (RqJoinSpec): <<2752>> [for database 0:0,0]
        (
            RqList <<2795>> [for database 0:0,0]
                D902.c1 as c1 [for database 3023:5710,44],
                D901.c1 as c2 [for database 3023:5710,44],
                D901.c2 as c3 GB [for database 3023:5710,44],
                D902.c2 as c4 GB [for database 3023:5710,44]
            Child Nodes (RqJoinSpec): <<2798>> [for database 0:0,0]

                    (
                        RqList <<2653>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS.ACC_NAME as c1 GB [for database 3023:5710,44],
                            sum(FAN_TRAP_SALES.SALES by [ FAN_TRAP_ACCOUNTS.ACC_NAME] ) as c2 GB [for database 3023:5710,44]
                        Child Nodes (RqJoinSpec): <<2686>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS T6678
                            FAN_TRAP_SALES T6685
                        DetailFilter: FAN_TRAP_ACCOUNTS.ACC_ID = FAN_TRAP_SALES.ACC_ID [for database 0:0]
                        GroupBy: [ FAN_TRAP_ACCOUNTS.ACC_NAME]  [for database 3023:5710,44]
                        OrderBy: c1 asc [for database 3023:5710,44]
                    ) as D901 FullOuterStitchJoin <<2744>> On D901.c1 =NullsEqual D902.c1; actual join vectors:  [ 0 ] =  [ 0 ]

                    (
                        RqList <<2690>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS.ACC_NAME as c1 GB [for database 3023:5710,44],
                            sum(FAN_TRAP_BUDGET.BUDGET by [ FAN_TRAP_ACCOUNTS.ACC_NAME] ) as c2 GB [for database 3023:5710,44]
                        Child Nodes (RqJoinSpec): <<2723>> [for database 3023:5710:ora11g,44]
                            FAN_TRAP_ACCOUNTS T6678
                            FAN_TRAP_BUDGET T6681
                        DetailFilter: FAN_TRAP_ACCOUNTS.ACC_ID = FAN_TRAP_BUDGET.ACC_ID [for database 0:0]
                        GroupBy: [ FAN_TRAP_ACCOUNTS.ACC_NAME]  [for database 3023:5710,44]
                        OrderBy: c1 asc [for database 3023:5710,44]
                    ) as D902
        ) as D903
    OrderBy: c1 asc [for database 0:0,0]

+++Administrator:2a0000:2a0001:----2008/08/26 17:24:48

-------------------- Sending query to database named ora11g (id: <<2653>>):

select T6678.ACC_NAME as c1,
     sum(T6685.SALES) as c2
from
     CUST_ORDER_HISTORY.FAN_TRAP_ACCOUNTS T6678,
     CUST_ORDER_HISTORY.FAN_TRAP_SALES T6685
where  ( T6678.ACC_ID = T6685.ACC_ID )
group by T6678.ACC_NAME
order by c1

+++Administrator:2a0000:2a0001:----2008/08/26 17:24:48

-------------------- Sending query to database named ora11g (id: <<2690>>):

select T6678.ACC_NAME as c1,
     sum(T6681.BUDGET) as c2
from
     CUST_ORDER_HISTORY.FAN_TRAP_ACCOUNTS T6678,
     CUST_ORDER_HISTORY.FAN_TRAP_BUDGET T6681
where  ( T6678.ACC_ID = T6681.ACC_ID )
group by T6678.ACC_NAME
order by c1
Now this strictly speaking isn’t as efficient as Discoverer’s approach, which uses a single SQL statement and does the join of the two inline views within the database, but it’s still pretty clever and would work for platforms that don’t support inline views. In a way its an “on the fly” creation of two logical table sources without you having to set this up explicitly in the logical business model.
One thing I haven’t tried out yet is how OBIEE handles queries that, in Discoverer, would cause a fan-trap detection error, i.e. queries that it can’t rewrite to use two or more inline views. I suspect that these sorts of scenarios, that usually involve facts of differing granularity, where circular joins exist or where the keys to the tables don’t correspond would get picked up by the general data modeling rules in OBIEE, but if anyone’s got any experiences with this then by all means leave a comment.
If you’re wondering what prompted this posting, I’m putting the finishing touches to my Oracle Open World presentation on OBIEE data modeling, and looking to cover off a few data modeling “oddities”. I’ll try and cover support for ragged/unbalanced/value-based hierarchies in a posting tomorrow, and finish up with how the logical business model supports the Kimball concept of Bridge Tables, but no-one seems to use them.