Several months ago I wrote a blog post on pulling together an Oracle BI Suite Enterprise Edition business model from a normalized data source. At the time I was working through in my own mind how best to join different data sources together, how to make use of logical and physical joins, how logical table sources worked and so on and I got some useful feedback from Adrian Ward and Martin Hammer on how best to make use of the data modeling features in Oracle BI EE. I’m now sitting on the flight going over from Auckland to Melbourne and practicing the various demos I’ll be doing tomorrow, one of which is on this subject, so as I’ve got a couple of hours to spare I thought it worth jotting down my current thoughts on the subject so it’s fresh in my mind.
In the example I use, I have a normalized data source that’s based on the SOADEMO sample schema that comes with theOrder Bookings SOA Suite demo. The source data model looks like this:
It’s a fairly simple normalized model, where orders break down into items, which reference products, and with orders being referenced by the time of the order and the customer who placed the order. Many customers can have many addresses, with the intersection between individual customers and addresses being held in an intersection table.
Within BIEE, you always try and represent the data in the business model in terms of a star schema. Using the Kimball methodology as the start point, the star schema contains one or more fact tables, which reference and re-use one or more dimension tables. Ideally, the fact tables contain just measures and foreign key links to the dimensions, which themselves contain information on the things you “slice and dice” measures by, such as the customer income band, product category and so on.
So how do you go about converting this source data model into a star schema? Well, the first thing I would do is identify the lowest-level transactional information in the source database, which in our case are orders which are made up of one or more order items. In the star schema that I’m going to built, this is the grain that I’m declaring, so I want to base my fact table on a combination of the items and orders tables.
To do this, I create a new business model that I’ll call Customer Orders, and drag and drop the items table from the source, physical layer over to this new business model to create my first logical table. Looking at the columns in the logical table, I can see IDs for the order and the item, together with some measures and some information on the product that’s been ordered.
Now I know from looking at the rest of the physical data model that this product information is essentially duplicated from the products table, and according to the Kimball methodology, it should be taken out from the fact table and placed in the product dimension, if it’s not there already. So I remove the PARTNUM and PRODUCTNAME columns from the logical table and leave just the measures and the links through to the other tables.
One more step that’s useful is to double-click on each of the measures, and use the logical column dialog to set the default aggregation method for each measure, otherwise Answers will display fact table data as detail, rather than aggregated.
Now I’m in the situation where I’ve got the details of the order items in the logical table, but I’m missing the details on the customer and the date of order, which are in the orders source table. To bring this across, I first make sure that the physical data model in BI Administrator has physical foreign keys registered between the source tables, and then double-click on the items logical table to edit the logical table source.
To do this, I click on the Sources tab after double-clicking the items logical table, and then double-click on the items logical table source in the dialog box to start editing it.
When I first created the items logical table, by dragging the items physical table over to the business model, the logical table had a single logical table source which maps to the items physical table. What I’m going to do now though is press the Add button below the items logical table source to add the orders physical table to it, which I can do as the two tables are linked together in the physical layer by a physical foreign key.
When I add an additional physical table to the logical table source in this way, I make sure that BIEE still performs a single SQL statement to get data from the two tables, and I can now go back to the logical model and start dragging columns from the orders physical table into the items logical table. If I just simply dragged items from the orders physical table across to the items logical table without performing this step, BIEE would set up a second logical table source for the items logical table, and perform two SQL queries, one for items, one for orders, to get the data back. If I just dragged the orders table across to the logical model and created a second logical table, I’d end up with some of my dimensions joining to orders, some to items, and I wouldn’t have a valid star schema.
Once I add the second table to the logical table source, and then drag and drop the orders columns into the logical table, my data model looks like this:
If you’re wondering what the blue splotches are next to the tables and columns, it’s because I’ve not checked the table into the repository yet – I’ll do that in a moment.
So now, I’ve got the fact table for my star schema, with a grain of individual order items, and links through to customer, time and product dimensions. Next then, I drag and drop the times and product tables across from the source physical model, whereupon the model looks like this:
Notice how BIEE thinks that all of the tables are fact tables (they’re highlighted in yellow, which is how BI Administrator indicates fact tables) – this is because BI Administrator doesn’t bring across the foreign key relationships if you import tables in separate stages, it only preserves them if you bring across both sides of the foreign key relationship at the same time.
To let BIEE know about the table relationships, and to tell it that products and times are actually dimension tables, I highlight the three logical tables and select Business Model Diagam > Whole Diagram from the context menu.
I then use the menu bar to create complex joins between the dimension tables and the fact table.
Complex joins, as opposed to foreign key joins, in the logical layer merely tell BIEE that there is a relationship between the two tables, and it leaves BIEE to come up with the most appropriate way to join the two tables, based on the metadata in the repository and the joins in the physical layer. If you instead create these joins as foreign key joins, you proscribe exactly the way that the two tables will join, for example from customer ID to orders, which may not be appropriate if your logical dimension table has multiple logical table sources for different levels of aggregation.
Looking back now at my logical model, just the items table is marked as a fact table, with the other two tables now being recognized as being dimension tables, as they have a one-to-many complex join relationship between them and the items table.
Now it’s time to add the customer table. This gets added as another logical table to the logical model, and I use the same technique as before to join it to the fact table with a logical complex join, so that it’s recognized as a dimension table.
This time though, I’ve got two more physical tables, customer addresses and addresses, to add to the logical model. As they join together in the physical model, I can just edit the customer logical table source and add the two physical tables to the logical table source, like this:
Then, as with the columns in the orders table, I can add the required columns from the addresses table to the customer table, to add address information into the customer dimension table.
And that’s it in terms of creating the star schema. All that’s left now is to create the dimensions that define the drill paths down the dimension tables, like this:
Finally, it’s just a case of dragging the logical model over to the presentation layer to create a presentation model, and I can then start querying the star schema using Oracle Answers.
And that’s all there is to it. Another hour to go on the flight, time for an episode of Extras on the in-flight entertainment.