
Towards a Future Oracle BI Architecture?

One of the presentations I’m giving over the next couple of months is for the UKOUG BI & Performance Management event, on future Oracle BI architectures. The driver for this, for me, is the number of different options now available for building Oracle BI systems, now that we’ve got products from Siebel, products from Hyperion and of course the traditional Oracle BI tools such as Discoverer, Portal, Daily Business Intelligence and Oracle Reports.
In particular, having all these new tools opens up a number of questions around how you put together a BI system going into the future.
  • Do you still build a data warehouse upfront, dealing with all the data and integration issues at the start, or do you use the data integration features of BI tools such as Oracle BI Suite Enterprise Edition to integrate the data on-the-fly?
  • Do you plan and design your warehouse upfront as an IT initiative, or do you let the users design the reports and BI metadata and have those drive the warehouse design?
  • How do you incorporate the planning and budgeting process into your warehouse design?
  • Given the productivity of tools like OBIEE, can we significantly shorten the time to deliver BI projects?
  • If we go for a mixed BI/DW architecture featuring data warehousing, ETL, BI tools and their metadata, planning tools, OLAP servers and so on, how do we create a consistent level of security over our environment, and how do we capture metadata and metrics across the entire system?
  • Finally, given the adoption of technologies such as SOA, Web Services, data service layers and the like, how do we incorporate these sources and activities into our BI architecture so that we can leverage their features, incorporate their data and still stay relevant when not all business logic and data is held in databases?
So you can see it’s an interesting but tricky topic, and one that I think is very pertinent to Oracle BI users and customers given some of the feedback I’ve had at user group events. Not quite knowing what to do next following all of Oracle’s acquisitions is the number one issue I hear in feedback sessions at events, and it’s this uncertainty I’m looking to address.
Anyway, to cut a long story short, an idea that I’ve been batting around for a while, and something I’ve discussed with people such as Doug Cackett and Andrew Bond in Oracle, is an attempt to try and pull together an architecture that recognizes the benefits of a properly structured and loaded data warehouse, but that incorporates some of the new options that are open to us now that tools like OBIEE are around.
Data warehouses give us consistent, reliable data plus the benefits of database server technology designed to handle large sets of summary and detail-level data. However, they take a long time to build and, as far as BI systems are concerned, the value they provide diminishes rapidly the longer you take to deliver them. Your warehouse project might get signed off based on some immediate market opportunity your company has spotted, but if you take a year to deliver the system, by then the opportunity may well have gone.
Tools like OBIEE however allow you to put a BI system together now, mapping its metadata layer directly against the underlying source data, applying aggregates and caches to try and speed up query performance. In some cases, tools like OBIEE can actually allow you to integrate data across multiple systems, though this will never be as fast to query as data that actually resides in just one database. Over time, systems built like this get harder and harder to manage, but at least you get reports and analysis in people’s hands fast and the business aren’t usually interested in arcane technical discussions about the merits of a properly architected data warehouse.
The thing is, though: what if you could combine the two approaches, having the business define the reports and the report metadata (in OBIEE terms, the semantic model), with the metadata layer initially mapping directly through to the source data? Then, as time goes by, you migrate more and more of the data you’re reporting on to a proper data warehouse, all the time keeping the users’ reports working through the ability of the BI tool metadata to be “re-pointed” to the data warehouse as subject areas come online. Tools like OBIEE allow business users to define a logical dimensional model that query tools then work against; as a BI architect you can start with this metadata layer pointing to the source data and, over time, migrate it to proper data warehouse structures.
At the moment, building these warehouse structures requires you to define a dimensional model in a tool like OWB, then write your mappings to bring data across from the source systems into the warehouse. In time, though, features on the OBIEE product roadmap promise to considerably simplify this process, with OBIEE offering the option to persist the logical model it works with in a relational schema or in a multi-dimensional database such as Essbase or Oracle OLAP. From there, it wouldn’t take too much work for OBIEE to also generate the ETL routines, in a tool such as Oracle Data Integrator, that copy data from your source systems into this persistence (or, in other words, data warehouse) layer.
So you see the benefit here is that we put the drivers for our BI system into the hands of the business users, allowing them to create their business model and the reports that make use of it, whilst in the background we can gradually consolidate the data they report on into a proper data warehouse, in future using features to come in OBIEE to automatically generate the warehouse model or cube, and the mappings that move the data from the source systems into this data store.
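To make the idea of "persisting the logical model" concrete, here’s a rough Python sketch of what such automation might do: walk a logical dimensional model and emit the DDL for a physical star schema. Everything here is my own illustration — the model dictionary, table names and generator function are made up, not an OBIEE or ODI feature.

```python
# Hypothetical sketch: deriving a physical star schema from a logical
# dimensional model, in the spirit of the "persist the logical model" idea.
# The model structure and naming conventions are invented for illustration.
logical_model = {
    "fact": {"name": "sales", "measures": ["amount", "quantity"]},
    "dimensions": {
        "customer": ["customer_name", "region"],
        "product":  ["product_name", "category"],
    },
}

def generate_star_schema(model):
    """Emit CREATE TABLE statements for each dimension and the fact table."""
    ddl = []
    for dim, attrs in model["dimensions"].items():
        cols = ", ".join([f"{dim}_key INTEGER PRIMARY KEY"] +
                         [f"{a} TEXT" for a in attrs])
        ddl.append(f"CREATE TABLE d_{dim} ({cols})")
    fact = model["fact"]
    cols = ", ".join([f"{d}_key INTEGER" for d in model["dimensions"]] +
                     [f"{m} REAL" for m in fact["measures"]])
    ddl.append(f"CREATE TABLE f_{fact['name']} ({cols})")
    return ddl

for statement in generate_star_schema(logical_model):
    print(statement)
```

The point of the sketch is only that, given a business-defined logical model, the surrogate keys, dimension tables and fact table fall out mechanically — which is why generating the warehouse schema (and, later, the ETL mappings) from the metadata layer is plausible.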
The way in which we move data from sources into the warehouse, or persistence layer, is likely to evolve over time as well. At present, 99% of the data movements I see on projects are data mappings that move data from one database table to another. In the future, though, we’re likely to see data coming in via web services, via enterprise service buses and messages, via business events and so on, and so the tools we use to move that data will need to evolve as well. Tools like Oracle Data Integrator are already able to handle data from these sources, whilst the next release of OWB is also slated to handle web service and message sources. Taking this a step further, it’s worth thinking about architecting the data integration element of your BI system as a formal SOA “Data Services Layer”, where data is abstracted away from your BI tools, integrated using a set of loosely-coupled services, and handled in a framework such as Oracle Fusion Middleware. Is this likely to be a dominant pattern in the near future? I can’t say for sure, but as organizations move to SOA architectures we’re likely to get our data more and more via non-traditional data warehouse sources.
Finally, what about the Hyperion tools; how are they going to fit into this architecture? In terms of specifics, who knows yet, but for now I’d say some likely integration points are in the BI tools’ metadata layer and in the dashboard frameworks that deliver planning, budgeting and operational BI reports. I’d guess that in time both planning and budgeting data, and the data on our operational performance, will increasingly be available as services to be consumed by a wider BI architecture, so the distinction between reporting data, planning data, forecasting data, and data derived from databases or from business events will eventually become blurred: they’ll all just be services in a service-orientated architecture.
Anyway, those are some thoughts from me on how we can preserve the benefits of a data warehouse architecture whilst taking advantage of the new capabilities of OBIEE, and also respond to the take-up of service-orientated architecture. I’d be keen to get other people’s feedback (especially if you’re going to the UKOUG event) to see whether this makes sense; whether it addresses the questions you’ve got around where Oracle’s BI architecture is going; whether you think it’s realistic to start off with an OBIEE model going against source data and eventually migrate it to running against a data warehouse; whether you think events and services will be big factors in BI architectures; and so on. Add some comments if you like, and I’ll respond to them in a follow-up later in the week.

A Future Oracle OBIEE Architecture

Following on from my blog article earlier this week on future Oracle BI architectures, I’ve read through the comments and had a think about how things might work, and put together some thoughts on how organizations might use Oracle’s Enterprise Edition BI tools now and going into the future.
To recap, the idea here is to take the traditional BI architecture of a data warehouse, metadata layer and a query tool and update it for the heterogeneous, service-orientated world that is Oracle now. In this architecture, we’re going to take advantage of the ability for Oracle Business Intelligence Enterprise Edition Plus (OBIEE, previously known as Siebel Analytics) to map to multiple data sources (including applications, relational databases and OLAP servers) and create a single logical business model, and the fact that Oracle now have an ETL tool in Oracle Data Integrator that can extract from and load to a range of different databases, web services and other business services. We’re also going to try and think about how the tools from Hyperion will work in this architecture, both now in their unintegrated state and going into the future, when Oracle will no doubt try and integrate them with the OBIEE platform.
To start at a high level, the proposed future Oracle BI architecture will have three distinct layers:
  • Data Services Layer that contains the data and associated ETL processes
  • Business Logic Layer that maps to the Oracle BI Server, and
  • Presentation Layer that maps to the Oracle BI Presentation Server and the Hyperion planning and performance management tools.
The idea of a data services layer comes from the SOA world, and is a way of creating an abstraction layer for SOA business processes that need to access business data. In our architecture, it contains the application data sources, the data warehouse once we create it, OLAP data sources, web service and business event data, and the ETL processes that move and transform data. The business logic layer is where the BI Server holds physical, logical and presentation models of your data, together with hierarchies, calculations, metrics, KPIs and access rules. The presentation layer maps to Oracle BI Presentation Server and to the legacy Hyperion CPM tools, which take their data from the business logic layer in the former case and from the data services layer (at least initially) in the latter.
To illustrate how the architecture might be developed, consider a typical organization that wishes to implement an Oracle business intelligence solution. For this organization, the deployment can be broken down into three separate phases:
  1. Initial deployment of pilot reports as “quick wins” and to determine detailed requirements
  2. Gradual consolidation of reporting data into a data warehouse
  3. Implementation of advanced functionality in the OBIEE 11g+ timeline
This assumes that the organization has no particular BI strategy in place at the moment, currently provides reports through disparate tools directly against source applications, and wishes to provide a reporting solution as part of a wider Oracle Fusion Middleware strategy.
1) Initial Pilot Deployment
Organizations looking to deploy BI solutions often have two conflicting drivers for how they approach their project: the business wants to get reports to users quickly, so that it can take advantage of the market opportunity that led it to want a BI solution in the first place, whilst the IT department is often concerned with the long-term viability of the reporting solution and wishes to build it on a data warehouse.
In the past it was often difficult to meet both of these requirements, as once you started building reports and a metadata layer against your operational applications, it was difficult to re-point them towards a data warehouse once you started building one. With OBIEE, though, organizations can create a metadata model that separates the logical objects users report against from the physical database tables that provide their data.
Therefore, in this first phase, we create an initial metadata layer in the business logic layer using the Oracle BI Server and Oracle BI Administration tool, and initially point it towards the source applications that contain the data users want to report on. Where possible a single logical model will be created; at this early stage, however, it’s likely that the business logic layer will contain individual logical, physical and presentation models for each source system being accessed.
Where appropriate, Oracle Data Integrator will be used to create pre-joined copies of certain source application tables, together with some limited aggregations, which will be placed in a database held within the data services layer. These tables will go on to form the kernel of the data warehouse to be built in the next phase. Using these initial data services, business logic and presentation layers, the business can produce some initial reports, gain some quick wins and also provide a pilot platform that helps the process of gathering more detailed requirements.
2) Gradual Consolidation of Data into a Data Warehouse
The main phase in development follows the initial pilot and quick wins and looks to put in place a scalable, consolidated reporting environment. Where appropriate, data is extracted in real-time or batch from the source applications and consolidated into a classic data warehouse, with staging, atomic and dimensional (performance) layers, ideally using a database such as Oracle to provide indexing, summary management, in-database OLAP and storage of large amounts of detailed and summary-level data.
For more complex analytical needs, or if the business wishes to use planning, budgeting or financial consolidation tools, data can be loaded from the warehouse into an OLAP server such as Essbase, with all transformations and data loading taking place through Oracle Data Integrator and its repository. ODI also provides a means to integrate data via messages, enterprise service buses, web services and business events, with the overall data services layer refresh process being orchestrated through BPEL and ODI agents.
The primary route for data out of the data services layer is through this data warehouse. For data sources that are transitory in nature, or that have not been incorporated into the warehouse yet, direct access to these sources will still be supported with the business logic layer continuing to map to these sources as needed, as well as the new data warehouse.
The business logic layer, initially defined in the pilot phase, contains models describing the data sources within the data access layer, the business model over these data sources, and the customized views of the business model used by different parts of the organization.
Initially, most of the query tools within the presentation layer will access their data through this business model. Legacy Hyperion tools will for the time being bypass this layer and talk directly to the Essbase server in the data services layer.
In the initial phases of the BI implementation it is likely that multiple logical business models will exist in this layer whilst the separate data sources are consolidated into conformed fact tables and dimensions; the end goal, though, is a single unified logical model that spans the entire organization and allows analysis across linked dimensions and facts.
3) Implementation of advanced functionality in the OBIEE 11g+ timeline
Everything in the architecture so far is supported by the current generation of OBIEE, ODI, Oracle SOA Suite and Hyperion tools. Going into the future though, there are obvious ways in which this architecture can be improved as integration continues between Oracle’s BI tools.
At present, OBIEE can generate aggregates to speed up user queries through the use of the Aggregate Persistence Wizard. It is likely that in future, OBIEE will also be able to generate a physical database schema to match a logical business model in the OBIEE repository, either in a relational database or as a multi-dimensional cube in Essbase or Oracle OLAP.
It’s also reasonably likely that OBIEE will also be capable of automatically generating ETL mappings for tools such as ODI, “function-shipping” the process of transforming data to an ETL tool rather than have the BI Server do it at runtime. To what extent these function-shipped ETL mappings will be able to handle complex mappings is obviously not known.
It is also likely that much of the metadata used by the former Hyperion tools will make its way into the OBIEE physical, business and presentation models, with the business model undergoing development so that it can hold items such as metrics, KPIs and goals as first-class data items.
Eventually, you can imagine the existing OLAP query tools used by Essbase (Web Intelligence etc) being replaced by the forthcoming release of Oracle Answers, which can present data as a multi-dimensional hierarchical model, negating the need for standalone OLAP query tools.
So what’s left? Well we haven’t tackled some of the “edge case” OBIEE tools such as Real-Time Decisions (this is likely to sit in the business logic layer and provide services for the presentation layer), and tools such as Brio, Discoverer and the like which are not part of Oracle’s strategic direction.
It also rather glosses over the slightly awkward manner in which ODI provides access to service and event-based data (more on that in a later posting). Bringing data together into a warehouse whilst at the same time providing consistent data for the business logic layer is, of course, a complex task, but this is where the data mart automation tools coming in the OBIEE 11g+ timeframe are likely to help out: the idea is that persisting your report data in a dimensional model should be a fairly automated process, not one that has to be re-invented for every deployment.
For now though, hopefully this is a fair stab at pulling together a coherent, next-generation Oracle BI architecture. As usual comments are welcome, and I’d appreciate any feedback before I start to put the presentation together.

Thoughts on OBI EE for Discoverer Users

I mentioned the other day that one of the papers I’m delivering for ODTUG Kaleidoscope 2007 is on Oracle Business Intelligence Enterprise Edition (OBI EE) for Discoverer users. The aim of this paper is to introduce OBI EE to those Oracle customers who currently use Discoverer, perhaps have requirements that Discoverer can’t currently handle, have heard about OBI EE and want to find out whether it’s a product they might want to upgrade to. By putting OBI EE features in the context of Discoverer, it should make some of the more esoteric features of the product a bit easier to understand (the Common Enterprise Information Model, Oracle Delivers, Oracle Answers, what the Oracle BI Server actually does and so on), and it should make a change from the usual “Introduction to Oracle BI EE” presentations we’ve all sat through (or delivered) over the past twelve months or so.
So, I guess that if we’re looking at a potential upgrade for Discoverer users, it would be useful to take a moment to think about what’s good, and what’s not so good, about Oracle Discoverer. In the paper I’ll do a quick one slide/one paragraph recap on what Discoverer is, and then go through what I feel are the major plus-points of the tool:
  • Easy to use, lots of wizards, familiar look-and-feel, high awareness and exposure within the Oracle user community – many people have exposure to Discoverer through its apps integration, bundling with Oracle Application Server and so on – the “comfort” factor if you like.
  • Leveraging of Oracle’s built-in calculation, analytic and PL/SQL functions – Discoverer uses the Oracle database as its calculation engine, so you get access to all the built-in SQL and PL/SQL features, including the analytic (lag, lead, window, top-N etc.) functions.
  • Integration with Oracle database security and E-Business Suite responsibilities, and pre-built E-Business Suite reports and BI metadata layer.
  • Integration with Oracle Warehouse Builder (although this requires the Enterprise ETL Option for Warehouse Builder, at $10k a CPU on the ETL database), and integration with Oracle Portal
  • Oracle OLAP access through Discoverer for OLAP
  • Lots of functionality around totals, percentages and other report add-ons
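As an aside, the analytic functions referred to above — LAG, running windows, ranking — are standard SQL window functions. A quick, hands-on way to see what they do (illustrated here with Python’s bundled SQLite purely for convenience; in Discoverer these calculations would be pushed down to the Oracle database, and the table and data here are made up):

```python
import sqlite3

# Demo of the window/analytic functions mentioned above (LAG, SUM OVER, RANK),
# run against an in-memory SQLite database (version 3.25+ is required for
# window function support).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2007-01", 100.0), ("2007-02", 120.0), ("2007-03", 90.0)])

rows = conn.execute("""
    SELECT month,
           amount,
           LAG(amount) OVER (ORDER BY month)       AS prev_amount,   -- prior period
           SUM(amount) OVER (ORDER BY month)       AS running_total, -- cumulative
           RANK()      OVER (ORDER BY amount DESC) AS amount_rank    -- top-N style
    FROM sales
    ORDER BY month
""").fetchall()

for row in rows:
    print(row)
```

The same query shape (with Oracle’s richer function library) is what Discoverer generates against the database, which is why its calculation capabilities track whatever the Oracle RDBMS supports.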
If we’re honest, the failings of Discoverer could be summed up as follows:
  • Oracle database-centric: although Discoverer can connect to non-Oracle databases, this is a fairly complicated DBA task and still requires everything to be routed through the Oracle database, and the End User Layer’s Oracle database dependency means you still need an Oracle database somewhere, even if all your data is in MS SQL Server, for example.
  • Although Discoverer integrates with Oracle Portal, in my opinion it’s not an optimal solution: it’s tricky to get all the report refreshes working properly, the reports in Portal don’t show a real-time view of the underlying data, you can’t drill and analyze in-place, and Portal itself is a bit of an overkill for just a BI portal.
  • It’s very hard, if not impossible, to get Discoverer reports to run lightning-fast; typically Discoverer reports take 10, 20 seconds or more to return data, and Discoverer itself adds a significant time overhead to queries. It’s just not a fast, snappy environment to report in.
  • The report authoring part of Discoverer requires a Java applet to be installed and then run in the client PC’s web browser, which can cause security and installation issues for users not running with admin rights, and requires a higher-spec (memory, CPU) PC to run on.
  • Discoverer for OLAP, whilst very similar to relational Discoverer in terms of functionality, look and feel, is still a different tool, has more limited capabilities (no parameters, can’t total by attribute and so on) and has a separate report catalog and security setup to relational Discoverer.
  • There’s (currently) no capability to create alerts, distribute reports, or add in-context messages to reports giving advice on how to interpret them.
  • There’s also (currently) no way of calling Discoverer reports via an API, or of adding workflow to Discoverer reports so that a user clicking on a report area, or a link displayed alongside it, can trigger, say, a BPEL or Oracle Workflow process to act on insights provided by the report.
  • There is also (currently) no way of displaying Discoverer-generated data in, say, a letter, or printed labels, or in a report that contains more than one dataset.
I say “(currently)” against some of these issues because a few of them are being addressed by the planned integration of Discoverer with OBI EE, and I’ll cover these future alternatives to a straight upgrade to OBI EE later in the paper. For the time being, though, Discoverer’s advantages could be described as its familiarity, its leveraging of Oracle database features, its support for totals, percentages and analytic functions, and its integration with Warehouse Builder and Portal; the drawbacks are this very Oracle database dependency, the lack of alerting and report distribution features, the lack of APIs and interconnectivity with the application development world, the limited output options, and performance, which shouldn’t be overlooked as it’s the number one complaint I hear about Discoverer when I visit customer sites.
So, with these points in mind, I’ll do a short introduction to OBI EE and then start to go through its features, placing them in the context of Discoverer and Discoverer’s good and bad points. As I’m conscious that this is only a one hour session, and working to my “golden rule” that people can only take in a maximum of six things during a presentation before it all starts to wash over them, I’ll concentrate on the following key features.
  1. The architecture of OBI EE
  2. What Oracle Answers is, and how it compares to Discoverer. At this point I would build a simple report, and afterwards highlight in summary some of the things Answers does better/differently than Discoverer.
  3. How Oracle Interactive Dashboards works, how it differs to Oracle Portal and how it integrates with Answers. Again, I’d quickly show off Dashboards by bringing in the previously created report, and point out in summary what else it does different/better than Oracle Portal.
  4. How the alerting and distribution element works (Oracle BI Delivers)
  5. How the metadata layer (Common Enterprise Information Model) works, how it differs from the Discoverer EUL
  6. How it integrates in with Web Services, BPEL and SOA
  7. How it integrates with Oracle E-Business Suite
  8. What Oracle BI Publisher brings to things
(I know that’s 8, rather than 6, things, but hopefully I’ll still have the audience with me at that point).
Once I’ve gone through the key points of the new products, it’d be worth talking about the migration process and also about what elements of functionality Discoverer has that the new products don’t have, or don’t do as well.
For the migration process, there are two strands. Firstly, the Oracle Discoverer development team are working on a migration utility that will, to one extent or another, automate the process of creating OBI EE metadata (and possibly reports) from the equivalent Discoverer metadata and workbooks. At the time of delivering this paper it’s unlikely the utility will be publicly available, so whilst I’ll make reference to it, I’ll mainly look at migrating Discoverer elements manually. This will consist of a couple of slides on the process of bringing the Videostore dataset into OBI EE’s metadata layer, creating regular and time dimensions, recreating complex folders in the OBI EE presentation layer and custom folders as SELECT “tables” in the physical layer, applying security and re-implementing workbooks. That could of course be a presentation in itself, so I’ll only have time to cover it superficially in the presentation, although I might be able to cover it in more depth in the accompanying paper.
In terms of what functionality is currently missing in OBI EE, I thought this was an interesting (and important) area to cover, as “what’s missing” is not normally something you’d see covered in Oracle sales presentations or product literature, and it’s only something you tend to find out once you try to implement a migration. As I personally haven’t migrated a Discoverer system to OBI EE yet, I’ve based my observations on migrating the Videostore data and workbooks to OBI EE, and so far I’ve come up with the following areas where OBI EE still falls short:
  • Discoverer is still, in my opinion, more intuitive for first-time or less experienced users creating or opening reports. Discoverer presents a Workbook Wizard when you first start up, giving you simple choices around opening a report, creating a report and so on, whilst Answers presents a fairly busy, hard-to-fathom web interface with lots of very small buttons and no real guidance on what to do next. The Answers development team would no doubt say that, in reality, most users won’t see Answers, as the vast majority of report querying takes place directly within Dashboards (in Discoverer/Portal, anything more than just viewing a snapshotted Discoverer worksheet portlet requires launching out into Discoverer Viewer), but even so, Answers could be a bit more newbie-friendly.
  • The range of analytic functions and other calculations in Answers is limited compared to the Oracle SQL and PL/SQL functions made available through Discoverer. For example, whilst Answers comes with a number of analytic functions such as Rank, TopN and so on, its time series calculations are limited to prior period (“ago”) and period-to-date (“todate”), whilst Discoverer has access to all the Oracle analytic functions in the relational version, and all the OLAP DML functions (forecasts, allocations, statistical and so on) in the OLAP version.
  • OLAP support in general in Answers is very limited (there’s no ability to select dimension members via their position in a hierarchy, no concept of dimension attributes, and no ability to drill up or across to related items), although Answers does handle the “OLAP as a star-schema performance booster” use case well, as it just presents OLAP data as a relational dataset and deals with fully-solved cubes correctly (and it talks MDX and XML/A natively, opening up MS AS, SAP BW and, in the future, Essbase).
  • There’s currently a lot of duplication at the BI Server level with equivalent Oracle database features – the BI Server has its own summary management, query rewrite, security layer and so on – which, whilst useful in a heterogeneous environment (for consistency), leaves you wondering when to use the OBI EE features and when to use the Oracle RDBMS features.
  • Cost, of course. OBI EE on a per-CPU basis is more than ten times the cost of Discoverer, although on a named-user basis the cost difference is less marked, and the upcoming Oracle BI Standard Edition One is more comparable cost-wise, albeit with limits on the number of server CPUs and end-users.
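For readers unfamiliar with the “ago” and “todate” time-series calculations mentioned above, here’s a rough Python sketch of their semantics — my own illustration of the two concepts, not OBIEE’s actual implementation, with made-up sample data:

```python
def ago(values, periods=1):
    """Prior-period value, in the spirit of OBIEE's AGO(): shift back N periods."""
    if periods == 0:
        return list(values)
    return [None] * periods + list(values[:-periods])

def todate(periods, values):
    """Period-to-date, in the spirit of OBIEE's TODATE(): a running total that
    resets whenever the year portion (YYYY) of the period label changes."""
    totals, running, current_year = [], 0.0, None
    for period, value in zip(periods, values):
        year = period[:4]
        if year != current_year:
            running, current_year = 0.0, year
        running += value
        totals.append(running)
    return totals

months = ["2006-11", "2006-12", "2007-01", "2007-02"]
sales = [10.0, 20.0, 5.0, 7.0]

print(ago(sales, 1))          # each month's value shifted to the next month
print(todate(months, sales))  # year-to-date totals, resetting each January
```

As the sketch suggests, these two primitives cover common “vs. last month” and “year to date” reports, but nothing like the forecasting, allocation or statistical functions available through Discoverer’s OLAP DML access.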
So, in summary, that’s what I’m going to try and cover. Again, considering I’ve only got an hour, I might end up covering fewer points in more detail, or just reduce the pace of the talk, but if you can think of anything else that’s relevant, or questions you’d like answered (I’ll post the presentation and paper on the site when they’re written), just add a comment and let me know.

Inside the Oracle BI Server Part 2 : How Is A Query Processed?