Comparing OBIEE Usage Tracking with NQSQUERY.LOG

The other day I posted about the query log file generated by the Oracle BI Server, and how you could use it to add diagnostics to your query environment. Scott Powell added a comment to the posting suggesting that I take a look at the Usage Tracking feature within OBIEE, as this logs query statistics to a database table and allows you to run reports against them. I’d made a mental note of usage tracking but not used it in anger, and as I heard at Open World that the usage tracking feature had been improved in 10.1.3.4, I thought I’d take a look and see how it now worked.
Usage Tracking isn’t something that’s installed by default with OBIEE; you have to run a script to create a tracking table called S_NQ_ACCT in your database and then register the table in the repository and the NQSCONFIG.INI file. You can log statistics to a file instead, but direct insertion into a table is the recommended approach, as this allows better analysis and you can also batch up the inserts to reduce the load on the server. Something that I think is new in 10.1.3.4 is a subdirectory under the $ORACLEBI/server/sample directory called “usagetracking”. This contains a number of extra table creation scripts, a repository RPD file that you can import and some web catalog files to provide instant reports over your usage tracking data, something that I think first shipped with the BI Applications and now ships with OBIEE from this release onwards.
Usage1-1
I don’t know if it was just me, but it actually took me quite a lot of time to get Usage Tracking up and running. First off, there are some extra tables to create, including a view (S_NQ_LOGIN_GROUP) that the various reports require but which only seems to ship as a SQL Server script. In fact the various repository files and scripts look like they were originally developed for SQL Server, which is the database I often see some of the original nQuire product managers demoing OBIEE with; I guess historically this is the database many of the demos were developed for, and that’s reflected in the Usage Tracking samples that ship with 10.1.3.4. Once you’ve got the tables created you need to add a couple of entries to the NQSCONFIG.INI file to turn on usage tracking, tell the BI Server to write the log entries to the database, and specify which database table to insert into.
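Going back to that S_NQ_LOGIN_GROUP view for a moment: porting it to Oracle looks like a fairly simple job, and a rough sketch of what I’d expect it to look like is below. Treat it as an assumption rather than the definitive version though, and check the SQL Server script that ships in the usagetracking directory for the exact column list.

-- Rough Oracle sketch only: derive the login/group view from S_NQ_ACCT
create or replace view S_NQ_LOGIN_GROUP as
select distinct USER_NAME as LOGIN,
                USER_NAME as RESP
from   S_NQ_ACCT;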
Usage1A-2
The key entries are:
[ USAGE_TRACKING ]
ENABLE = YES;
DIRECT_INSERT = YES;
PHYSICAL_TABLE_NAME = <name of physical S_NQ_ACCT table, as located in the physical model>
CONNECTION_POOL = <name of connection pool for log writes, should be separate to other connection pools>
BUFFER_SIZE = 10 MB;
BUFFER_TIME_LIMIT_SECONDS = 5;
The repository also took a while to import properly; it makes use of old-style time series functionality and I had to recreate all the derived measures (change since last year, percentage change since last year and so on) using the new “Time Dimension”-based time series functionality that comes with 10.1.3.4. In the end though I managed to get it all up and running, with a new physical model, business model and subject area within my repository, so that I could now start recording and analyzing usage tracking information.
Usage2-3
Once you get the repository up and running, and the web catalog imported, you can take a look at the pre-defined reports. Here’s one that lists out the top 10 longest running queries.
Usage3-2
You can also of course create your own reports, and add them to a Usage Tracking dashboard, which you can use to monitor the health of your system, highlight problem reports, see if response time is deteriorating over time and so on.
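To give an idea of the kind of query a hand-rolled usage report could be built on, here’s a rough SQL sketch against S_NQ_ACCT. The column names (USER_NAME, START_TS, TOTAL_TIME_SEC and so on) are the standard usage tracking ones, but check them against your own table before relying on this.

-- Average and worst query response times per user per day, straight off S_NQ_ACCT
select   user_name,
         trunc(start_ts)            as query_date,
         count(*)                   as queries_run,
         round(avg(total_time_sec)) as avg_secs,
         max(total_time_sec)        as worst_secs
from     s_nq_acct
group by user_name, trunc(start_ts)
order by avg_secs desc;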
So how does this compare to the log file analysis that I blogged about the other day? Well, putting my Oracle database hat on, analyzing queries via the log file is more like performing an SQL trace, whereas the usage tracking feature is more like using Statspack, or an automated session logging tool such as AWR or ADDM. The NQSQUERY.LOG file gives you more information than usage tracking, including the actual physical SQL or MDX sent to the underlying database, details of how federated queries are broken down into individual database calls, plus the logical execution plan for a query together with individual row counts at the various stages of the query. Like SQL tracing though, you need to have turned logging (or tracing) on before you can analyze a query, and as Oracle’s recommendation is only to turn logging on for diagnostic purposes, it’s likely that it won’t be on when a user calls you up and tells you that the system is slow. Usage Tracking, in contrast, like Statspack or AWR/ASH, is likely to be on at all times, and whilst it won’t tell you the details of how the query executed, you can use it to generate general trend-type reports in order to spot when a problem is developing based on deteriorating performance. The query log appeals to me as I like to look at the internals, but for most people the extra information available beyond logging level 3 (the logical execution plan and the intermediate row counts) won’t be relevant; they’re mostly interested in whether query performance has spiked recently.
So, from a diagnostics and problem highlighting point of view, my recommendation would be to have Usage Tracking turned on and logging query performance as a background task, potentially using an iBot to send an alert if key queries start to come back with worse-than-average response times. Then, if you’re looking to investigate a particular query, or perhaps if you’re interested in how the BI Server handles a particularly complex federated or other query, the query log is a useful equivalent of the Oracle database SQL trace file and gives you the ability to see how the BI Server goes about satisfying a user query.

Exploring the New OBIEE 10.1.3.4 Sample Sales Application & Data

One of the new things that shipped with the OBIEE 10.1.3.4 release was a new set of sample data, together with a set of reports and dashboards that show off the more advanced features of OBIEE. If you download and install OBIEE now, this is the sample set of data and reports that you get instead of the “Paint Dashboard”, and the idea is that you can use it to explore, and understand, how to make use of some of the more advanced features of the tool. In this blog posting I’ll take a look at the new dashboards and reports, and the data that underlies them. If you’ve not got 10.1.3.4 downloaded, you can access the sample data and reports from this zip file available on OTN.
The new Sample Sales catalog contains four dashboards, with in total around 20 pages (i.e. tabs) that show off different report features and calculation types. The first dashboard provides an overview of the different pages and reports that are available on the other three dashboards.
Page0
The report descriptions have links next to them that open the individual dashboard pages, with the description being held within a text box and entered as HTML markup.
Clicking on the first of the dashboards, “Rankers and Toppers”, brings up a selection of dashboard pages that illustrate the use of ranking and Top N calculations. Taking a look at the first of the pages, “Multi Dim Top N”, this page provides insight into top performers from two dimensions (time and customer) for a single measure. The filters at the top, looking from left to right, set a presentation server variable that provides the Top N cut-off, set the year and quarter for the time dimension, and set the brand, region, customer and team filters for the customer dimension.
Page1
Under the first set of graphs is a cross-tab showing “hot spots”, allowing you to spot which combinations of values for these two dimensions have particularly high values.
Page2
Taking a look at the dashboard page layout, it consists of a number of sections at the top to add the various dashboard prompt filters, then all of the reports are actually contained in the same request. If you take a look at the request, it contains multiple views of the same data, organized as the two sets of graphs and crosstabs (arranged as tables) at the top, and the main crosstab at the bottom. An interesting way to lay out a dashboard page.
Page3
The dashboard page contains a filter at the top that sets a presentation variable called Top_N_Limit. This presentation variable is then accessed in the various report calculations, to only return results up to the threshold specified by the user.
Page4
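To make that a bit more concrete, a report filter or column formula would typically reference a presentation variable like this using the @{variable}{default} syntax. The line below is purely illustrative (the measure name is made up, and the real Sample Sales formulas are rather more involved), but it shows the general idea of keeping only the rows ranked within the user-selected Top N, defaulting to 10 if the variable isn’t set:

RANK("Base Facts"."Revenue") <= @{Top_N_Limit}{10}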
The Multi-Metrics Proportional Top N’s dashboard certainly crams a lot of graphs onto the page. I guess the only criticism I have of these new demos is that they’ve shrunk the font size to fit in so much data, which makes dashboards quite hard to read now (the same font size is then used for your regular dashboards); also, as the Paint Dashboard is only available now if you specifically switch to it, we’ve lost a pretty, “easy on the eye” demo dashboard that was always useful for showing to non-technical, non-statistical users. The demos that now ship with 10.1.3.4 are pretty complex and not something you’d just show casually to business users. But anyway, I digress; here’s the dashboard I was talking about.
Page5
Moving on to the History and Benching dashboard, this shows a number of reports that benchmark one value against another, say taking one product and benchmarking others against it. There are some nice examples of graphs here; the ones below are designed to help you understand the behavior of a given measure over time, breaking information down by year, quarter, month and week.
Page6
The benchmark report lets you take a particular product and benchmark the others against it, highlighting the benchmarked product in the displayed graphs.
Page7
The rest of the dashboards go on in a similar way, and there’s actually a set of PowerPoint slides you can download from OTN that goes into the details of each dashboard, each page, each report and the way that the calculations and displays are set up. If you’re interested in doing some more advanced reporting, or seeing some “best practice” examples of how reports can be set up, it’s worth downloading the slides (part one, part two) and taking a look.
Page8
Here’s a page describing how some of the graphs have been devised.
Page9
Apart from the reports and the ranking/topping/bottoming etc. calculations, there’s also value in taking a look at the business model and mapping layer, and the presentation layer, that the new demo data provides. The business model itself has a lot more detail than the old Paint dataset.
Page10
Taking an initial look at the physical model, it shows the use of aliases for all physical tables (an Oracle best practice), and how you can use aliases and joins to create tables that contain offset and time-series calculations. Looking at the business model and mapping layer, you can see the various dimension tables, the dimension drill paths, and a number of fact tables that have been set up to hold different classes of calculations. The presentation layer takes this one step further, adding divider “dummy” tables in to separate facts and dimensions (shades of Discoverer here), different icons for different types of tables and so on. If you’ve seen the Oracle BI Applications repository, this is where you’ll see a lot of these ideas used in real life.
The repository also shows off a number of data modeling concepts. The bulk of the model itself is sourced from two separate sources, containing actual and forecast data respectively, like this:
Page11
The data is then combined through the use of logical table sources in the business model and mapping layer, so that the federated query feature in the BI Server brings the two data sets together at query run time.
Page13
There’s an example of drill chaining, where drilling down a hierarchy takes you across to a level in a different hierarchy, rather than to the level immediately below the one you drilled from.
Page12
There are also instructions in the PPT file on how to generate a metadata dictionary from the repository file, and then link it to the various report elements so that users can view detailed descriptions of the data in their reports.
Page14
Anyway, as I said, all this ships with the 10.1.3.4 release of OBIEE, with the Sample Sales guidelines and explanations available as a PPT file downloadable from OTN. If you’re interested in working your way through a few more complex reports and metadata setups, it might be worth you taking a look.

OBIEE Repository Variables, Filtering and Conditional Formatting

I’m currently sitting in Galway airport coming back from a few days work, and as I’ve got an hour or so free I thought I’d take a look at server variables in OBIEE. Repository and Session server variables are something you tend to encounter first when working with LDAP security, but they can also be used to create dynamic row-level filters, provide the filter values for conditional formatting and many other tasks, and it’s worth taking a few minutes to understand how they work.
Variables in the Oracle BI Server can be either repository or session variables, and are defined using the Variable Manager within the Oracle BI Administrator (Manage > Variables).
Variable1
This brings up the Variable Manager, which divides variables out into repository and session variables, with a further subdivision into static and dynamic repository ones and system and non-system session ones. You can also define variables at the Presentation Server level; these are scoped to individual dashboards and are used, amongst other things, for setting dashboard prompt values and integrating the dashboard with BI Publisher parameters. Here’s a screenshot of a typical Variable Manager screen.
Variable2
So what are server (repository) variables used for? Well, if you’ve done any programming, either in languages like Visual Basic or on the web, the variables these platforms use work in much the same way as the ones the BI Server uses. As with web development, you can either have server variables that hold the same value across all sessions (in BI Server terms, a “repository” variable), or variables that hold a value specific to each user session (“session” variables in the BI Server). Furthermore, repository variables can either be constants (“static repository” variables) or can have values that vary over time (“dynamic repository” variables), with the dynamic ones refreshed to a schedule via SQL calls and session variables usually set at the time of logon by reference to an LDAP server or an SQL statement. So now that’s all clear, in what sort of situation would they be used? Here’s a scenario that makes use of all these types of BI Server variable.
We have a reporting requirement where salespeople are responsible for a number of states, and these states change on a constant basis. Some of these states are shared with other salespeople and it’s not really practical to define set groupings of these states, hence we have a lookup table on our database that lists out the states each salesperson is responsible for, together with a marker against one particular state that is their home state. We want to use these lists of states as dynamic filters on the reports our salespeople run, and have reports default to their home state when they view their data in a dashboard. We also want to only show data for the current month in their reports, and hold the state in which our headquarters is based in a constant so that we can highlight the sales that are made near to it. In other words, we’ve got a good example of where variables in various shapes and forms can be used to make this process a whole lot easier to implement.
To start off, I use the Variable Manager to define a new static repository variable that I call HEAD_OFFICE_STATE and set to the value ‘NY’. This initial value, that stays constant for static repository variables, is called the “default initializer” and can either be typed in or set via an expression. The only way this variable value can change is if I go back into the Variable Manager and change it there.
Variable3
Next I create the second repository variable, this time a dynamic one, that is used to return the current month name and year to any query that requires it. Unlike a static repository variable, its value changes over time, refreshed by an SQL statement executed on a schedule.
Variable4
As well as having a default initializer, dynamic variables get their values from initialization blocks, which provide the mechanism to execute the SQL on a schedule. To define the initialization block I press “New” and enter the SQL, refresh frequency and variable target for the block. Notice in this example that I’m running the SQL against an Oracle database connection pool, and the truncated month and year name is generated through a couple of Oracle date functions and a reference to SYSDATE.
Variable5
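The statement itself is nothing more complicated than something along these lines; the exact format mask is up to you, as long as it matches the way months are stored in your time dimension.

-- Returns something like 'OCT-2008'; refreshed on the init block's schedule
select to_char(sysdate, 'MON-YYYY') from dual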
So now I’ve got my two repository variables defined. Before I create my session variable, I first have to locate a table in my database that lists out the selection of states that each salesperson is working with. Notice how there’s sometimes more than one row per salesperson.
Variable6
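For reference, the table behind this is nothing more elaborate than something along the lines of the sketch below. The table and column names match the initialization block SQL I use in a moment, but the HOME_STATE_FLAG column name (and the sample data) is just my own illustration of the home-state marker.

-- Hypothetical DDL and sample rows for the salesperson-to-state lookup table
create table variable_example.salesperson (
  salesperson_name varchar2(30),
  state            varchar2(2),
  home_state_flag  char(1) default 'N'
);

insert into variable_example.salesperson values ('FSMITH', 'NY', 'Y');
insert into variable_example.salesperson values ('FSMITH', 'NJ', 'N');
insert into variable_example.salesperson values ('JJONES', 'CA', 'Y');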
I now move on to creating the session variable. To use the data from the table above to provide the values for my dynamic filter, I first of all make sure that this table is accessible through a connection pool in the BI Server physical layer, and then go back to the Variable Manager to define my session variable. Now if this variable was going to hold a single, scalar value, I could define it as a regular non-system session variable, but as it needs to hold data from more than one database row, I instead define it using an Initialization Block and row-wise variable initialization, which defines one or more variables based on a select statement, dynamically creating the variables as needed and concatenating the results of multiple rows into the variable.
To perform this task I first define the new Initialization Block and call it STATE_SECURITY. I then define my SELECT statement as the following:
select 'STATE_FILTER',state from variable_example.salesperson
where salesperson_name = ':USER'
This will return one or more rows with STATE_FILTER, the new dynamically-created variable name, as the first column, and the set of allowed salesperson states as the second column, filtered on the logged-in user ID. My initialization block now looks like this:
Variable7
To assign the results of this SELECT statement to a variable, the STATE_FILTER variable, I then click on the “Edit Data Target” button and select Row-Wise Initialization, rather than try and assign the results to individual variables. The initialization block will then take all the rows that have STATE_FILTER as the first column and create a single concatenated, comma-separated list out of the second column, so that I can then apply this value to a filter.
Variable8
Now that the session variable definition is complete, I move over to the Security Manager application, create a group for all of my report users and then define a filter for that group against the SALES table, the one I want to restrict access to. The filter references the session variable I just created, using an equality (“=”) operator rather than the “in” operator you’d have expected; this is a requirement for row-wise variables, and OBIEE handles the translation properly in the background.
Variable9
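The filter expression itself ends up looking something like the line below. The table and column names here are just from my example, but the VALUEOF(NQ_SESSION."variable name") construct is the standard way of referencing a session variable in a filter:

"Variable Example"."SALES"."STATE" = VALUEOF(NQ_SESSION."STATE_FILTER")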
Now if one of the affected users logs in and runs a report against that table, the results are filtered down without any intervention on their part.
Variable10
Conditionally formatting the State column, based on whether the value each row contains matches the head office state, is also fairly straightforward. When you create a conditional format condition you can specify that the value tested against is a variable; to access one of the repository variables you wrap the variable name in biServer.variables[''] so that, in my example, the variable reference becomes biServer.variables['HEAD_OFFICE_STATE'].
Variable11
Displaying the report now shows all instances of “NY” highlighted in red, based on my conditional formatting rule and the value of the HEAD_OFFICE_STATE static repository variable.
Variable12
So there you have it. Moving on from here, filtering the report again based on the dynamic repository variable is just a case of referencing the CURRENT_MONTH variable in my filter, and adding another session variable to hold the salesperson’s home state involves creating another initialization block that this time provides a value for a regular (i.e. not row-wise) HOME_STATE session variable.
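For completeness, the month filter would end up looking something along the lines of the expression below (the column names are assumed). Note the difference in syntax from the session variable earlier: repository variables are referenced with VALUEOF("variable name") rather than the VALUEOF(NQ_SESSION."variable name") form:

"Sales"."Times"."CALENDAR_MONTH" = VALUEOF("CURRENT_MONTH")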

Digging into the Oracle BI Server Query Log File

As a follow-up to the posting I did last week on Oracle BI Enterprise Edition performance tuning, I thought I’d take a closer look at the query log file generated by the Oracle BI Server, the “virtual database engine” that ships with Oracle BI EE. For some time now I’ve been looking to see if there is any diagnostic data produced by the BI Server so that I can see, for example, the query plans put together when a query comes in, particularly in the situation where the BI Server joins together (or “stitches”, in Oracle BI Server terminology) data from two different database queries into one logical dataset. So, let’s take a simple example, where we have a fact table containing 14m rows of sales data and a product table containing 72 rows of product data, with the data all being stored in a single Oracle database and a single physical model describing the source tables and their relationships.
Running a simple query like this, against the source data:
Obiee Log1
would generate the following log entries in $ORACLEBI_HOME/server/Log/NQQuery.log, with query logging set to 3:
##############################################

-------------------- SQL Request:

SET VARIABLE QUERY_SRC_CD='Report';SELECT PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC saw_0, SALES_BIG_LOCAL.QUANTITY_SOLD saw_1 FROM "Log & Federated Query Test" ORDER BY saw_0

+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24

-------------------- General Query Info:

Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test

+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24

-------------------- Logical Request (before navigation):

RqList
    PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
    QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
OrderBy: c1 asc

+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24

-------------------- Sending query to database named ora11g (id: <<11172>>):

select T6303.PROD_SUBCATEGORY_DESC as c1,
     sum(T6826.QUANTITY_SOLD) as c2
from
     SH_COPY.PRODUCTS T6303,
     SH_COPY.SALES_BIG T6826
where  ( T6303.PROD_ID = T6826.PROD_ID )
group by T6303.PROD_SUBCATEGORY_DESC
order by c1

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Query Status: Successful Completion

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Rows 21, bytes 84504 retrieved from database query id: <<11172>>

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Physical query response time 194 (seconds), id <<11172>>

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 194, DB-connect time 0 (seconds)

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Rows returned to Client 21

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Logical Query Summary Stats: Elapsed time 195, Response time 194, Compilation time 0 (seconds)
So let’s step through this log entry and look at each part in detail.
The first section sets out the general details of the query: the logical SQL request sent for your query (prefixed with the QUERY_SRC_CD variable), together with the name of the Oracle BI Server repository and the subject area used for the query.
##############################################
-------------------- SQL Request:
SET VARIABLE QUERY_SRC_CD='Report';SELECT PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC saw_0, SALES_BIG_LOCAL.QUANTITY_SOLD saw_1 FROM "Log & Federated Query Test" ORDER BY saw_0

+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24

-------------------- General Query Info:
Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test

+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24
Notice that each entry in the log is also tagged with the username of the BI Server user that was responsible for the query, together with a timestamp for the log entry.
Next is the logical request list (“RqList”), which sets out the columns that form the criteria for the report, and the grouping and aggregation applied to the data.
-------------------- Logical Request (before navigation):

RqList
    PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
    QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
OrderBy: c1 asc

+++Administrator:2c0000:2c0003:----2008/10/02 13:29:24
If my report criteria included a filter, the filter columns would appear in the request list as well, like this:
-------------------- Logical Request (before navigation):

RqList
    PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
    QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
DetailFilter: PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC in ([ 'Camcorders', 'Camera Batteries', 'Camera Media'] )
OrderBy: c1 asc

+++Administrator:2c0000:2c0005:----2008/10/02 13:42:09
So at this point we’re dealing with the logical view of the data, which relates back to the Presentation Layer of the Oracle BI Server metadata layer. The next part of the log file shows the physical SQL that was generated to correspond with the logical request list of columns specified by the user. In this case, both logical tables have logical table sources that come from the same physical Oracle database, and so a single physical SQL query is generated.
-------------------- Sending query to database named ora11g (id: <<11172>>):

select T6303.PROD_SUBCATEGORY_DESC as c1,
     sum(T6826.QUANTITY_SOLD) as c2
from
     SH_COPY.PRODUCTS T6303,
     SH_COPY.SALES_BIG T6826
where  ( T6303.PROD_ID = T6826.PROD_ID )
group by T6303.PROD_SUBCATEGORY_DESC
order by c1

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39
Notice that the Oracle BI Server automatically aggregates your data (this behavior is specified in the default aggregation settings in the BI Server metadata layer; you can specify non-grouping, i.e. detail-level display of the data, instead), and applies a GROUP BY on the dimension data.
The next part of the log file details whether the query completed successfully, how long it took to return data, how many rows were returned and the response time of the underlying database.
-------------------- Query Status: Successful Completion

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Rows 21, bytes 84504 retrieved from database query id: <<11172>>

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Physical query response time 194 (seconds), id <<11172>>

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 194, DB-connect time 0 (seconds)

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Rows returned to Client 21

+++Administrator:2c0000:2c0003:----2008/10/02 13:32:39

-------------------- Logical Query Summary Stats: Elapsed time 195, Response time 194, Compilation time 0 (seconds)
So what we’ve got here is a fairly straightforward record of the logical query requested by the user (the request list of columns, the criteria for your request), the physical SQL generated to return the required list of columns, any filters, aggregation or grouping applied to the data, and statistics on how long the query took to run. Putting my Oracle Database hat on, we haven’t got anything equivalent to the “wait events” you get when tuning the Oracle database, which tell you where the time has been spent when executing a query (waiting for disk I/O, waiting for network I/O, etc.), but then again, in this case it’s the database that’s doing the vast majority of the work and there’s not really much overhead added by the BI Server here. The database took 194 seconds to return data, and the BI Server only added a second to this, giving a total response time of 195 seconds.
There are actually several levels of logging that you can turn on for users and queries, with levels going from 0 (no logging) to 5 (lots and lots of logging), with the general advice being to only turn this on when you’re debugging a query, as the logs are written to disk and can slow down a real-life system if left on. In a bit more detail, the logging levels currently available are as follows (with a quick way of raising the level for a single query shown after the list):
Level 0 = no logging, the default setting for a user or query.
Level 1 = logs the logical SQL (the request list), the elapsed time, whether the cache was used, query status, user name and so on.
Level 2 = everything in level 1, plus repository name, subject area name, physical SQL generated, number of rows returned and so on.
Level 3 = everything in level 2, plus some additional data on cache usage. This is the level I usually set, though the docs suggest you should just use level 2.
Level 4 = everything in level 3, plus the query execution plan.
Level 5 = everything in level 4, plus intermediate row counts at various stages, particularly useful when the BI Server is stitching data sets together.
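Incidentally, as well as setting the logging level against a user in the Security Manager, you can raise it for a single request by prefixing the logical SQL with a SET VARIABLE call, along the lines of the example below (reusing the logical SQL from earlier); handy when you want a one-off detailed log without leaving a higher level switched on for everyone.

SET VARIABLE LOGLEVEL = 5; SELECT PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC saw_0, SALES_BIG_LOCAL.QUANTITY_SOLD saw_1 FROM "Log & Federated Query Test" ORDER BY saw_0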
So how does the above query look when run at level 5 logging? Let’s take a look, this time with the filter added in that I showed a minute ago. The areas that are new at this level of logging are the execution plan and the various Execution Node entries:
-------------------- SQL Request:

SET VARIABLE QUERY_SRC_CD='Report',SAW_SRC_PATH='/users/administrator/Simple Log Test Query';SELECT PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC saw_0, SALES_BIG_LOCAL.QUANTITY_SOLD saw_1 FROM "Log & Federated Query Test" WHERE PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC IN ('Camcorders', 'Camera Batteries', 'Camera Media') ORDER BY saw_0

+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10

-------------------- General Query Info:

Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test

+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10

-------------------- Logical Request (before navigation):

RqList
    PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC as c1 GB,
    QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC] )] as c2 GB
DetailFilter: PRODUCTS_LOCAL.PROD_SUBCATEGORY_DESC in ([ 'Camcorders', 'Camera Batteries', 'Camera Media'] )
OrderBy: c1 asc

+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10

-------------------- Execution plan:

RqList <<9790>> [for database 3023:2350:ora11g,44]
    PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2350,44],
    sum(SALES_BIG.QUANTITY_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c2 GB [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9851>> [for database 3023:2350:ora11g,44]
    PRODUCTS T6303
    SALES_BIG T6826
DetailFilter: PRODUCTS.PROD_ID = SALES_BIG.PROD_ID and (PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media') [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC]  [for database 3023:2350,44]
OrderBy: c1 asc [for database 3023:2350,44]

+++Administrator:2b0000:2b0001:----2008/10/02 14:04:10

-------------------- Sending query to database named ora11g (id: <<9790>>):

select T6303.PROD_SUBCATEGORY_DESC as c1,
     sum(T6826.QUANTITY_SOLD) as c2
from
     SH_COPY.PRODUCTS T6303,
     SH_COPY.SALES_BIG T6826
where  ( T6303.PROD_ID = T6826.PROD_ID and (T6303.PROD_SUBCATEGORY_DESC in ('Camcorders', 'Camera Batteries', 'Camera Media')) )
group by T6303.PROD_SUBCATEGORY_DESC
order by c1

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Execution Node: <<9790>>, Close Row Count = 3, Row Width = 4024 bytes

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Execution Node: <<9790>> DbGateway Exchange, Close Row Count = 3, Row Width = 4024 bytes

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Execution Node: <<9790>> DbGateway Exchange, Close Row Count = 3, Row Width = 4024 bytes

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Query Status: Successful Completion

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Rows 3, bytes 12072 retrieved from database query id: <<9790>>

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Physical query response time 195 (seconds), id <<9790>>

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 195, DB-connect time 0 (seconds)

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Rows returned to Client 3

+++Administrator:2b0000:2b0001:----2008/10/02 14:07:25

-------------------- Logical Query Summary Stats: Elapsed time 195, Response time 195, Compilation time 0 (seconds)
So the execution plan that the log file contains is the logical execution plan, not the Oracle execution plan corresponding to the physical SQL that is generated. Let’s take a look at this logical execution plan in more detail.
-------------------- Execution plan:

RqList <<9790>> [for database 3023:2350:ora11g,44]
    PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2350,44],
    sum(SALES_BIG.QUANTITY_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c2 GB [for database 3023:2350,44]
Child Nodes (RqJoinSpec): <<9851>> [for database 3023:2350:ora11g,44]
    PRODUCTS T6303
    SALES_BIG T6826
DetailFilter: PRODUCTS.PROD_ID = SALES_BIG.PROD_ID and (PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media') [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC]  [for database 3023:2350,44]
OrderBy: c1 asc [for database 3023:2350,44]
So what this contains is the request list as before, and then a list of “child nodes” with each node corresponding to a step in the execution plan (I think). Then we’ve got the filter applied to the nodes, the group by and the order by, and the query again takes 195 seconds, with the BI Server adding virtually nothing to the response time.
So far it all looks straightforward, but what happens if, instead of both logical table sources pointing to the same physical database, we source the dimension table from a second physical database, so that the BI Server generates a federated query and stitches the result set together in the BI Server’s memory? How would this extended query log look then? Let’s take a look.
-------------------- SQL Request:

SET VARIABLE QUERY_SRC_CD='Report';SELECT SALES_BIG_LOCAL.QUANTITY_SOLD saw_0, PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC saw_1 FROM "Log & Federated Query Test" WHERE PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC IN ('Camcorders', 'Camera Batteries', 'Camera Media') ORDER BY saw_1

+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38

-------------------- General Query Info:

Repository: Star, Subject Area: Log & Federated Query Test, Presentation: Log & Federated Query Test

+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38

-------------------- Logical Request (before navigation):

RqList
    QUANTITY_SOLD:[DAggr(SALES_BIG_LOCAL.QUANTITY_SOLD by [ PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC] )] as c1 GB,
    PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC as c2 GB
DetailFilter: PRODUCTS_REMOTE.PROD_SUBCATEGORY_DESC in ([ 'Camcorders', 'Camera Batteries', 'Camera Media'] )
OrderBy: c2 asc

+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38

-------------------- Execution plan:

RqBreakFilter <<9948>>[2] [for database 0:0,0]
    RqList <<10056>> [for database 0:0,0]
        sum(D1.c5 by [ D1.c2]  at_distinct [ D1.c2, D1.c3] ) as c1 [for database 0:0,0],
        D1.c2 as c2 [for database 3023:6422,44]
    Child Nodes (RqJoinSpec): <<10068>> [for database 0:0,0]
        (
            RqList <<9865>> [for database 0:0,0]
                D902.c1 as c2 GB [for database 3023:6422,44],
                D901.c2 as c3 [for database 3023:2350,44],
                D901.c3 as c5 [for database 3023:2350,44]
            Child Nodes (RqJoinSpec): <<9950>> [for database 0:0,0]
                    (
                        RqList <<9964>> [for database 3023:2350:ora11g,44]
                            SALES_BIG.PROD_ID as c2 [for database 3023:2350,44],
                            sum(SALES_BIG.QUANTITY_SOLD by [ SALES_BIG.PROD_ID] ) as c3 [for database 3023:2350,44]
                        Child Nodes (RqJoinSpec): <<9967>> [for database 3023:2350:ora11g,44]
                            SALES_BIG T6826
                        GroupBy: [ SALES_BIG.PROD_ID]  [for database 3023:2350,44]
                        OrderBy: c2 asc [for database 3023:2350,44]
                    ) as D901
                InnerJoin <<10001>> On D901.c2 = D902.c2; actual join vectors:  [ 0 ] =  [ 1 ]
                    (
                        RqList <<9981>> [for database 3023:6422:remote_ora11g,44]
                            PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:6422,44],
                            PRODUCTS.PROD_ID as c2 [for database 3023:6422,44]
                        Child Nodes (RqJoinSpec): <<9984>> [for database 3023:6422:remote_ora11g,44]
                            PRODUCTS T6864
                        DetailFilter: PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media' [for database 0:0]
                        OrderBy: c2 asc [for database 3023:6422,44]
                    ) as D902
            OrderBy: c2, c3 [for database 0:0,0]
        ) as D1
    OrderBy: c2 asc [for database 0:0,0]

+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38

-------------------- Sending query to database named ora11g (id: <<9964>>):

select T6826.PROD_ID as c2,
     sum(T6826.QUANTITY_SOLD) as c3
from
     SH_COPY.SALES_BIG T6826
group by T6826.PROD_ID
order by c2

+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38

-------------------- Sending query to database named remote_ora11g (id: <<9981>>):

select T6864.PROD_SUBCATEGORY_DESC as c1,
     T6864.PROD_ID as c2
from
     PRODUCTS T6864
where  ( T6864.PROD_SUBCATEGORY_DESC in ('Camcorders', 'Camera Batteries', 'Camera Media') )
order by c2

+++Administrator:2a0000:2a0002:----2008/10/02 15:22:38

-------------------- Execution Node: <<9981>>, Close Row Count = 9, Row Width = 4012 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9964>>, Close Row Count = 72, Row Width = 24 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Cancel initiated for Exchange Producer: <<9964>> DbGateway Exchange

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9964>> DbGateway Exchange, Close Row Count = 63, Row Width = 24 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9981>> DbGateway Exchange, Close Row Count = 9, Row Width = 4012 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10001>> Binary Join, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9865>> Projection, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9865>> Sort, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Pre-aggr Projection, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Aggregation, Close Row Count = 9, Row Width = 4040 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Post-aggr Projection, Close Row Count = 9, Row Width = 4016 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Sort, Close Row Count = 9, Row Width = 4016 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9948>> GroupByNoSort, Close Row Count = 3, Row Width = 4016 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9964>> DbGateway Exchange, Close Row Count = 63, Row Width = 24 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9981>> DbGateway Exchange, Close Row Count = 9, Row Width = 4012 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Query Status: Successful Completion

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Rows 72, bytes 1728 retrieved from database query id: <<9964>>

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Physical query response time 172 (seconds), id <<9964>>

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Rows 9, bytes 36108 retrieved from database query id: <<9981>>

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Physical query response time 0 (seconds), id <<9981>>

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Physical Query Summary Stats: Number of physical queries 2, Cumulative time 172, DB-connect time 0 (seconds)

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Rows returned to Client 3

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Logical Query Summary Stats: Elapsed time 173, Response time 172, Compilation time 0 (seconds)
So this looks a bit more interesting, as we can see that the BI Server is doing some of the work now, generating an SQL query for each physical data source and then stitching the data together. There are also a lot more nodes, which suggests that the BI Server is going through a lot more steps to return the data. Let’s take a look at the more interesting parts of the log, starting with the logical execution plan, which now contains a number of child nodes: one for each of the two database queries, further ones for the stitched-together data set, and a node for the join between the two result sets.
-------------------- Execution plan:

RqBreakFilter <<9948>>[2] [for database 0:0,0]
    RqList <<10056>> [for database 0:0,0]
        sum(D1.c5 by [ D1.c2]  at_distinct [ D1.c2, D1.c3] ) as c1 [for database 0:0,0],
        D1.c2 as c2 [for database 3023:6422,44]
    Child Nodes (RqJoinSpec): <<10068>> [for database 0:0,0]
        (
            RqList <<9865>> [for database 0:0,0]
                D902.c1 as c2 GB [for database 3023:6422,44],
                D901.c2 as c3 [for database 3023:2350,44],
                D901.c3 as c5 [for database 3023:2350,44]
            Child Nodes (RqJoinSpec): <<9950>> [for database 0:0,0]
                    (
                        RqList <<9964>> [for database 3023:2350:ora11g,44]
                            SALES_BIG.PROD_ID as c2 [for database 3023:2350,44],
                            sum(SALES_BIG.QUANTITY_SOLD by [ SALES_BIG.PROD_ID] ) as c3 [for database 3023:2350,44]
                        Child Nodes (RqJoinSpec): <<9967>> [for database 3023:2350:ora11g,44]
                            SALES_BIG T6826
                        GroupBy: [ SALES_BIG.PROD_ID]  [for database 3023:2350,44]
                        OrderBy: c2 asc [for database 3023:2350,44]
                    ) as D901
                InnerJoin <<10001>> On D901.c2 = D902.c2; actual join vectors:  [ 0 ] =  [ 1 ]
                    (
                        RqList <<9981>> [for database 3023:6422:remote_ora11g,44]
                            PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:6422,44],
                            PRODUCTS.PROD_ID as c2 [for database 3023:6422,44]
                        Child Nodes (RqJoinSpec): <<9984>> [for database 3023:6422:remote_ora11g,44]
                            PRODUCTS T6864
                        DetailFilter: PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camcorders' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Batteries' or PRODUCTS.PROD_SUBCATEGORY_DESC = 'Camera Media' [for database 0:0]
                        OrderBy: c2 asc [for database 3023:6422,44]
                    ) as D902
            OrderBy: c2, c3 [for database 0:0,0]
        ) as D1
    OrderBy: c2 asc [for database 0:0,0]
Notice that each of the main nodes has an ID assigned to it (9981, 9964 and 9865) so that you can follow them through the rest of the log file.
Later on down the log file we can see the physical SQL generated for each node, and the row count resulting from each physical SQL execution. Then, after the data is retrieved from the various physical database connections, you can see how the BI Server goes on to inner join the data (the binary join on node 10001), remove unneeded columns from the initial result set (projection), sort the data, again remove unneeded columns and then aggregate the data. You can also see what operations are carried out on the data before and after aggregation, where sorting is carried out and so on.
+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10001>> Binary Join, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9865>> Projection, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9865>> Sort, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Pre-aggr Projection, Close Row Count = 9, Row Width = 4032 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Aggregation, Close Row Count = 9, Row Width = 4040 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Post-aggr Projection, Close Row Count = 9, Row Width = 4016 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<10056>> Sort, Close Row Count = 9, Row Width = 4016 bytes

+++Administrator:2a0000:2a0002:----2008/10/02 15:25:31

-------------------- Execution Node: <<9948>> GroupByNoSort, Close Row Count = 3, Row Width = 4016 bytes
From looking at this log file, you can pretty much determine the steps the BI Server goes through to add aggregation, sorting, result-set stitching and so on to your data, which is useful if you’re interested in seeing how data from multiple result sets is brought together. I don’t know if this is publicly documented anywhere, or whether other join types are possible (I’ve heard of nested loop joins also being performed), but this certainly looks interesting if you’re into Oracle BI Server internals. Something I’m certainly looking to do is document these steps properly and come up with the complete set of steps that can be carried out; if anyone’s interested in helping then let me know as, like I said, I don’t think this is publicly documented.
So looking at the log file output, this query has actually taken just 173 seconds rather than the 195 required for the single-source query, the BI Server only adding a second to this. I wouldn’t place too much emphasis on the actual times taken for each variation on the query as I’m only running them once and on a single laptop, but you can see from the timings that Oracle BI Server doesn’t seem to add much overhead, even when you’re using the federated query feature (at least on my dataset, which isn’t all that big).
Now, one of the things you can do in the BI Server metadata layer is select the “driving table” for these types of cross-database joins. This is particularly useful if one of your tables is small (the suggested limit is 1,000 rows; in my case it’s 72). It tells the BI Server to first query the driving table to return a small set of interesting rows, and then apply these rows to the other table in batches, so that only a much smaller set of the big table’s rows is loaded into the BI Server’s memory.
Obiee Log2
The rationale here is that a normal federated query join would require both tables to be loaded into the BI Server’s memory, and if one of them is big, this will require a full table scan of the physical source table and potentially take up a lot of memory in the BI Server. Using this driving table technique you can fairly quickly come up with a much smaller set of initial rows and row IDs, which can then be used to filter the second table and therefore reduce the I/O and memory required by the federated query join. There are a couple of parameters you can set in the NQSCONFIG.INI file to determine the size of the batches that are applied to the larger table, and you can switch this feature off completely, but how does this look in the query log file, and does it make my query run faster? Let’s take a look.
The first difference in this “driving table” query log entry is the logical execution plan, which now retrieves data from the second, non-driving table via a set of parameter values.
RqBreakFilter <<9961>>[2] [for database 0:0,0]
    RqList <<10027>> [for database 0:0,0]
        sum(D1.c5 by [ D1.c2]  at_distinct [ D1.c2, D1.c3] ) as c1 [for database 0:0,0],
        D1.c2 as c2 [for database 3023:6422,44]
    Child Nodes (RqJoinSpec): <<10039>> [for database 0:0,0]
        (
            RqList <<9918>> [for database 0:0,0]
                D901.c1 as c2 GB [for database 3023:6422,44],
                D902.c2 as c3 [for database 3023:2350,44],
                D902.c3 as c5 [for database 3023:2350,44]
            Child Nodes (RqJoinSpec): <<9963>> [for database 0:0,0]
                    (
                        RqList <<9966>> [for database 3023:6422:remote_ora11g,44]
                            PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:6422,44],
                            PRODUCTS.PROD_ID as c2 [for database 3023:6422,44]
                        Child Nodes (RqJoinSpec): <<9969>> [for database 3023:6422:remote_ora11g,44]
                            PRODUCTS T6864
                        OrderBy: c2 asc [for database 3023:6422,44]
                    ) as D901
                InnerJoin  (left drive) <<10000>> On D902.c2 = D901.c2; actual join vectors:  [ 1 ] =  [ 0 ]
                    (
                        RqList <<9984>> [for database 3023:2350:ora11g,44]
                            SALES_BIG.PROD_ID as c2 [for database 3023:2350,44],
                            sum(SALES_BIG.QUANTITY_SOLD by [ SALES_BIG.PROD_ID] ) as c3 [for database 3023:2350,44]
                        Child Nodes (RqJoinSpec): <<9987>> [for database 3023:2350:ora11g,44]
                            SALES_BIG T6826
                        DetailFilter: SALES_BIG.PROD_ID = ?1 or SALES_BIG.PROD_ID = ?2 or SALES_BIG.PROD_ID = ?3 or SALES_BIG.PROD_ID = ?4 or SALES_BIG.PROD_ID = ?5 or SALES_BIG.PROD_ID = ?6 or SALES_BIG.PROD_ID = ?7 or SALES_BIG.PROD_ID = ?8 or SALES_BIG.PROD_ID = ?9 or SALES_BIG.PROD_ID = ?10 or SALES_BIG.PROD_ID = ?11 or SALES_BIG.PROD_ID = ?12 or SALES_BIG.PROD_ID = ?13 or SALES_BIG.PROD_ID = ?14 or SALES_BIG.PROD_ID = ?15 or SALES_BIG.PROD_ID = ?16 or SALES_BIG.PROD_ID = ?17 or SALES_BIG.PROD_ID = ?18 or SALES_BIG.PROD_ID = ?19 or SALES_BIG.PROD_ID = ?20 [for database 0:0]
                        GroupBy: [ SALES_BIG.PROD_ID]  [for database 3023:2350,44]
                        OrderBy: c2 asc [for database 3023:2350,44]
                    ) as D902
            OrderBy: c2, c3 [for database 0:0,0]
        ) as D1
    OrderBy: c2 asc [for database 0:0,0]

+++Administrator:2a0000:2a0004:----2008/10/02 15:39:08
You can specify the maximum number of parameters in the table filter through the MAX_PARAMETERS_PER_DRIVE_JOIN parameter in the NQSCONFIG.INI file, with the limit really being down to how many filter values the back-end database can handle. The MAX_QUERIES_PER_DRIVE_JOIN parameter thereafter limits the total number of separate calls to the non-driving table that can take place (a function of the total number of rows returned by the initial driving table query divided by the MAX_PARAMETERS_PER_DRIVE_JOIN setting), to stop run-away queries occurring that generate hundreds of individual calls to the non-driving table.
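For reference, these two entries sit in the NQSCONFIG.INI file alongside the other server settings. The values below are purely illustrative (the batch size of 20 just matches what my log file shows), so check the defaults in your own configuration file rather than taking these as read:

MAX_PARAMETERS_PER_DRIVE_JOIN = 20;
MAX_QUERIES_PER_DRIVE_JOIN = 200;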
The other interesting part is the actual physical SQL sent to the non-driving table, which looks like this:
-------------------- Sending query to database named ora11g (id: <<9984>>):

select T6826.PROD_ID as c2,
     sum(T6826.QUANTITY_SOLD) as c3
from
     SH_COPY.SALES_BIG T6826
where  ( T6826.PROD_ID in (:PARAM1, :PARAM2, :PARAM3, :PARAM4, :PARAM5, :PARAM6, :PARAM7, :PARAM8, :PARAM9, :PARAM10, :PARAM11, :PARAM12, :PARAM13, :PARAM14, :PARAM15, :PARAM16, :PARAM17, :PARAM18, :PARAM19, :PARAM20) )
group by T6826.PROD_ID
order by c2

+++Administrator:2a0000:2a0004:----2008/10/02 15:39:08
So in this case, the driving table feature will initially run the query against the smaller (dimension) table and return the rows specified by the filter on the query. Then, in batches of twenty, individual selects will be made against the larger, non-driving table, until all the rows of interest are returned. These will then be joined in memory in the BI Server and the results returned to the dashboard, in theory faster than if we’d tried to load both tables into the BI Server’s memory and join them there.
So now, to take this one step further, what does the query log file look like if we run the Aggregate Persistence Wizard on this data set, generate an aggregate at the PROD_SUBCATEGORY_DESC level, plug the aggregates into the logical model and then run a query? Where (if anywhere) does aggregate navigation feature in the logical execution plan and query steps?
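As a reminder, what the Aggregate Persistence Wizard actually produces is a script of “create aggregates” logical SQL that you then run through nqcmd. The one for this example would look roughly like the sketch below, although the business model, level and connection pool names here are assumptions rather than copied from the generated script:

create aggregates "ag_SALES_BIG_LOCAL"
for "Log & Federated Query Test"."SALES_BIG_LOCAL" ("QUANTITY_SOLD")
at levels ("Log & Federated Query Test"."Products"."Subcategory")
using connection pool "ora11g"."Connection Pool"
in "ora11g".."SH_COPY";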
Well, after running the wizard and registering the aggregates, I run the query again and here’s the output:
-------------------- Execution plan:

RqBreakFilter <<10079>>[2] [for database 0:0,0]
    RqList <<10037>> [for database 3023:2350:ora11g,44]
        ag_SALES_BIG_LOCAL.QUANTITY_S0001562C as c1 GB [for database 3023:2350,44],
        SA_Subcate0001593D.PROD_SUBCA0001561B as c2 GB [for database 3023:2350,44]
    Child Nodes (RqJoinSpec): <<10067>> [for database 3023:2350:ora11g,44]
        SA_Subcate0001593D T7688
        ag_SALES_BIG_LOCAL T7695
    DetailFilter: SA_Subcate0001593D.PROD_SUBCA0001561B = ag_SALES_BIG_LOCAL.PROD_SUBCA0001561B [for database 0:0]
    OrderBy: c2 asc [for database 3023:2350,44]

+++Administrator:2a0000:2a0005:----2008/10/02 16:46:18

-------------------- Sending query to database named ora11g (id: <<10037>>):

select distinct T7695.QUANTITY_S0001562C as c1,
     T7688.PROD_SUBCA0001561B as c2
from
     SH_COPY.SA_Subcate0001593D T7688,
     SH_COPY.ag_SALES_BIG_LOCAL T7695
where  ( T7688.PROD_SUBCA0001561B = T7695.PROD_SUBCA0001561B )
order by c2
So, you can see the aggregate table being used, but there’s nothing in the log to say that aggregates were considered; the query is just generated against the logical table source that returns the smallest number of rows.
Anyway, that’s a quick trip around the query log file, with a short diversion into driving tables and aggregates. Of course the $64,000 question is “what do you do with the log file, in particular the knowledge of the various query execution steps?”, and in reality if you’re getting to the point where federated queries are running slowly, your most obvious and productive means of sorting this out is to physically integrate your data into a single data mart or data warehouse rather than mucking around trying to decipher undocumented log files. Still, if you’re stuck with a federated query or just interested in how the internals work, it’s an interesting set of diagnostics you can generate. I’ll certainly be blogging more on this over time, and if you’ve got anything to add, just let me know in the article comments.