MDX from Hell no longer scares me

Now, after the first crosstab is printed, it’s now time to harden both the datasources and the presentation layer code. For that, Julian Hyde, our Master of Multidimensionality, donated a set of MDX-queries that demonstrate the bordercases of MDX-results.

In the old days, the MDX capabilities of the reporting-engine were somewhat limited. If you were nice, the Report-Designer would allow you to get a tabular view out of your OLAP cube. But the Report-Designer would become very nasty if your query left the happy path of the regular 2-dimensional table results.

To really handle MDX, we also have to deal with all the advanced cases too:

  • MDX results can have up to 255 axes. Thats more than the two we supported. Although not many users will utilize more than 2 axes, there are some good cases for them as well. All of these extra axes can be use to do classical grouping, to insert pagebreaks and so on.
  • MDX results can have zero axes. In that case, the result will be a single cell. Again, not likely, not common, but valid. If we want to claim that we eat MDX, we must not choke on that stuff.
    Example: "select from [SteelWheelsSales]"
  • MDX results can be one dimensional. This results in a single row of data, with all measures aggregated for the single axis.
    Example: "select [Product].Children on 0 from [SteelWheelsSales]"
  • And then, axes can be empty. Once a axis is empty, measures are no longer printable. But we still get the dimension data for all other axes.
    Example: "select [Product].parent on 0, [Time].Children on 1 from [SteelWheelsSales]"
    (this ends up empty because the parent of ‘Product.all’ is null, and null members are implicitly filtered)
  • And of course, we can have ragged hierarchies. In ragged hierarchies, the depth of the shown dimensions is not uniform accross all members of an axis.

    Product Time/Year Time/Quarter
    All 2004
    All 2005
    All 2005 Q1

After some tuning, slashing and hacking through the code, we now have a stable set of MDX-capabilities. We now have two access paths with three different result-types:

OLAP datasources can now be accessed either via a native Mondrian datasource (which exists purely for legacy reasons, so that we can preserve existing datasource definitions) and the new OLAP4J-data-factory which uses OLAP4J to speak with the OLAP-backend. This way, we get XML/A and access to all other MDX-servers that support OLAP4J (like Palo) for free.

For each of the access methods, we support three result-types:

  • The legacy-banded mode. It comes with all the limitations and strange quirks of the old version. It may not be pretty, but gives 100% backward compatibility. For that reason, it still dies on most of the special cases and behaves funny on ragged hierarchies. But that’s all just for self-defence: I dont want to be the one who has to explain to all our users that they have to re-edit all their MDX reports with the upgrade of the reporting-engine. The saying “they would kill me for that” might become unpleasently true in that case.
  • The banded mode is a cleaned up and extended version of the old banded-mode. As we dont have to care about backward compatibility here, we were free to fix the strange column naming (“Time”, “Time{2}”, “EMEA/Sales/Measures” and so on) into a more normalized column naming schema (“[Time].[(All)]”, “[Time].[Years]”, “[Markets].[All Markets].[#null]/[Measures].[Quantity]” and so on). The cleaned up mode now also fully supports ragged hierarchies and all the special cases and is a good choice for creating classical banded reports from MDX-datasources.
  • And last but not least: The denormalized mode. This data-factory creates a streaming view over the fact-table, so that we can create crosstabs in the most efficient way.

And as a bonus: All datasources now support metadata feeds. MDX provides methods to add all kind of (computed) attributes to the result cells, and axes-members. Some attributes are predefined by Microsoft (like colors, format-strings etc), but it is perfectly legal to add your own attributes as well. The latest version of the reporting engine can pick these attribute up and use them either as formatting information or as input for other computations. And now, after a long time, queries written for JPivot’s web-view can be copied into the reporting-engine and they look the same on paper (or PDF or Excel) as they did in the browser.

This entry was posted in Development on by .
Thomas

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over an decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to a enterprise class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our our steps towards upcoming releases.