Taking small steps to cross the tab

After a long silence, let’s have something positive today: Pentaho Reporting now officially talks to Mondrian and to any other OLAP4J datasource.

We now ship with two flavors of MDX access. The existing MDX capabilities are covered by the BandedMDXDataFactory, while the crosstabbing functionality will rely on a new DenormalizedMDXDataFactory.

The BandedMDXDataFactory takes a two-dimensional MDX query result and maps the multidimensional dataset into a flat table. The approach is reasonable if you want to access the cube row by row, but it fails badly as soon as your query has more than two dimensions or your query result contains a ragged hierarchy. The Report Designer used this mode for a very long time to provide at least some access to Mondrian datasources, and the banded mode is still great if you need banded reporting over MDX datasources.

However, with Version 0.8.11 of the reporting engine, we finally have to provide real crosstabbing capabilities.

At that point, the pre-chewed data provided by the BandedMDXDataFactory is totally unsuitable for anything sophisticated. You cannot reconstruct a cow from a steak. In the same way, we cannot use the banded data to reconstruct the axis and hierarchy information provided by the real MDX result set. At the same time, the complex (and in places ambiguous) data processing that happens inside the BandedMDXDataFactory makes it next to impossible to use plain queries as the source for a crosstabbed report.
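To make the steak-and-cow problem concrete, here is a deliberately simplified sketch of flattening a two-axis result into a flat `TableModel`. It is a hypothetical illustration, not the actual BandedMDXDataFactory logic and not the olap4j API: the class `BandedFlattenSketch`, the `LevelN` column names and the sample members are all made up, and it assumes the result has already been reduced to row member paths, column captions and a grid of numeric cells.

```java
import javax.swing.table.DefaultTableModel;
import java.util.Arrays;

// Hypothetical sketch of a banded flattening; not the BandedMDXDataFactory code.
class BandedFlattenSketch {

    // rowPaths[r] holds the member path of row r (e.g. {"USA", "CA"}),
    // columnCaptions[c] the caption of column tuple c, cells[r][c] the cell value.
    static DefaultTableModel flatten(String[][] rowPaths, String[] columnCaptions,
                                     double[][] cells) {
        // The deepest row path decides how many level columns the flat table gets.
        int levels = 0;
        for (String[] path : rowPaths) {
            levels = Math.max(levels, path.length);
        }
        Object[] header = new Object[levels + columnCaptions.length];
        for (int l = 0; l < levels; l++) {
            header[l] = "Level" + l;
        }
        // Column tuples survive only as caption strings; their hierarchy is gone.
        System.arraycopy(columnCaptions, 0, header, levels, columnCaptions.length);

        DefaultTableModel model = new DefaultTableModel(header, 0);
        for (int r = 0; r < rowPaths.length; r++) {
            Object[] row = new Object[header.length];
            // A ragged path leaves trailing level columns null, so the flat table
            // cannot tell a skipped level from a genuinely missing member.
            System.arraycopy(rowPaths[r], 0, row, 0, rowPaths[r].length);
            for (int c = 0; c < columnCaptions.length; c++) {
                row[levels + c] = cells[r][c];
            }
            model.addRow(row);
        }
        return model;
    }

    public static void main(String[] args) {
        String[][] rowPaths = {{"USA", "CA"}, {"USA", "WA"}, {"Vatican"}};
        String[] columns = {"[Measures].[Sales]", "[Measures].[Cost]"};
        double[][] cells = {{10, 4}, {7, 3}, {1, 0.5}};
        DefaultTableModel flat = flatten(rowPaths, columns, cells);
        for (int r = 0; r < flat.getRowCount(); r++) {
            Object[] values = new Object[flat.getColumnCount()];
            for (int c = 0; c < values.length; c++) {
                values[c] = flat.getValueAt(r, c);
            }
            System.out.println(Arrays.toString(values));
        }
    }
}
```

Running `main` prints `[USA, CA, 10.0, 4.0]`, `[USA, WA, 7.0, 3.0]` and `[Vatican, null, 1.0, 0.5]`. The null in the last row is all that remains of the ragged hierarchy, and the measure columns are plain strings with no link back to their dimension.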

The goals for our crosstab-implementation are straightforward:

  • It has to work on existing data-sources using only TableModels as input. (Don’t over-architect.)
  • The internal data-source structures must be simple so that any source-system is capable of providing the data in the correct format. (Don’t exclude anyone.)
  • Provide only simple aggregation as built-in functions (Don’t copy Mondrian.)
  • Make sure that functions and expressions work exactly as they do in relational reports. (Don’t be special.)

The new DenormalizedMDXDataFactory provides a streaming view over the MDX cells. Any datasource can provide a similar view simply by joining the fact table with all dimensions (and sorting the result according to the desired axis structure). The denormalized view makes it possible to treat MDX columns and rows (and any of the other 253 possible axes) as relational groupings that just happen to be displayed in a non-banded manner.
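The join-and-sort recipe from the paragraph above can be sketched in a few lines of Java. This is an assumption-laden illustration, not the DenormalizedMDXDataFactory implementation: `DenormalizeSketch`, the region and product dimensions and the `{regionKey, productKey, sales}` fact layout are hypothetical, and the dimensions are plain in-memory maps where a real source would more likely use SQL joins.

```java
import javax.swing.table.DefaultTableModel;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a denormalized view; not the DenormalizedMDXDataFactory code.
class DenormalizeSketch {

    // Joins each fact row {regionKey, productKey, sales} with its dimension members,
    // then sorts by the row axis (region) and the column axis (product).
    // Assumes every key exists in its dimension map (an inner join without misses).
    static DefaultTableModel denormalize(List<Object[]> facts,
                                         Map<Integer, String> regions,
                                         Map<Integer, String> products) {
        List<Object[]> rows = new ArrayList<>();
        for (Object[] fact : facts) {
            String region = regions.get(fact[0]);   // join with the region dimension
            String product = products.get(fact[1]); // join with the product dimension
            rows.add(new Object[] {region, product, fact[2]});
        }
        rows.sort(Comparator.comparing((Object[] r) -> (String) r[0])
                .thenComparing((Object[] r) -> (String) r[1]));

        DefaultTableModel model =
                new DefaultTableModel(new Object[] {"Region", "Product", "Sales"}, 0);
        for (Object[] row : rows) {
            model.addRow(row);
        }
        return model;
    }

    public static void main(String[] args) {
        List<Object[]> facts = List.of(
                new Object[] {2, 1, 5.0},
                new Object[] {1, 2, 3.0},
                new Object[] {1, 1, 4.0});
        DefaultTableModel view = denormalize(facts,
                Map.of(1, "East", 2, "West"),
                Map.of(1, "Apples", 2, "Pears"));
        for (int r = 0; r < view.getRowCount(); r++) {
            System.out.println(view.getValueAt(r, 0) + " / " + view.getValueAt(r, 1)
                    + " = " + view.getValueAt(r, 2));
        }
    }
}
```

Because the rows come out sorted by region and then by product, a report can treat Region as the outer group (the row axis) and Product as the inner group (the column axis), just as it would with relational data.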

Now that the data problem is solved, displaying the data will be quite easy, even for huge result sets.
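Why sorted, denormalized data keeps large crosstabs cheap can be shown with a one-pass renderer. The sketch below is hypothetical (the class, the `{rowMember, columnMember, value}` record layout and the text output are invented for illustration) and assumes the list of column-axis members is known before rendering starts. A change in the row member is treated as a group break, like a relational group, and records that fall into the same cell are summed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: renders crosstab lines from a sorted denormalized stream.
class StreamingCrosstabSketch {

    // Consumes records {rowMember, columnMember, value} sorted by rowMember and
    // hands one finished crosstab line at a time to the sink. Only the current
    // line is kept here, so memory use does not grow with the number of rows.
    // Assumes columnMembers lists every member of the column axis, in display order.
    static void render(Iterator<Object[]> sorted, List<String> columnMembers,
                       Consumer<String> sink) {
        String currentRow = null;
        Double[] line = null;
        while (sorted.hasNext()) {
            Object[] rec = sorted.next();
            String rowMember = (String) rec[0];
            if (!rowMember.equals(currentRow)) {
                // Group break on the row axis: flush the finished line, start a new one.
                if (line != null) {
                    sink.accept(currentRow + " " + Arrays.toString(line));
                }
                currentRow = rowMember;
                line = new Double[columnMembers.size()];
            }
            int col = columnMembers.indexOf((String) rec[1]);
            double value = ((Number) rec[2]).doubleValue();
            // Simple built-in aggregation: sum records that land in the same cell.
            line[col] = line[col] == null ? value : line[col] + value;
        }
        if (line != null) {
            sink.accept(currentRow + " " + Arrays.toString(line));
        }
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
                new Object[] {"East", "Apples", 4.0},
                new Object[] {"East", "Apples", 1.0},
                new Object[] {"East", "Pears", 3.0},
                new Object[] {"West", "Apples", 5.0});
        render(rows.iterator(), List.of("Apples", "Pears"), System.out::println);
    }
}
```

With the sample data in `main`, this prints `East [5.0, 3.0]` followed by `West [5.0, null]`. The renderer keeps only the line currently being filled, so its memory use depends on the number of columns, not on the number of input rows.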

This entry was posted in Development by Thomas.

About Thomas

After working as the all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layout and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.