Monthly Archives: June 2008

Taking small steps to cross the tab

After a long silence, let’s have something positive today: Pentaho Reporting now officially talks to Mondrian and any other OLAP4J datasource.

We now ship with two flavors of MDX access. The existing MDX capabilities are covered by the BandedMDXDataFactory, while the crosstabbing functionality will rely on a new DenormalizedMDXDataFactory.

The BandedMDXDataFactory takes a two-dimensional MDX query result and maps the multidimensional dataset into a flat table. The approach is reasonable if you want to access the cube row by row, but it fails badly as soon as your query has more than two dimensions or if your query result contains a ragged hierarchy. The report-designer used this mode for a very long time to provide at least some access to Mondrian datasources. The banded mode is still great if you need banded reporting over MDX datasources.

However, with Version 0.8.11 of the reporting engine, we finally have to provide real crosstabbing capabilities.

At that point, the pre-chewed data provided by the BandedMDXDataFactory is totally unsuitable for anything sophisticated. You cannot reconstruct a cow from a steak. In the same way, we cannot use the banded data to reconstruct the axis and hierarchy information provided by the real MDX-ResultSet. At the same time, the complex (and in some places ambiguous) nature of the data-processing that happens inside the BandedMDXDataFactory makes it next to impossible to use plain queries as the source for a crosstabbed report.

The goals for our crosstab-implementation are straightforward:

  • It has to work on existing data-sources using only TableModels as input (Don’t over-architect)
  • The internal data-source structures must be simple so that any source-system is capable of providing the data in the correct format. (Don’t exclude anyone.)
  • Provide only simple aggregation as built-in functions (Don’t copy Mondrian.)
  • Make sure that functions and expressions work exactly like in relational reports. (Don’t be special.)

The new denormalized MDX-DataFactory provides a streaming view over the MDX-Cells. Any datasource can provide a similar view by simply joining the fact-table with all dimensions (and by sorting the result according to the desired axis structure). The denormalized view now makes it possible to treat MDX-Columns and Rows (and any of the other 253 possible axes) as relational groupings, which just happen to be displayed in a non-banded manner.
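To make the "join and sort" idea a bit more tangible, here is a minimal sketch in plain Java. It is not the DenormalizedMDXDataFactory and uses none of the engine's classes: the class name, the Fact type, the two dimensions (region and year) and the column layout are all made up for illustration. It assumes every fact key has a matching dimension member. The only real API used is the standard javax.swing.table.TableModel.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableModel;

// Hypothetical illustration only; none of these names come from Pentaho Reporting.
class DenormalizedViewSketch {

    // One fact row: foreign keys into two (assumed) dimensions plus a measure.
    static final class Fact {
        final int regionId;
        final int yearId;
        final double sales;

        Fact(int regionId, int yearId, double sales) {
            this.regionId = regionId;
            this.yearId = yearId;
            this.sales = sales;
        }
    }

    // Joins each fact with its dimension members, then sorts by the desired
    // axis order: row axis (region) first, column axis (year) second.
    static TableModel denormalize(List<Fact> facts,
                                  Map<Integer, String> regions,
                                  Map<Integer, String> years) {
        List<Object[]> rows = new ArrayList<>();
        for (Fact f : facts) {
            rows.add(new Object[] {regions.get(f.regionId), years.get(f.yearId), f.sales});
        }
        rows.sort((a, b) -> {
            int byRegion = ((String) a[0]).compareTo((String) b[0]);
            return byRegion != 0 ? byRegion : ((String) a[1]).compareTo((String) b[1]);
        });

        DefaultTableModel model =
            new DefaultTableModel(new Object[] {"Region", "Year", "Sales"}, 0);
        for (Object[] row : rows) {
            model.addRow(row);
        }
        return model;
    }

    // Treats the row axis like an ordinary relational group and sums the
    // measure per group while walking the table once, top to bottom.
    static Map<String, Double> totalsByRowAxis(TableModel data) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (int row = 0; row < data.getRowCount(); row++) {
            String group = (String) data.getValueAt(row, 0);
            double value = (Double) data.getValueAt(row, 2);
            totals.merge(group, value, Double::sum);
        }
        return totals;
    }
}
```

Because the rows arrive already sorted by axis, a consumer only has to watch for a change in the "Region" or "Year" column to know that a new group starts, which is exactly how group changes are detected in a banded relational report.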

Now with the data-problem solved, displaying the data will be quite easy, even for huge result-sets.

Not likely to break (Rank 10)

Pentaho Reporting has been ranked #10 in Enerjy’s analysis of open source projects that are not likely to break.

Enerjy runs static code analysis on the source code and computes several metrics on how well the software is maintained. We owe this ranking to our paranoid programming, which assumes that the users of the code (in most cases: me) are way too stupid to code correctly all the time. Therefore we write the code in a way that breaks early and hard, with strict assertions, strong typing and explicit checks for null references as soon as we receive parameters on public or protected methods. The alternative would be to happily assume that humans never produce bugs, accept everything first (hoping that things will continue to go well) and act surprised if (no: when) the assumptions are proven wrong later in the process.
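For readers who have not seen this style in practice, here is a small sketch of what "break early and hard" looks like. The class and its fields are invented for this post and are not taken from the reporting engine; only java.util.Objects.requireNonNull is a real library call.

```java
import java.util.Objects;

// Hypothetical class for illustration; not part of the Pentaho Reporting code base.
class ReportParameterSketch {
    private final String name;
    private final int maxLength;

    public ReportParameterSketch(String name, int maxLength) {
        // Reject bad input at the API boundary instead of failing much later.
        this.name = Objects.requireNonNull(name, "name must not be null");
        if (name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        this.maxLength = maxLength;
    }

    // Formats a value for this parameter; invalid values never get this far.
    public String format(String value) {
        Objects.requireNonNull(value, "value must not be null");
        if (value.length() > maxLength) {
            throw new IllegalArgumentException(
                "value is longer than " + maxLength + " characters");
        }
        return name + "=" + value;
    }
}
```

A caller that passes null gets a NullPointerException with a readable message right at the call site, rather than a mystery failure three layers deeper.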

Always protect yourself when coding – you never know what diseases you might catch otherwise.

Why OpenSource is the thing that changes the world

During a TED conference back in 2005, Yochai Benkler gave a great talk about why OpenSource and the whole social production schemas we have seen emerging over the last few years are a revolution as big as the transition of agricultural societies into the industrial age.

We in the OpenSource crowd of course know that our path is leading to a bright future. But if you go out and randomly select one of the many OpenSource community members to explain why this thing is THE BIG THING, you will probably end up drowned in words but no smarter than before. Until now, I haven’t found an explanation as clear and simple as the one given in this 17-minute talk.

Watch the recorded conference talk.

Classic-Engine 0.8.10 and beyond

This weekend, we finally released Version 0.8.10 of the Pentaho Reporting Classic Engine. This release is yet another infrastructure release (yes, sounds boring) that prepares the ground for going to 1.0.

Aside from the already covered Unified Fileformat and the full support for all kinds of meta-data, this release also ships with a totally revamped parametrization API, support for Barcodes (great job, Mimil!) and Sparkline support.

The next development cycle will be a shorter one. In the upcoming Version 0.8.11 we will finally add crosstabbing and Pivot-tables, speak with Mondrian datasources, provide a sensible interface for the rich-text capabilities, add free-form subreports and add a first version of the multi-column support. If everything goes right, this version will enter its Release-Candidate state at the end of the month.

One major change already happened in this version: All engine and library classes contained in org.jfree-packages have now been moved into corresponding org.pentaho-packages. This move was necessary so that we no longer pollute the org.jfree-namespace. At the same time, it allows us to move the Classic and the Flow-Engine into separate packages, so that they can co-exist in the same Java Virtual Machine.

As usual: Users of the XML fileformats are safe from any changes; the XML report definitions continue to work unchanged. API users will have to migrate their code to the new package space. But as the change involves only moved packages, an update of the import-statements should be the majority of the conversion work.

As the APIs of the libraries seem to be stable and sane now, along with the release of 0.8.11, all libraries will be labeled 1.0 versions.