A Message from the Trenches ..

Over the last six weeks I finally found the time to dive into the crosstab-related development. As a data manipulation exercise, crosstabbing is a rather easy and straightforward algorithm. Printing simple crosstabs without regard for user-defined calculations is not hard either – if you are willing to stick to the simple model for eternity. But integrating the crosstabbing code so that the layouting uses our existing capabilities of style- and attribute-expressions, flexible layouts and decent scalability even when processing massive amounts of data – that takes more than two weeks of prototype hacking.
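To show how little there is to the pure data-manipulation part, here is a minimal sketch in Python. It is not the engine's code: the function name `crosstab`, the record layout and the sample data are all made up for illustration, and summing is just one assumed aggregation.

```python
from collections import defaultdict


def crosstab(rows, row_key, col_key, measure):
    """Pivot flat records into a grid of summed measures (illustrative only)."""
    cells = defaultdict(float)
    row_labels, col_labels = [], []
    for record in rows:
        r, c = record[row_key], record[col_key]
        # Keep labels in order of first appearance.
        if r not in row_labels:
            row_labels.append(r)
        if c not in col_labels:
            col_labels.append(c)
        cells[(r, c)] += record[measure]
    # Missing combinations become None so "no data" differs from a sum of 0.
    grid = [[cells.get((r, c)) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, grid


# Hypothetical sample data.
sales = [
    {"region": "North", "year": 2010, "amount": 10.0},
    {"region": "North", "year": 2011, "amount": 5.0},
    {"region": "South", "year": 2010, "amount": 7.0},
    {"region": "North", "year": 2010, "amount": 2.5},
]
row_labels, col_labels, grid = crosstab(sales, "region", "year", "amount")
```

The grid is the easy part. Everything this sketch ignores (styles, expressions, page breaks, memory use) is where the real work sits.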

Step zero on my quest happened earlier this year when I wrote a testing framework to validate the layouter output on a very low level. The idea of this testing framework is based on “golden samples” – known good data-dumps of the layout results. For every change I make, I can now validate whether and how the final layout is affected. Sometimes I want to see changes (especially when fixing bugs), but for the most part I want to add new functions without breaking existing reports. The testing framework helps me to detect changes in the layout that I did not foresee.

The framework shields me from the system components, like the local font renderer (a real bugger that changes with every update of the operating system) and from changes in the local font files (Arial on Mac is not the same as Arial on Windows, for instance).
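The golden-sample idea can be sketched roughly as follows. This is a Python illustration under my own assumptions, not the actual framework: the dump format, the dictionary-based node tree and the names `dump_layout` and `diff_against_golden` are hypothetical, and the hard-coded box sizes stand in for whatever the real tests use in place of live font metrics.

```python
import difflib


def dump_layout(node, depth=0):
    """Serialize a layout node tree into stable, line-oriented text."""
    x, y, w, h = node["box"]
    lines = [f"{'  ' * depth}{node['type']} x={x} y={y} w={w} h={h}"]
    for child in node.get("children", []):
        lines.extend(dump_layout(child, depth + 1))
    return lines


def diff_against_golden(actual_lines, golden_lines):
    """Return unified-diff lines; an empty list means the layout is unchanged."""
    return list(difflib.unified_diff(golden_lines, actual_lines,
                                     fromfile="golden", tofile="actual",
                                     lineterm=""))


# A tiny hypothetical layout result and its known-good dump.
layout = {"type": "band", "box": (0, 0, 500, 20), "children": [
    {"type": "label", "box": (0, 0, 100, 20)},
    {"type": "text", "box": (100, 0, 400, 20)},
]}
golden = [
    "band x=0 y=0 w=500 h=20",
    "  label x=0 y=0 w=100 h=20",
    "  text x=100 y=0 w=400 h=20",
]
unchanged = diff_against_golden(dump_layout(layout), golden)

# Simulate a code change that alters one element's width.
layout["children"][1]["box"] = (100, 0, 380, 20)
changed = diff_against_golden(dump_layout(layout), golden)
```

An empty diff means the change left the layout alone. A non-empty diff is either the bug fix I wanted or a regression I did not foresee, and the diff shows exactly which node moved.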

Step one of implementing crosstabs then involved getting a true table layout into the reporting engine. Our reporting engine is a banded engine in the tradition of the old COBOL and RPG/400 reporting tools. Each band is fairly separate from other bands. On paper you get the illusion of dealing with tables, as normally row after row of data is printed at the same horizontal position. But our primitive model cannot adjust for changes in the column sizes, as once a band is printed it cannot be altered any more.

This is similar to trying to create a table in a word processor by only using tabs. Once you try to put an overly long item into your ‘cell’, the layout goes all wrong, as previous and following rows do not alter their column sizes to match the large item.

But if we use a proper table, with real columns and rows, the problem goes away almost immediately. A table allows us to place elements relative to other elements in other rows and to maintain that relationship as long as we want. For small tables, this may be for the whole data-set, but similar to the “table-layout: fixed” CSS style attribute, we can also define a cut-off point after a certain number of rows and thus balance the need to keep the table flexible against the need to avoid buffering too much layout data.
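The trade-off between a fully flexible table and a cut-off can be sketched like this. It is a simplified Python illustration, not the engine's algorithm: widths are measured in characters rather than real font metrics, the function names are hypothetical, and truncating oversized cells is only one possible way of handling content that arrives after the widths are frozen.

```python
def column_widths(rows, cutoff=None):
    """Widest cell per column, looking only at the first `cutoff` rows
    (all rows if cutoff is None)."""
    sample = rows if cutoff is None else rows[:cutoff]
    widths = []
    for row in sample:
        for i, cell in enumerate(row):
            if i == len(widths):
                widths.append(0)
            widths[i] = max(widths[i], len(cell))
    return widths


def render(rows, widths):
    """Pad each cell to its column width; longer cells are truncated."""
    return [" | ".join(cell[:w].ljust(w) for cell, w in zip(row, widths))
            for row in rows]


rows = [
    ["id", "name"],
    ["1", "Ann"],
    ["2", "Bartholomew"],
]
flexible = column_widths(rows)         # whole data-set buffered
fixed = column_widths(rows, cutoff=2)  # widths frozen after two rows
```

With the whole data-set buffered, the second column grows to fit “Bartholomew”. With the cut-off at two rows, the widths are already final when that row arrives, so only two rows ever had to be held back before printing could start.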

During a long interlude over the summer I was busy working on two rather large bug-fix releases (3.8.1 and 3.8.3) with no time whatsoever to do any new development. (I just managed to sneak in a week or two of table-layout related work.) And now, finally, since mid-September I am back in the layout system. By eliminating all distractions – including writing articles here – I managed to get some private cuddle-up time with the rendering system.

The reporting engine comes from a banded background, and thus our existing layout system was built around the assumption that the world is easily consumable in banded chunks. With the Citrus-rewrite two years ago I opened the layouter to a more flexible world view by recreating the layout system as a CSS/DOM oriented layout system. But without a need for cross-band layouts, I still ended up introducing many assumptions that only work in a banded world.

In the old (3.x) layout system, global structures like groups, subreports and root-level bands are produced by the “Renderer” class. The renderer is the central point of the layout calculations and manages both the creation of the layout nodes and the calculation of the final layout. The contents of the bands themselves are then computed by a class named “DefaultLayoutProducer”. This model fails horribly if the banded layout is just a subset of a larger structure, such as a table. The “DefaultLayoutProducer” is not aware of the outside model, and the “Renderer” does not care what the “DefaultLayoutProducer” does within its own band. I created the model of a completely dysfunctional family here.

With the new system in place, there is only one point that produces layout nodes. The model is no longer a collection of local sub-models but one big model with a global state. That not only simplifies the code, it also opens up a new set of capabilities.

The layout system rewrite is now nearly complete. The “golden sample” tests of the engine-core project are running fine on my box now, but some of the integration tests in the “testcases” project still fail. Once they work, I can rewire the layouter to accept global table definitions across groups. I can also finally open up the group layouts to support more layout options, and thus allow groups to be printed horizontally instead of vertically, or even print the header, group-body and footer side-by-side.

For the next few weeks, I will be back to bug-fixing for the upcoming 3.9.0 release. This release will contain mostly bug-fixes and will be shipped with the BI-Server 4.5.0.

This entry was posted in Development by Thomas.

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.

2 thoughts on “A Message from the Trenches ..”

  1. MGiepz

    Great to hear that you are making progress with the new layouting.
    Can you already foresee what impact this will have on the wizard api? (I guess the PIR engineers already asked you about that ;))
    We’d like to avoid riding a dead horse while developing prpt in saiku-adhoc/olap
