Pentaho Reporting’s Metadata DataSources now with more scripting power

Pentaho Reporting is rather flexible. No, astonishingly flexible. Calculations for styles, attributes and even queries make sure that there is always a way to tweak a report exactly the way you need it. But until now, there were a few things that were not exactly easy. Custom data-sources and their method of calculated queries were among them.

With today’s code drop, this story has fundamentally changed. The Pentaho Metadata data-source is the first data-source that combines static queries with optional scripting features. There is no need to have a “Custom-Metadata” datasource anymore. No need to cram your query calculation logic into a tiny field on the report’s query-attribute.

Here it is: The new and greatly improved Pentaho Metadata Data-Source:

The scripting extensions hide beneath the two new tabs on the top and on the query-text area. Those who don’t need scripting or who are scared of having to program can safely continue with their lives – just ignore the scripting-tabs and you will be safe in your static world. But if you feel you are born to greatness and therefore cannot be limited to predefined queries, then follow me on an exciting journey into the land of power-scripting.

The scripting extensions are split into a global script and a per-query script. The global script can be used to define shared functions or global variables that can be seen by all query-scripts. The init() function can also be used to change the configuration of the data-source itself. The per-query scripts allow you to (a) customize the query string, (b) calculate the “additional fields” information for the query-caching and (c) post-process the returned table-model.
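To give you an idea of the shape of such a script, here is a minimal JavaScript sketch of a per-query script. The function names, parameters and the “showDetails” report parameter are illustrative assumptions of this sketch – the shipped templates document the exact signatures the data-source expects.

    // A hypothetical alternative MQL query string, used when details are requested.
    var detailQuery = "<mql>...</mql>";

    // (a) Customize the query string before it is executed. "dataRow" gives
    // access to parameter values; "showDetails" is an assumed report parameter.
    function computeQuery(query, queryName, dataRow) {
      if (dataRow.get("showDetails") == true) {
        return detailQuery;
      }
      return query; // fall back to the static query as designed
    }

    // (b) Report the extra fields this script reads, so the query-cache can
    // tell apart results computed from different parameter values.
    function computeQueryFields(query, queryName) {
      return ["showDetails"];
    }

    // (c) Post-process the returned table-model before the report consumes it.
    function postProcessResult(query, queryName, dataRow, tableModel) {
      return tableModel; // returned unchanged in this sketch
    }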

The original “Custom-*-Datasources” always had the problem that you could only add one of them per report or subreport. Now that we combine scripts with named queries, we can have an independent script for each of the queries and, of course, as many datasources as necessary.

The scripting backend uses the JSR-223 (javax.script) scripting system. By default we ship with both JavaScript and Groovy support. There are quite a few JSR-223 enabled languages out there, so if you have a favourite language, just drop it into the classpath on both the Pentaho Platform server and the Pentaho Report Designer and you are ready to code.

The picture above shows the global script pane. For the two languages we ship with (JavaScript and Groovy), we have a prepared template script that contains some documentation as well as empty declarations of the functions the data-source expects to call.
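A global script might look like the following minimal JavaScript sketch. The parameter list of init() is an assumption here – the shipped template documents the real signature.

    // Shared state that all query-scripts of this data-source can see.
    var environment = "test";

    // Called once before any query runs; the place to tweak the
    // configuration of the data-source itself.
    function init(dataFactory) {
    }

    // A shared helper that any query-script may call, e.g. to patch a
    // marker inside the query text.
    function replaceMarker(query, marker, value) {
      return query.replace(marker, value);
    }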

It is absolutely OK to delete any function you don’t need – in that case Pentaho Reporting simply ignores that part. This will help you to keep your scripts small and simple. You can load scripts from external sources as well, but be aware that it is your responsibility to make these external scripts available to the report at runtime. So if you load a script from a local directory, you have to ensure that the same works on the BI-Server (or your production environment) too.

For the upcoming Pentaho Reporting 3.9.0 release, the Pentaho Metadata datasource is the only datasource with this sort of scripting. Once that design has proven itself and I have more feedback on how well it works, we will add this scripting system to all major data-sources in Pentaho Reporting 4.0.

PRD-3639: On Adding Version checks into Pentaho Reporting

From time to time I get support requests from users who create reports in the latest Pentaho Report Designer and then try to run them in an ancient installation of the Pentaho BI-Server. I already wrote about the limitations Einstein placed on us software developers, so let’s all agree – it’s not possible and there will be no fix.

Along with a Voodoo-security change that obscures all passwords stored in a PRPT file, I now prevent the horse from ever drinking petrol again. I added a strict check while a report is parsed so that newer versions of a PRPT file cannot be opened in older servers. Whenever you try such a thing, you will now get a very clear error message telling you:

The report file you are trying to load was created with Pentaho Reporting 12.0 but you are trying to run it with Pentaho Reporting 3.9.0. Please update your reporting installation to match the report designer that was used to create this file.

And if you just missed a patch release, you will be able to continue, but you will get a warning urging you to upgrade.

The report file you are trying to load was created with Pentaho Reporting 3.9.5 but you are trying to run it with Pentaho Reporting 3.9.0. Your reporting engine version may not have all features or bug-fixes required to display this report properly.
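The logic behind the two messages is simple: a mismatch in the major or minor version is fatal, while a newer patch level only produces a warning. Here is the idea, sketched in JavaScript purely for illustration – the actual check lives in the Java parsing code:

    // Compare the version that wrote the file against the running engine.
    function checkVersion(fileVersion, engineVersion) {
      var f = fileVersion.split(".");
      var e = engineVersion.split(".");
      var fMajor = +f[0], fMinor = +(f[1] || 0), fPatch = +(f[2] || 0);
      var eMajor = +e[0], eMinor = +(e[1] || 0), ePatch = +(e[2] || 0);
      if (fMajor > eMajor || (fMajor == eMajor && fMinor > eMinor)) {
        return "error"; // file was written by a newer engine: refuse to load
      }
      if (fMajor == eMajor && fMinor == eMinor && fPatch > ePatch) {
        return "warn";  // only a patch release ahead: load, but warn
      }
      return "ok";
    }

With the messages above, checkVersion("12.0", "3.9.0") yields "error" and checkVersion("3.9.5", "3.9.0") yields "warn".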

Older releases that did not store version information in a usable format inside the PRPT files will not be checked. This check only applies to released versions: development builds do not store any version information and do not validate any version information they may find in the files. If you are using development versions, I assume you know what you are doing.

For the upcoming 3.9.0 release, you won’t see any impact. The first time this check will be hit will be Pentaho Reporting 4.0 later next year and from that moment on I will never have to talk about poisoning my poor old horse with petrol again (it prefers bio-diesel anyway).

A Message from the Trenches ..

Over the last six weeks I finally found the time to dive into the crosstab-related development. Crosstabbing as a data-manipulation exercise is rather easy and straightforward as an algorithm. Printing simple crosstabs without regard for user-defined calculations is not hard either – if you are willing to stick to the simple model for eternity. But integrating the crosstabbing code so that the layouting uses our existing capabilities of style- and attribute-expressions, flexible layouts and decent scalability even when processing massive amounts of data – that takes more than two weeks of prototype hacking.

Step zero on my quest happened earlier this year when I wrote a testing framework to validate the layouter output on a very low level. The idea of this testing framework is based on “golden samples” – known good data-dumps of the layout results. For every change I make, I can now validate if and how the final layout is affected. Sometimes I want to see changes (especially when fixing bugs), but for most parts I want to add new functions without breaking existing reports. The testing framework helps me to detect changes in the layout that I did not foresee.
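The principle is easy to sketch – here in JavaScript for illustration only, as the real framework is part of the Java test suite; “renderLayout” stands in for whatever produces the low-level layout dump:

    var fs = require("fs");

    // Compare a freshly produced layout dump against the stored golden sample.
    function validateAgainstGoldenSample(reportName, renderLayout) {
      var golden = fs.readFileSync("golden/" + reportName + ".txt", "utf8");
      var actual = renderLayout(reportName);
      if (actual !== golden) {
        throw new Error("Layout changed for " + reportName +
            " - inspect the diff and update the golden sample if intended");
      }
    }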

The framework shields me from the system components, like the local font renderer (a real bugger that changes with every update of the operating system) and from changes in the local font files (Arial on Mac is not the same as Arial on Windows, for instance).

Step one of implementing crosstabs then involved getting a true table layout into the reporting engine. Our reporting engine is a banded engine in the tradition of the old COBOL and RPG/400 reporting tools. Each band is fairly separate from the other bands. On paper you get the illusion of dealing with tables, as normally row after row of data is printed at the same horizontal position. But our primitive model cannot adjust for changes in the column sizes, as once a band is printed it cannot be altered any more.

This is similar to trying to create a table in a word processor by only using tabs. Once you try to put an overly long item into your ‘cell’, the layout goes all wrong, as previous and following rows do not alter their column sizes to match the large item.

But if we use a proper table, with real columns and rows, the problem goes away almost immediately. A table allows us to place elements relative to other elements in other rows and to maintain that relationship as long as we want. For small tables, this may be for the whole data-set, but similar to the “table-layout: fixed” CSS style attribute, we can also define a cut-off point after a certain number of rows, balancing the need to keep the table flexible against the need not to buffer too much layout data.
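The cut-off idea, sketched in JavaScript: measure the first N rows to settle the column widths, then treat them as fixed for the remainder of the table. The “measure” parameter stands in for the real per-cell measurement of the layouter:

    // Settle column widths from the first "cutOff" rows, then freeze them.
    function computeColumnWidths(rows, columnCount, cutOff, measure) {
      var widths = new Array(columnCount).fill(0);
      var limit = Math.min(cutOff, rows.length);
      for (var r = 0; r < limit; r++) {
        for (var c = 0; c < columnCount; c++) {
          widths[c] = Math.max(widths[c], measure(rows[r][c]));
        }
      }
      return widths; // rows beyond the cut-off reuse these fixed widths
    }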

A long interlude over the summer kept me busy working on two rather large bug-fix releases (3.8.1 and 3.8.3), with no time whatsoever for new development. (I just managed to sneak in a week or two of table-layout related work.) And now, finally, since mid-September I am back in the layout system. By eliminating all distractions – including writing articles here – I managed to get some private cuddle-up time with the rendering system.

The reporting engine comes from a banded background, and thus our existing layout system was built around the assumption that the world is easily consumable in banded chunks. With the Citrus rewrite two years ago I opened the layouter to a more flexible world view by recreating it as a CSS/DOM-oriented layout system. But without a need for cross-band layouts, I still ended up introducing many assumptions that only work in a banded world.

In the old (3.x) layout system, global structures like groups, subreports and root-level bands are produced by the “Renderer” class. The renderer is the central point of the layout calculations and manages both the creation of the layout nodes and the calculation of the final layout. The contents of the bands themselves are then computed by a class named “DefaultLayoutProducer”. This model fails horribly if the banded layout is just a subset of a larger structure, like a table. The “DefaultLayoutProducer” is not aware of the outside model, and the “Renderer” does not care what the “DefaultLayoutProducer” does within its own band. I created the model of a completely dysfunctional family here.

With the new system in place, there is only one point that produces layout nodes. The model is no longer a collection of local sub-models but one big model with a global state. That not only simplifies the code, it also opens up a new set of capabilities.

So far, the layout-system rewrite is nearly complete. The “golden sample” tests of the engine-core project are running fine on my box now, but some of the integration tests in the “testcases” project still fail. Once they pass, I can rewire the layouter to accept global table definitions across groups. I can then also open up the group layouts to support more layout options, allowing groups to be printed horizontally instead of vertically, or even printing the header, group-body and footer side by side.

For the next few weeks, I will be back to bug-fixing for the upcoming 3.9.0 release. This release will contain mostly bug-fixes and will be shipped with the BI-Server 4.5.0.