Crosstab update – Pagebreaks and header visibility

It has been a while since I wrote something about the eternal project. So here’s a quick update.

I just checked in a few changes to the crosstab backend and the magical create-a-crosstab dialog. In addition to selecting the row- and column-dimensions (as usual), you now get a bunch of extra options for your crosstab.

The most interesting one is the switch from a static width and height (80x20pt) to relative sizes. With that, a crosstab now tries to fill the available space as well as possible, expanding and shrinking elements when needed.

Marius requested an option to show title headers for the measures. You can control whether you want such headers (they are there by default) or not. As a bonus, you get control over the title headers of your column dimensions as well, in case you like it minimalistic.

Last but not least: When a crosstab is larger than a single page, we now create proper page breaks and print the header section again on the next page.

For this release, that basically concludes the feature hunt. Until we actually wrap up and do a release build, it’s hardening and bug-fixing time. So give it a try, and if you tickle a bug out of it, I would be pleased if you could feed our JIRA beast with it.

PRD-2087: Widows, Orphans and we are all keeping together, aren’t we?

One long-standing, never resourced, never fixed issue we had was the case of managing orphans and widows in reports. Well, with the cold wind of austerity blowing over Europe, we can’t forget the widows and orphans, can we?

What are Widows and Orphans?

In typography, an orphan is a single line of a paragraph left alone at the bottom of the previous page. A widow is a lonely line that did not fit on the previous page and now sits alone on the next page. With text, these rules are relatively easy to satisfy, as paragraphs form a flat list and are not nested into each other.
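To make the text rule concrete, here is a small Java sketch of the classic paragraph case. It is a textbook illustration and not code from Pentaho Reporting; the names ParagraphSplit and linesOnCurrentPage are made up for this example, and both constraints are treated as minimum line counts.

```java
/**
 * Illustrative sketch (not Pentaho code): decides how many lines of a paragraph
 * stay on the current page so that at least `orphans` lines stay behind and at
 * least `widows` lines move on to the next page.
 */
final class ParagraphSplit {
    private ParagraphSplit() {
    }

    /**
     * @param totalLines     lines in the paragraph
     * @param availableLines lines that still fit on the current page
     * @param orphans        minimum lines that must remain on the current page
     * @param widows         minimum lines that must start the next page
     * @return lines placed on the current page; 0 means the whole paragraph moves
     */
    static int linesOnCurrentPage(int totalLines, int availableLines, int orphans, int widows) {
        if (totalLines <= availableLines) {
            return totalLines; // everything fits, no break needed
        }
        int keep = Math.max(0, availableLines);
        // Too few lines would start the next page: move extra lines over to avoid a widow.
        if (totalLines - keep < widows) {
            keep = totalLines - widows;
        }
        // Too few lines would remain on this page: move the whole paragraph to avoid an orphan.
        if (keep < orphans) {
            keep = 0;
        }
        return Math.max(0, keep);
    }
}
```

For a 10-line paragraph with 9 free lines and both minimums set to 2, the sketch keeps 8 lines on the current page so that two lines start the next page. With only 1 free line, it moves the whole paragraph.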

In the field of reporting, we usually care less about lines of text and more about the larger unit of sections. When you create a report, you don’t want a group-header being all alone at the bottom of the page, without at least one more details band to go with it. Likewise, a group-footer should not be the only thing on the last page for that group. The trouble starts when you consider these rules in a deeply hierarchical structure as we see in reports.

Like so many layout concepts, orphans and widows are easy to explain, but usually a pain to resolve. Orphan and widow rules are cumulative. When you have nested groups, the orphan declarations of the outer group cannot be solved in isolation.

Let’s take the simple example of a two-level report, where each group declares that it wants at least two sections as orphan area. Assuming the group-headers are filled, it means that the group’s header and at least the next section must be kept together. For the outer group, that is the outer group-header and the inner group-header. For the inner group, it is the group-header and the first details section.

The inner group’s header is now covered by two orphan rules. It is part of the outer group’s unbreakable section as well as part of the inner group’s section. When rules partially overlap, they must be merged.

Last but not least, in light of these rules, we can now redefine ‘keep-together’ (or in PRD speak: avoid-page-break-inside) as an infinitely large number of orphans in the break-restricted area.
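The merging step can be sketched in a few lines of Java. This is a simplified model under my own assumptions, not the engine’s implementation: sections are numbered in output order, each orphan rule covers the group’s first section plus the following sections up to the orphan count, and keep-together is modeled as Integer.MAX_VALUE orphans capped at the group’s last section. The names OrphanMerge, Range, orphanRange and merge are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: orphan rules as index ranges that must not contain a page break. */
final class OrphanMerge {
    /** Closed range [first, last] of section indices that must stay on one page. */
    record Range(int first, int last) {
    }

    private OrphanMerge() {
    }

    /**
     * Range covered by one group's orphan rule: the group's first section plus the
     * following sections, capped at the group's last section.
     * Integer.MAX_VALUE models keep-together (infinitely many orphans).
     */
    static Range orphanRange(int groupFirst, int groupLast, int orphans) {
        if (orphans <= 0) {
            throw new IllegalArgumentException("orphans must be positive");
        }
        long last = (long) groupFirst + orphans - 1; // long: MAX_VALUE would overflow an int
        return new Range(groupFirst, (int) Math.min(last, groupLast));
    }

    /** Merges ranges that share at least one section into a single unbreakable range. */
    static List<Range> merge(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(Range::first));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && r.first() <= merged.get(lastIndex).last()) {
                // Overlap: both rules must hold, so the union becomes one block.
                Range prev = merged.get(lastIndex);
                merged.set(lastIndex, new Range(prev.first(), Math.max(prev.last(), r.last())));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }
}
```

In the two-level example, the outer group starts at section 0 and the inner group’s header is section 1. The outer rule covers sections 0 to 1, the inner rule covers 1 to 2, and merging yields one unbreakable block from 0 to 2, so the first possible page break comes after the first details section. Ranges that merely touch (0 to 1 and 2 to 3) stay separate, as a break between them violates neither rule.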

How to use this feature

The orphans, widows and keep-together properties can be defined on any section or band. By default, all root-level bands (details, group-header, group-footer, etc.) have ‘keep-together’ set to ‘true’.

The Orphan and Widow style settings take a positive integer as value. Negative values are ignored.

An orphan or widow constraint controls how page breaks within that element are handled. A widow or orphan constraint only affects child nodes of the element that has the constraint defined. So if you want to keep a group-header together with the next few detail sections, you have to define the orphan constraint on the group element. Defining it on the group-header will not have the desired effect.

The reporting engine treats all root-level bands as elements that count as content in the keep-together calculations. All other elements are ignored for the purpose of the widow-orphan calculations. If you explicitly want an element to be used for these calculations, you can set the style-key ‘widow-orphan-opt-out’ to false on that element.

If an element that counts for the widow-orphan calculation contains other widow-orphan enabled elements, the parent element will be ignored for the widow-orphan calculations.

Elements with a canvas or row layout form a leaf node for the widow-orphan calculation. Their child elements cannot take part in any of the parent’s widow and orphan calculations. However, they can establish their own widow-orphan context. Therefore, all subreports, even inline subreports, can declare widow-orphan rules.

The defaults built into the reporting engine ensure that each section on the report is treated as an element for the widow-orphan calculations, even across subreports.
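The rules for which elements take part in the calculation amount to a small tree walk. The Java sketch below is a hypothetical model of those rules (WidowOrphanModel, Element and the Layout enum are invented names, not Pentaho classes). It assumes that a missing ‘widow-orphan-opt-out’ value means “use the default”, and it does not model subreports opening their own widow-orphan context.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of which elements count as content for widow-orphan rules. */
final class WidowOrphanModel {
    enum Layout { BLOCK, CANVAS, ROW }

    static final class Element {
        final String name;
        final Layout layout;
        final boolean rootLevelBand;
        final Boolean optOut; // value of 'widow-orphan-opt-out'; null = not set
        final List<Element> children = new ArrayList<>();

        Element(String name, Layout layout, boolean rootLevelBand, Boolean optOut) {
            this.name = name;
            this.layout = layout;
            this.rootLevelBand = rootLevelBand;
            this.optOut = optOut;
        }

        Element add(Element child) {
            children.add(child);
            return this;
        }

        /** Root-level bands count by default; any element can opt in or out explicitly. */
        boolean countsAsContent() {
            return optOut == null ? rootLevelBand : !optOut;
        }
    }

    private WidowOrphanModel() {
    }

    /** Returns the elements that a parent's widow-orphan calculation would see. */
    static List<Element> contentElements(Element root) {
        List<Element> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Element e, List<Element> out) {
        List<Element> fromChildren = new ArrayList<>();
        // Canvas and row layouts are leaves: their children never reach the parent's calculation.
        if (e.layout == Layout.BLOCK) {
            for (Element child : e.children) {
                collect(child, fromChildren);
            }
        }
        if (fromChildren.isEmpty()) {
            if (e.countsAsContent()) {
                out.add(e);
            }
        } else {
            // A counting parent that contains counting children is ignored in favour of the children.
            out.addAll(fromChildren);
        }
    }
}
```

In this model, a group that opts in but contains a header and a details band reports the two bands instead of itself, an opted-in element without counting children reports itself, and a label inside a canvas-layout band is never seen, even if it opts in.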


Solving widow and orphan rules is a costly exercise. Our reporting engine allows user-calculations and user-defined formatting to react to page break events. This allows you, for instance, to reset row-banding at the beginning of the page, or to format pages differently for odd and even page numbers. And finally, it allows you to update the page-header and page-footer on a page break so that you can show data from the current page on the headers.

If a section is finished (for instance, a group has been fully processed), we can safely evaluate widows and orphans for that group.

For ongoing content generation: When an orphan value greater than zero is declared on a section, the engine suspends the layout calculation until enough content has been generated to fulfill all orphan rules currently on the report. Likewise, for widow-calculations the report processing is suspended until more than the number of widow elements have been generated as content – and only those elements that are not marked as covered by the widow-rule will be considered for layouting.
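The suspension described above behaves like a buffer. Here is a rough Java sketch, assuming a single group with one orphan and one widow rule. SuspendingBuffer and its methods are hypothetical names; the real engine tracks many overlapping rules and also has to handle rollbacks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: generated sections are buffered and released for layout
 * only when no orphan or widow rule can still pull them onto another page.
 */
final class SuspendingBuffer<T> {
    private final int orphans;
    private final int widows;
    private final Deque<T> pending = new ArrayDeque<>();
    private int releasedSoFar;

    SuspendingBuffer(int orphans, int widows) {
        // Negative values are ignored, i.e. treated as "no constraint".
        this.orphans = Math.max(0, orphans);
        this.widows = Math.max(0, widows);
    }

    /** Adds a generated section; returns the sections that are now safe to lay out. */
    List<T> append(T section) {
        pending.addLast(section);
        List<T> ready = new ArrayList<>();
        int releasable = pending.size() - widows; // the last `widows` sections are held back
        // The first batch must contain all orphan sections so they are laid out together.
        if (releasedSoFar == 0 && releasable < orphans) {
            return ready; // layout stays suspended, content stays in memory
        }
        while (pending.size() > widows) {
            ready.add(pending.removeFirst());
        }
        releasedSoFar += ready.size();
        return ready;
    }

    /** The group is finished: all rules can be evaluated, so everything is released. */
    List<T> finish() {
        List<T> rest = new ArrayList<>(pending);
        pending.clear();
        return rest;
    }

    int buffered() {
        return pending.size();
    }
}
```

With both values set to 2, the first three sections stay in memory, the fourth releases the first two together, and the last two are released only when the group finishes. This held-back content is where the extra memory cost comes from.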

Suspending the layout processing can have a severe negative impact on the report processing time. When the engine suspends the layout-calculation, it keeps the unfinished layout in memory until it reaches a point where the layout can be safely calculated again. In the worst case, this suspends the layouting until the report generation finishes.

Keeping the unfinished layout in memory consumes more memory than the normal streaming report processing. When the engine finally detects a page break that fulfills all orphan and widow rules that are active on the report, it has to roll back to the state that generated the last visible element on the current page to inform any page-event listener about the page break in the right context. Every rollback is expensive, and the reporting engine has to discard any content that had already been generated after that page break, as functions may have reconfigured the report state in preparation for or in response to the page break.

Orphan calculations are usually less expensive than widow or keep-together rules.

However, if you export large amounts of data, try to avoid widow- or orphan-rules on your report. Your report will finish up to 100% faster that way.


Finally: This major fix is available for both Pentaho Reporting 3.9 and Pentaho Reporting 4.0. The fix did not make it into this month’s roll-up release of Pentaho Suite 4.8.1, but will be available to the general public in the next roll-up release in July. In the meantime, the fix is in the source code repositories, ready to be checked out and built locally 😉


Crosstabs got pushed from 5.0 – but my work goes on

On Friday, Pentaho development and product management made the decision to remove crosstabs from the next release. Therefore, the Pentaho Suite 5.0 will not claim support for crosstabs. The decision came upon us after looking at the scope of the remaining work across the whole stack.

So let me give some context to this decision and how it impacts the ongoing and future development work of the reporting stack.

What does it mean for the Pentaho Suite 5.0 release?

The BI-Suite will ship with Pentaho Reporting 4.0. I will continue to finalize the reporting engine’s support for crosstabs, and that will ship with the release. The engine will fully support crosstab reports.

However, the user interface for creating crosstabs will be basically limited to what we have in the current development version, plus fixes to make it stable. Given that we can’t reasonably expect anyone but consultants and programmers to use a feature that has no proper UI, we will refrain from publicly claiming that the reporting system supports crosstabs (yet).

That means you create crosstabs via a dialog and you can select and format elements in the graphical editor. Everything beyond that won’t have proper UI support, and thus reconfiguring an existing crosstab requires hard work in the structure tree.

The Pentaho Reporting engine still has some interesting new features to offer to make an upgrade worthwhile. Stylesheet support now allows you to create shared style definitions that can be hosted centrally. A new class of datasources driven by Kettle templates adds support for Big-Data/No-SQL datasources. These templates are parametrized and deployed as plugins, and thus allow you to write reusable datasources that are user-friendly. And last but not least: We opened up the layout system, giving you new options on how to format reports.

What were the reasons for the push?

The decision was a classical result of reducing the risk of the upcoming release. The next release contains a massive rewrite of the internals of the Pentaho BI-Server, updating the architecture to the standards of the 21st century. Adding the JCR repository and REST services required a large amount of work, which will pay off in the future with faster release cycles and easier-to-maintain code.

We have roughly 6 weeks left until the code needs to be finished and handed over to the release process. Many of the committed features have only a few bits missing or need debugging to be checked off as finished.

Over the last month or two, the crosstab UI work was somewhat sidelined by bug-fixing work for the service packs and by work on other features. With too many tasks for not enough hands, at some point something has to give.

Faced with a large workload that would leave little time for QA and documentation, it was only sensible to cut our losses for this release. To create a release that contains as many goodies as possible, it is more sensible to finish what is nearly done than to work on the larger risk items.

It is difficult to push features whose absence would be a regression or leave a visible gap. In the end, the crosstab features were clearly separate from the other features and, with clear lines for the cut, the safest to push.


Pentaho publishes its first monthly Service Pack

Last week, Pentaho delivered its first service pack full of bug-fixes for the last two releases to our existing customers. I think this now marks the point where Pentaho crossed over from being a wild teenager to being a responsible adult.

We provide commercial support for our customers as part of the Pentaho support offering, and as part of that we have a long history of fixing critical bugs in releases outside of the normal release cycle.

The main selling point of any commercial support contract is that of an insurance policy – if something goes wrong and your critical systems are down, there is someone who cares and who can fix them for you. It is the kind of service that lets managers sleep at night in the safe knowledge that their factories will run and their reports continue to be delivered when they wake up the next day.

Until recently, customers with show-stopping problems (severity class 1, with no work-around) would have to go through an escalation process to get the bug-fix machine rolling. The escalation, received by our support department, would land on the desk of the engineering group, who would scramble to fix it as fast as possible. After we have a fix, it goes through some more testing (which can include sending out early versions of the fix to validate that it really works in your production system) before it gets wrapped up and officially handed back to support as a ‘customer deliverable’ patch release.

One major drawback of that system was: If a bug was not a show-stopper, you would have a hard time getting it accepted as a worthy emergency fix. This easily leads to situations where a low-intensity bug affects a lot of customers, making everyone unhappy, but never gets addressed for existing releases, as the bug is not severe enough for any single customer.

This system works well in a crisis, and it stays in place. Sometimes you just can’t wait until the next patch release comes out.

But for us, as engineering group, dropping all tools and jumping onto emergency bug-fixes causes large disruptions in our engineering process. Emergency patches are born in an expensive process.

Therefore, Pentaho now introduces ‘Service Packs’. Just as Microsoft, Oracle and all the old companies publish bug-fixes for their software on a regular schedule, Pentaho’s service packs follow the same approach.

Roughly every 4 weeks (to be precise, usually in the 3rd week of the month) we package up all bug-fixes that we created over the last month and make them available to all customers as a patch release.

When we allocate some quality bug-fix time in our planning well before there is a panic call, we can work on the fixes without having to jump around wildly. We get more work done by concentrating for a week or so on fixing a series of bugs than by context-switching between our product development work and delivering emergency fixes.

And when we fix bugs regularly, it makes everyone happy.

Customers are happier, as they see that we care and that we fix bugs that annoy them, even though they are not blocker problems. Engineering is happier, as we can fix bugs under less pressure, creating a larger number of fixes with fewer tears. And when it comes to renewal, sales is happier too, as customers who got help during the year are more likely to see the value of a support contract.

How do we decide what issues get fixed?

When the time comes to assemble the list of things we want to address, we have a list of criteria that help us pick and choose. Here are some of the criteria we use, but bear in mind that this list is in no particular order and not complete:

  • How critical is it? (we would rather fix critical issues than cosmetic issues)
  • Is it a regression of an existing functionality?
  • What is the impact on customer(s)?
  • Is there a workaround available?
  • How many customers reported the problem?
  • Is it a data & security issue?
  • How complex is the fix? Does it require large changes? Is it risky?
  • How close to the patch package cut-off date has this bug been reported?

All these metrics get mixed together to help us form an opinion. So a more severe bug that affects only one customer in a highly arcane scenario may get fixed later than a small fix that affects dozens of customers.

Some issues cannot be solved in the short time frame of the allocated bug-fix time. These issues are likely to be scheduled for the next feature release, especially if fixing them involves major code work, along with the risk of creating new problems. A bug fix is not really a bug fix if it introduces new bugs, right?

We currently produce service packs for the Pentaho 4.5 and Pentaho 4.8 releases. For Report Designer, this maps to Pentaho Report Designer 3.9 and 3.9.1 respectively.

Let me repeat it to be extra clear: The old escalation process for show-stopper problems (severity class 1, no workaround available) is still there and will not go away. So when you encounter an issue that has a very negative impact on your operations, please continue to use the escalation process to make us aware of that. We then work together to resolve your problem.

Adding Service packs as an additional tool just makes it easier for us to improve our existing products in a more timely fashion, with fixes made to work within your existing product and installation. This way, getting and installing bug fixes can be as easy as installing the latest Windows Update, so that you can spend more time growing your business.

LibCGG – how to render CCC charts without a server

The CGG plugin does a nice job; the trouble is: It is vendor locked-in. Let’s see whether we can change that.

Years ago the smart guys at Web-Details started to use Protovis to create modern charts for their Dashboards in a project called CCC (Community Chart Components). Inevitably, these charts need to be printed from time to time, so shortly after that they created the CGG-plugin for the Pentaho BI-Server to do that.

I like the BI-Server. I also like printing. But I don’t like having to have a server running to get my charts as images into a report. So a few weeks ago, I took the CGG plugin and pruned everything that relates to BI-Server specific code. Refactored. Sliced it a bit. As a result, we now have LibCGG, readily available on GitHub.

What is LibCGG

LibCGG is an abstract layer to render CCC/Protovis charts. Its only focus is rendering. It takes the relevant JavaScript that makes up the charts and produces SVG or PNG output. LibCGG comes with some JUnit test-cases showing that simple samples provided by Web-Details actually run. None of these samples have been modified in any way; they just run.

What is it NOT

LibCGG does not deal with data-sources. It does provide an interface that can be implemented, but it does not come with data-sources itself.
LibCGG does not deal with HTTP requests or even the format in which charts may or may not be stored, defined or delivered to users. It is up to the actual implementation to deal with that. I have modified a version of CGG to use LibCGG as a proof of concept. After all, we don’t want to lose functionality, do we?

What do we need to use LibCGG in the reporting engine?

At the moment, I have not written any glue code to connect LibCGG with the reporting engine. Ultimately, this will happen though – why else would I care to separate CGG from the server? The barriers are surprisingly low. Pentaho Reporting already handles SVG data, and thus LibCGG needs just a thin wrapper around an existing element for a first show-off.

After that, we will need a chart editor. Pedro assured me that CCC charts come with enough metadata to make it easy to get a basic one up and running quickly. Once we have that, I am sure our UI team will want to come in to make that experience less geeky.

And last but not least: We need to separate CCC from CDA (Community Data Access) a bit. At the moment, there is a silent assumption that CCC charts exclusively communicate with a CDA datasource. It should not be too hard to reroute those calls to directly go to the report’s declared data-sources instead.

And now: The one million dollar question: When .. will it be ready?

With a bit of weekend magic, how about May? April should (hopefully) see us get feature complete on the committed features for Pentaho Reporting 4.0, so there is plenty of time for some Ninja coding. I even have a designated place for it: The ‘extensions-charting’ module, which was reserved for Pentaho’s next-generation charting that never really made it. CCC – be welcome, and never mind the ghosts of past visualizations.

InfoWorld – clueless beyond rescue (and JavaWorld reprints it unread)

I recently stumbled across one of the most clueless Java-bashing articles ever published in a Java magazine. JavaWorld reprinted a piece (I would not call it an ‘article’) from InfoWorld called “How to kill Java dead, dead, dead“.

I don’t mind killing Java in the browser, along with Flash and all the other garbage that clogs up my CPU. I have enough bad things to say about the crapware-installing Java installer that Oracle (and before that, Sun) produces. As if there were not enough toolbars out there for unsuspecting users to catch.

No, what I despise is the level of unprofessional ranting from the intern who wrote that article. Seriously, don’t you have any journalists who know how to research an article anymore?

Java in the browser is dead. The Java security model is as flawed as the Flash sandbox model – nice idea but once you get big enough it becomes too complex to handle it safely.

But let’s wade through the article.

First, someone please explain to that guy that there is a difference between a Java JRE installed on a system and a Java Plugin added to your browser. A program on your system does not necessarily cause a problem. The problem only starts when someone from outside can run programs on your computer without your consent.

When you use your browser (the internet viewer, dear author), then that browser runs programs for you. These programs, in the form of JavaScript or Java applets, come from an outside source (the web-server) and run on your local machine. So if your browser or one of the plugins has a flaw that allows the outsider to run dangerous stuff, then he can gain control over your computer.

To fix this, fix your browser. Remove the plugins that you don’t need, block JavaScript by default (and use NoScript to enable it when needed), and you will be quite safe. When you remove the Java Plugin, then it does not matter whether your Java installation itself is unsafe, as your browser does not make use of it without a plugin.

After a few paragraphs of clueless rants, he then comes to describe how the Flashback virus was caused by a bad Java installation. Oh, my. Read the first paragraphs of the analysis of Flashback and you learn: It came as a drive-by infection via web-sites. What do you use to view web-sites? A browser. So what do you need to fix? Your browser!

The feds recommended that users disable Java in the browser, and they should. But that still leaves Java on the desktop where it can be exploited, as Mac users found out a couple of years ago to Apple’s chagrin.

Well, let me repeat. Computers do not run stuff on their own. If you are computer-illiterate it may look like it, but someone somewhere has to initiate that communication and tell your computer to invoke a program. If you don’t visit web-sites with your browser, it is unlikely that someone will do something on your system. (That is, if you have a firewall, as Windows users affected by SQL Slammer found out.)

Apple disabled Java mainly for political reasons. First, maintaining a separate JDK fork was expensive, and second, Apple wanted to foster a bit of vendor lock-in with its App Store and, hopefully, everyone coding Objective-C forever. A platform-independent programming system is not good if you are selling a closed platform.

On to page 2 of the article.

Websites using Java? What is this – 1999? None of the banks I know in Germany, Ireland and the UK use Java. None of the airlines I use (Lufthansa, British Airways, Aer Lingus, KLM, etc.) use Java. Heck, even Ryanair does not use Java, even though their website looks like it is from 1999.

Claiming that Java (as applet) is used for thousands of mission critical websites is probably true. The same goes for IE6 along with old ActiveX controls. Internal web-sites of companies are slow to change. But the same companies have administrators who should be able to secure their systems. As a start, they can ensure that separate browsers are used for intranet and internet. If they can’t do that, how about firing them?

(Fact: They don’t. I get two waves of spam delivered and filtered on my servers. One between 9 and 10 my time, and one between 9 and 10 US east coast time. No waves on weekends and public holidays.)

Then he goes on claiming that Java is critical and hard to disable, because of French voters living abroad. What? If a French person decides to leave God’s own country to live amongst English-speaking people, isn’t that alone reason to take their voting rights away? Or maybe the French government is as clueless as every other government when it comes to technical decisions.

But claiming that a few thousand French people’s reliance on Java for voting every 4 years or so makes Java indispensable is ridiculous.

And he goes on to claim that

“unscheduled outages” would be devastating if OS X and Windows suddenly blocked Java, as the feds essentially asked us to do this week.

Didn’t he state on page one that Apple, literally overnight, disabled Java in the browser? And did the world stop? How about Microsoft not shipping Java since somewhere around 2002? Did that stop the world?

And now my favorite sign that your author is not educated in computer terms:

But here’s what Apple and Microsoft can and should do: Announce that the next major versions of OS X and Windows will not run Java, period.

An operating system is not able to stop users from running software of their choice unless that system is so locked down that only approved software can run, like iOS, or Mac OS X when you crank up the paranoid mode to allow only apps from the App Store.

I guess Apple, world dominator with ambitions, would love that model. But here’s the catch: You can only do that when you lock out everyone. No more custom software for you, naughty author. And once every computer is locked down, whoever holds the key holds the power. Yes, I know, Steve Jobs always wanted that, but I would not accept it.

And filtering it out? Apart from the legal implications of anti-competition laws, so far no OS vendor has been able to filter out viruses and trojans.

I’ll stop here, as afterwards the poor author goes into rambling mode after exhausting any sensible argument.

I’ll end with this quote from the last page:

If Microsoft and Apple don’t make Windows and OS X Java-free platforms like [..] Android…

and go back to my JDK to write an Android application while it is not yet outlawed.

Lessons learned:

Now that all journalists have been fired and have moved into PR, we are left with interns to fill our brains with garbage.
And we need more regulation to get Java off the streets, so that kids can start playing with guns instead of applets.
And the world would be a better place if computers were not allowed to run dangerous stuff, so guvn’r please rescue us.

Moving to Git and easier builds

During the last year, as part of my work with the Rabbit-Stew-Dio, I fell in love with Git. Well, sort of, that marriage is not without conflict, and from time to time I hate it too. But when the time came to move all our Pentaho Reporting projects to Git, we all were happy to jump on that boat.

As a result, you can now access all code for the 4.0/TRUNK version of Pentaho Reporting via our GitHub Project. This project contains all libraries, all runtime engine modules and everything that forms the report-designer and design-time tools.

Grab it via:

git clone

Code organization

Our code is split into three groups of modules.

  • “/libraries” contains all shared libraries and code that provides infrastructure that is not necessarily reporting related.
  • “/engine” contains the runtime code for Pentaho Reporting. Whether you want to embed our reporting engine into your own Swing application or deploy it as part of a J2EE application, this contains all you will ever need.
  • “/designer” contains our design-time tools, like the report-designer and the report-design-wizard. It also contains all data source UIs that are used in both the Report Designer and Pentaho Report Wizard.

If you use IntelliJ Idea for your Java work, then you will be delighted to find that the sources act as a fully configured IntelliJ project. Just open the ‘pentaho-reporting’ directory as a project in IntelliJ and off you go. If you use Eclipse, well, why not give IntelliJ a try?

Branching system

At Pentaho we use Scrum as our development process. We end up working on a set of features for about 3 weeks, called a Sprint. All work for that Sprint goes into a feature branch (sprint_XXX-4.0.0GA) and gets merged with the master at the end of the sprint.

If you want to keep an eye on our work while we are sprinting, check out the sprint branches. If you prefer something more stable and are happy with updates every three weeks, stick to the master branch.

During a Sprint, our CI system will build and publish artifacts from the sprint branches. If you don’t want to rely on those, it is now easy to get your own build up and running in under 5 minutes (typing time, not waiting time).

Building the project

The project root contains a global multibuild.xml file that can build all modules in one go. If you want it more fine-grained, each top-level group (‘libraries’, ‘engine’, ‘designer’) contains its own ‘build.xml’ file to provide the same service for these modules.

To successfully build Pentaho Reporting, you do need Apache Ant 1.8.2 or newer. Go download it from the Apache Ant Website if you haven’t done it yet.

After you cloned our Git repository, you have all the source files on your computer. But before you can use the project, you will have to download the third party libraries used in the code.

On a command line in the project directory, call

ant -f multibuild.xml resolve

to download all libraries.

If you’re going to use IntelliJ for your work, you are all set now and can open our IntelliJ project.

To build all projects locally, invoke

ant -f multibuild.xml continuous-local-testless

to run the build without executing the tests.

If you feel paranoid and want to run the tests while building, then use the ‘continuous-local’ target. This can take quite some time, as it also runs all tests. Expect to wait an hour while all tests run.

ant -f multibuild.xml continuous-local

After the process is finished, you will find “Report Designer” zip and tar.gz packages in the folder “/designer/report-designer/assembly/dist”.

If you get OutOfMemoryErrors pointing to a JUnitTask, or if you get OutOfMemory “PermGen Space” errors, increase the memory of your Ant process to 1024m by setting the ANT_OPTS environment variable:

export ANT_OPTS="-Xmx1024m -XX:MaxPermSize=256m"

Building the project on a CI server

Last but not least: Do you want to run Pentaho Reporting in your own continuous integration server and you want to publish all created artifacts to your own maven-server? Then make sure you set up Maven to allow you to publish files to a repository.

  1. Install Artifactory or any other maven repository server.
  2. Copy one of the ‘ivy-settings.xml’ configurations from any of the modules and edit it to point to your own Maven server. Put this file into a location outside of the project, for instance into “$HOME/prd-ivy-settings.xml”
  3. Download and install maven as usual, then configure it to talk to the Artifactory server.

Edit your $HOME/.m2/settings.xml file and locate the ‘servers’ tag. Then configure it with the username and password of a user that can publish to your Artifactory server.
Replace ‘your-server-id’ with a name describing your server. You will need that later.
Replace ‘publish-username’ and ‘publish-password’ with the username and password of an account of your Artifactory installation that has permission to deploy artifacts.

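<settings>
  <servers>
    <server>
      <id>your-server-id</id>
      <username>publish-username</username>
      <password>publish-password</password>
    </server>
  </servers>
</settings>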

Now set up your CI job. You can either override the ivy properties on each CI job, or you can create a global default by creating a ‘$HOME/’ file. The settings of this file will be included in all Ant builds for Pentaho Reporting projects.


After that, test your setup by invoking

ant -f multibuild.xml continuous

It should run without errors now. If you see errors on publish, check your Maven configuration or your Artifactory installation.


With the new build structure and the move to Git, it has become tremendously easy to download and work with the Pentaho Reporting source code. Even formerly daunting tasks like setting up a CI server have become simple enough to be documented in a single blog post.


PRD-4090: Data-Factories calling other Data-Factories

DataFactories now have a changed “initialize(..)” method, which
takes a single “DataFactoryContext” object.

The context contains the known init-properties (configuration,
resource manager, resource bundle factory) – nothing new here.
In addition to that, there is now also a property called
‘contextDataFactory’, which is the data-factory of your current report.

Runtime use:

Calling initialize is now mandatory for all datasources. If you
get strange exceptions from the class
“AbstractDataFactory”, then you forgot to call initialize(..).

The data-factory you get from the context is already initialized,
so you can happily start using it. Please do not make any assumptions
about the contents of the data-factory. Calling queries is fine, but
do not try to dismantle it to get at the underlying implementation.
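To illustrate the runtime contract, here is a minimal, self-contained Java sketch. `Factory` and `Context` are simplified, hypothetical stand-ins for the real `DataFactory` and `DataFactoryContext` (which also carry the configuration, resource manager and resource bundle factory), and the classes `AbstractFactory`, `StaticFactory` and `UpperCaseFactory` are invented for this example. It shows two points from the text: querying before `initialize(..)` fails, and a data-factory can answer a query by calling the context data-factory without looking at its internals.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the PRD-4090 contract; the real DataFactory and
// DataFactoryContext interfaces in Pentaho Reporting have more members.
class DataFactorySketch {

    /** Stand-in for DataFactoryContext: only the context data-factory is modeled. */
    record Context(Factory contextDataFactory) {}

    interface Factory {
        void initialize(Context context);
        List<String> query(String queryName);
    }

    /** Base class that refuses to run queries until initialize(..) has been called. */
    abstract static class AbstractFactory implements Factory {
        private Context context;

        @Override
        public void initialize(Context context) {
            this.context = context;
        }

        @Override
        public List<String> query(String queryName) {
            if (context == null) {
                throw new IllegalStateException("initialize(..) was not called");
            }
            return runQuery(context, queryName);
        }

        protected abstract List<String> runQuery(Context context, String queryName);
    }

    /** A factory backed by a fixed map from query names to rows. */
    static final class StaticFactory extends AbstractFactory {
        private final Map<String, List<String>> rows;

        StaticFactory(Map<String, List<String>> rows) {
            this.rows = rows;
        }

        @Override
        protected List<String> runQuery(Context context, String queryName) {
            List<String> result = rows.get(queryName);
            if (result == null) {
                throw new IllegalArgumentException("Unknown query: " + queryName);
            }
            return result;
        }
    }

    /** Answers its queries by calling the context data-factory, then post-processing. */
    static final class UpperCaseFactory extends AbstractFactory {
        @Override
        protected List<String> runQuery(Context context, String queryName) {
            // Only run queries on the context data-factory; never inspect its internals.
            return context.contextDataFactory().query(queryName).stream()
                    .map(String::toUpperCase)
                    .toList();
        }
    }

    public static void main(String[] args) {
        StaticFactory base = new StaticFactory(Map.of("customers", List.of("alice", "bob")));
        base.initialize(new Context(null)); // initialize before handing it out as a context factory

        UpperCaseFactory derived = new UpperCaseFactory();
        derived.initialize(new Context(base));
        System.out.println(derived.query("customers")); // [ALICE, BOB]
    }
}
```

The guard lives in the base class so that every concrete factory gets the "forgot to call initialize" check for free.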


I streamlined the design-time code a bit. When you need to preview
a data-factory, you can now get a properly initialized data-factory via:


The DataSourcePlugin interface now has a new parameter on its
performEdit method. The “DataFactoryChangeRecorder” accepts
“DataFactoryChange” objects from your editor.

As a data-factory editor, you should not modify the report-definition
directly. Return your changed data-factory via the method's return
value. If you edit existing data-factories, record the original,
unmodified data-factory and the changed data-factory in the
DataFactoryChangeRecorder.
The report-designer or report-design-wizard will take that change
information and incorporate it into the report. Letting the caller
have control over that allows us to record Undo information in PRD.

(Note: Undo recording is not implemented yet, as I ran out of time
here. For now, we just add/replace/delete data-factories without
creating undo-objects. We will have that fixed before we go GA,
simply work as if it is there for now.)


Introducing the Pentaho Reporting compatibility mode

In the past, every time I worked on the heart of Pentaho Reporting, the layout system, I wondered how the heck I was going to make sure I did not break our customers' existing reports – again.

Before we started work on the crosstab mode, we put a tiny layer of safety onto the engine by creating a set of ‘golden sample’ reports. A golden sample is the pre-rendered output of a report. Each time we make a change, our automated tests generate the output again and compare it with the known good output we have stored.
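At its core, a golden-sample check is “render, then compare with the stored output”. The Java sketch below assumes the rendered output is available as a string from a caller-supplied `Supplier`; the method names are hypothetical, and the real Pentaho Reporting test suite compares actual export output rather than a single string. Reporting the first differing position makes a failing comparison easier to diagnose.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical sketch of a golden-sample check: render a report and compare the
// result with the known-good output stored on disk.
class GoldenSampleSketch {

    /** Index of the first differing character, or -1 if both outputs are identical. */
    static int firstDifference(String expected, String actual) {
        int shared = Math.min(expected.length(), actual.length());
        for (int i = 0; i < shared; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        return expected.length() == actual.length() ? -1 : shared;
    }

    /** Returns true if the freshly rendered output matches the stored golden sample. */
    static boolean matchesGoldenSample(Supplier<String> renderReport, Path goldenFile)
            throws IOException {
        String expected = Files.readString(goldenFile, StandardCharsets.UTF_8);
        String actual = renderReport.get();
        int diff = firstDifference(expected, actual);
        if (diff >= 0) {
            System.out.println("Output differs from golden sample at position " + diff);
        }
        return diff < 0;
    }

    public static void main(String[] args) throws IOException {
        Path golden = Files.createTempFile("golden-sample", ".txt");
        Files.writeString(golden, "page 1: Header | 42", StandardCharsets.UTF_8);
        System.out.println(matchesGoldenSample(() -> "page 1: Header | 42", golden));
        System.out.println(matchesGoldenSample(() -> "page 1: Header |  42", golden));
        Files.delete(golden);
    }
}
```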

Over the last four long weeks (when I had expected to spend only two), I rewrote large parts of the layout system. Crosstabs are a more dynamic structure than banded list-reports. While banded reports only grow downwards to fill an endless stream of pages, crosstabs grow both horizontally and vertically. The crosstab expands to the right for each new value of the column dimensions it finds, and it expands downwards when the engine prints more row-dimension values.

The newly introduced table-layout system that powers the crosstabbing requires stricter rules for the layout elements to arrange them in a sensible fashion. Ordinarily, we want the resulting layout to be minimal (use as little space as possible, within the constraints set by the designer), stable (produce the same layout every time) and performant (don’t make me wait).

The old layout rules, however, grew historically. They evolved around bugs, misunderstandings and the desperate need to not break reports created ages ago. Breaking reports is fun – if fun includes losing customers or getting angry calls. I value my sleep, so no more breaking reports for me, if I can avoid it.

From now on, Pentaho Reporting contains a brand new compatibility layer. This layer emulates all the old and buggy behavior to get a report output that is as close to the original release as possible. Our main concern with the compatibility is not necessarily to emulate show-stopper bugs, but to avoid those subtle changes where your report elements start slightly shifting around. When that happens, you can end up with either more pages than before, overlapping elements (and thus lost data in Excel and HTML exports) or anything in between.

How does it work?

Since Pentaho Reporting 3.9.0-GA, every PRPT file contains a version marker in its “meta.xml” file. When we parse a report, we also read that version number and store it as the default compatibility setting. The report-designer preserves this setting over subsequent load/save cycles, so editing an old report in PRD-4.0 does not automatically erase or replace that marker.

We consider reports without a marker to be old reports that must have been created with PRD-3.8.3 or an even earlier version. Likewise, the reporting engine treats all of the ancient XML formats and the PRD-3.0 “.report” files as old reports and gives them the version number “3.8.3”.

When a report is executed, the report processor checks the compatibility marker. If it is a pre-4.0 marker, we enable our first compatibility mode. This mode changes how elements are produced and how styles are interpreted.

The most important change is that a defined ‘min-width’ or ‘min-height’ automatically serves as the ‘max-width’ or ‘max-height’ as well. There are additional rules; for instance, we ignore layout settings on structural sections, like groups or group-bodies.

The most important rule, however, is: If you have a legacy report, it cannot contain tables, and thus cannot contain any crosstabs. Tables require a proper interpretation of the layout rules. The old rules tend to contradict each other from time to time, which causes great distress to the table calculations. A distressed table-layout calculation may commit suicide or may throw away your data, so we had better not allow that to happen.
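The marker handling described above might look roughly like the following sketch. The integer encoding of versions (`major * 1_000_000 + minor * 1_000 + patch`), the marker format “3.9.0-GA” and all method names are assumptions for illustration, not the engine's actual code: a missing marker falls back to 3.8.3, anything below 4.0 runs in legacy mode, and legacy mode rules out tables.

```java
// Hypothetical sketch: deriving a compatibility level from the meta.xml version marker.
// The integer encoding of versions is an assumption made for this example.
class CompatibilitySketch {

    static int versionId(int major, int minor, int patch) {
        return major * 1_000_000 + minor * 1_000 + patch;
    }

    /** Level assumed for reports without a marker and for the ancient file formats. */
    static final int LEGACY_DEFAULT = versionId(3, 8, 3);

    /** First level that uses the new, stricter layout rules. */
    static final int STRICT_LAYOUT = versionId(4, 0, 0);

    /** Parses a marker such as "3.9.0-GA"; a missing marker yields the 3.8.3 default. */
    static int compatibilityLevel(String versionMarker) {
        if (versionMarker == null || versionMarker.isBlank()) {
            return LEGACY_DEFAULT;
        }
        String numeric = versionMarker.split("-", 2)[0]; // drop suffixes like "-GA"
        String[] parts = numeric.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        return versionId(major, minor, patch);
    }

    /** Pre-4.0 reports run in the legacy compatibility mode. */
    static boolean isLegacyMode(int level) {
        return level < STRICT_LAYOUT;
    }

    /** Tables, and therefore crosstabs, are only available outside legacy mode. */
    static boolean tablesAllowed(int level) {
        return !isLegacyMode(level);
    }

    public static void main(String[] args) {
        System.out.println(isLegacyMode(compatibilityLevel(null)));       // true
        System.out.println(isLegacyMode(compatibilityLevel("3.9.0-GA"))); // true
        System.out.println(isLegacyMode(compatibilityLevel("4.0.0")));    // false
    }
}
```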

So before you can start to use newer features in the reporting system, you have to

Migrate your reports

Report migration is the process of rewriting a report’s layout definition to match the new layout rules. During that rewrite, we try to keep the layout as close as possible to the original. While we are at it, we remove some invalid properties (like layout styles on groups) and migrate the sizing to the updated width/height system (not using the min-max height hack).
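One way a migration could keep an element's size stable is to write down explicitly what the legacy rules did implicitly. The sketch below is hypothetical (the style keys, element-type names and rules are simplified assumptions, not PRD's migration code): it strips layout styles from groups and group-bodies, and turns the implicit “minimum doubles as maximum” behavior into an explicit maximum, except for dynamic-height elements, which the legacy rules already exempted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical migration sketch; style keys and rules are simplified assumptions.
class MigrationSketch {

    /** Layout styles that are ignored on structural sections such as groups. */
    static final Set<String> LAYOUT_KEYS =
            Set.of("min-width", "min-height", "max-width", "max-height", "width", "height");

    static Map<String, Double> migrate(String elementType, Map<String, Double> legacyStyle,
                                       boolean dynamicHeight) {
        Map<String, Double> migrated = new HashMap<>(legacyStyle);
        if (elementType.equals("group") || elementType.equals("group-body")) {
            migrated.keySet().removeAll(LAYOUT_KEYS); // legacy mode ignored these anyway
            return migrated;
        }
        // Under the legacy rules a minimum size silently acted as the maximum size.
        // Stating that maximum explicitly keeps the size unchanged under the new rules.
        makeImplicitMaxExplicit(migrated, "min-width", "max-width", false);
        makeImplicitMaxExplicit(migrated, "min-height", "max-height", dynamicHeight);
        return migrated;
    }

    private static void makeImplicitMaxExplicit(Map<String, Double> style, String minKey,
                                                String maxKey, boolean dynamic) {
        Double min = style.get(minKey);
        if (min != null && !dynamic && !style.containsKey(maxKey)) {
            style.put(maxKey, min);
        }
    }

    public static void main(String[] args) {
        System.out.println(migrate("text-field", Map.of("min-width", 80.0, "min-height", 20.0), false));
        System.out.println(migrate("group", Map.of("min-width", 80.0), false));
    }
}
```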

You can initiate the migration by entering the migration dialog via “Extra->Migration”. The dialog will list what will happen to your report, and will prompt you to save your report before the migration starts. The migration cannot be undone with the “Edit->Undo” function, so this saved report is your security blanket for the migration.

If you are sure that your report will be fine without the rewrite, you can manually force a report to a different compatibility level via the “compatibility-level” attribute on the master-report object. Be aware that this voids your warranty – your report may run just fine, or it may blow up completely. All bets are off.

Once the migration is done, your report should work as before, but within the corset of the new, and stricter, layout rules.

And to be sure: Let me repeat it – you only need to migrate reports when you want to use new features on them. Your old and already published reports will continue to work just fine without any manual intervention.

Bonus Content: Min/Max and Preferred width and height

Until recently, the layout system was not able to handle the layout constraints for minimum, maximum and preferred sizes correctly. The safe and default option was to rely on the minimum sizes only. The system magically treated all minimum sizes as maximum sizes for most cases, unless the element had a dynamic-height flag set or had an explicit maximum width or height.

With PRD-4.0, the layout system uses better rules with fewer contradictions. Therefore it is now safe to rely on the preferred size in most cases.

In reports running in PRD-4.0 compatibility mode, a minimum size defines an absolute minimum, and the element will not shrink below that size. The maximum size, if defined, is the absolute maximum; the element will never grow larger than that. The preferred size defines a flexible recommended size. In most cases this is the size your box will use.

But if your element has content that requires more space, it will get it (up to the limit imposed by the maximum size). Each element computes a ‘minimum chunk size’ – think of it as the longest word in a text – and uses the larger of the chunk size and the defined preferred width as its effective size.
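The sizing rules for a single axis fit into a small function. This is a simplified sketch under stated assumptions: sizes are plain `double` values in points, a `null` maximum means “no maximum defined”, and the method names are hypothetical. `effectiveSize` follows the PRD-4.0 rules described above; `legacyMax` shows, for contrast, how the old rules turned a minimum into a maximum unless the element had a dynamic-height flag or an explicit maximum.

```java
// Hypothetical sketch of the sizing rules for one axis; not the engine's actual code.
class SizingSketch {

    /**
     * PRD-4.0 rules: use the larger of the preferred size and the minimum chunk size
     * (think: the longest word), never go below min, never go above max (if defined).
     */
    static double effectiveSize(double min, double preferred, Double max, double minimumChunkSize) {
        double size = Math.max(preferred, minimumChunkSize);
        size = Math.max(size, min);
        if (max != null) {
            size = Math.min(size, max);
        }
        return size;
    }

    /**
     * Legacy rules (simplified): unless the element is dynamic or has an explicit
     * maximum, its minimum size also acts as its maximum size.
     */
    static Double legacyMax(double min, Double explicitMax, boolean dynamicHeight) {
        if (explicitMax != null || dynamicHeight) {
            return explicitMax;
        }
        return min;
    }

    public static void main(String[] args) {
        // Preferred 80pt, but the longest word needs 95pt: the element grows to 95pt.
        System.out.println(effectiveSize(20, 80, null, 95)); // 95.0
        // The same element with a 90pt maximum is capped at 90pt.
        System.out.println(effectiveSize(20, 80, 90.0, 95)); // 90.0
        // Legacy mode: a 20pt minimum without an explicit maximum is also the maximum.
        System.out.println(legacyMax(20, null, false));      // 20.0
    }
}
```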

Try our new compatibility mode. See whether it preserves your reports, and if not, please, please, file me a bug-report!

Pentaho Reporting 3.9.1 released on SourceForge

Doug just uploaded the latest stable release of the Pentaho Report Designer on SourceForge. Thanks Doug!

This is a bug-fix release addressing issues in the drill-linking editor and the formula-editor. It also enables support for smarter Mondrian caching via the new ‘JDBCConnectionUUID’ property.

Be aware that using PRD-3.9.1 requires that you also use BI-Server 4.8. If you have an older BI-Server installation, you have to stick to the report-designer that came with it or upgrade your server to 4.8.

Thank you, Sulaiman, for handling the whole release process! Well done.