Swift Justice – 30 silver coins well invested ..

With the introduction of the new ODF-based file-format in Version 0.8.10 of the Reporting-Engine, the Simple-XML file-format and the Extended-XML file-format are now deprecated.

The reporting engine will still be able to understand these file-formats, and all existing reports will run as usual. So there is no need to panic.

But the old file-formats will no longer receive any updates. The features they had in Version 0.8.9 will be the features they have in Version 1.0, 2.0, 3.0 and so on. Specifically, these file-formats will not make use of the new report-elements, nor will they be able to use the new attribute-expressions, control the group’s page-breaking capabilities or utilize the newly introduced details-header and -footer bands.

But why? What’s wrong with these file-formats that have served the engine so well over the last years?

There are many reasons, both technical and psychological.

Technical reason #1: With 0.8.10, the internal structures of the reporting engine changed to allow us to implement report- and group-wide style inheritance. Groups are no longer organized in a flat list with no connection to each other. Now all bands are part of one large tree structure, with sub-groups being child-elements of their respective parent-group.

Groups themselves became report-elements. This change opens up the path to finally implement cross-tabbing without pillaging through the code to hack that feature in. It allows us to implement cross-tabbing in version 0.8.11 in a very natural and integrated way.
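To make the tree idea concrete, here is a minimal sketch of groups as child-elements with style inheritance. The `Element` class, the `resolveStyle` method and the style key `font` are all made up for illustration; they are not the engine's real classes, and the real inheritance rules are likely richer than a plain walk up the parent chain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupTreeSketch {
    /** Hypothetical report element; NOT the engine's real class. */
    static class Element {
        final String name;
        Element parent;
        final List<Element> children = new ArrayList<>();
        final Map<String, String> style = new HashMap<>();

        Element(String name) {
            this.name = name;
        }

        /** Attaches a child to this element and returns the child. */
        Element add(Element child) {
            child.parent = this;
            children.add(child);
            return child;
        }

        /** Looks up a style key locally, then walks up the parent chain. */
        String resolveStyle(String key) {
            for (Element e = this; e != null; e = e.parent) {
                String value = e.style.get(key);
                if (value != null) {
                    return value;
                }
            }
            return null; // not defined anywhere in the tree
        }
    }

    public static void main(String[] args) {
        Element report = new Element("report");
        report.style.put("font", "Serif");

        // Sub-groups are children of their parent group, not list entries.
        Element region = report.add(new Element("group:region"));
        region.style.put("font", "SansSerif");
        Element city = region.add(new Element("group:city"));
        Element details = city.add(new Element("details"));

        // The details band defines no font, so it inherits from "region".
        System.out.println(details.resolveStyle("font")); // prints SansSerif
    }
}
```

With a flat list of groups, the lookup in `resolveStyle` would have nothing to walk; the parent link is what makes group-wide inheritance a simple loop.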

Technical reason #2: The existing file-formats are limited to a single XML file. This makes it very hard to add new functionality to the file-format while retaining a clean and maintainable structure. At some point the parsers, the writers and the tools built on them become so complex that the costs to maintain them exceed the costs of writing the whole thing from scratch.

Adding new features to the Simple-XML file-format always endangers the only reason the file-format exists in the first place. The Simple-XML format is meant to be easily read- and writeable by humans. Once we add enough new features, the ease-of-use aspect becomes a bad joke.

The Extended-XML file-format started flawed. The file-format is a very low-level serialized representation of a report-object. The format is barely readable by humans, and it is so complex that you need the blessings of all your gods if you want to add or alter features without breaking existing files. I can honestly say, I am afraid to make serious changes to this system, as I could not tell what would happen then.

The Extended-XML format’s structure was bound to the old internal structures of the reporting-engine. With these structures changed beyond recognition, the parser and writer themselves would need severe changes to operate in a non-legacy mode.

Marking both XML-parsers as deprecated makes sure that we can safely hack them to accept the old files no matter what the internal report-structures look like today. The ext-writer will not be able to fully serialize “new” reports – the resulting XML file will not contain the new features and, given the new group-structures, might not even be accepted by the Extended-XML parser.

And then there is the psychological reason: The old “hack-your-own-xml-file” game has to stop. But words are whispers in a storm, you have to grab the developer where he’s vulnerable: by his laziness.

In the reporting projects, we now maintain four or five (depending on how you count) xml-file-formats that all do the same thing; two of them are crude derivatives of the Simple-XML format.

Using the stick: With the unified-file-format it becomes harder to proceed in the “do-it-yourself” way. The zip-container is more complex than the plain XML-file. But there is a simple API to access it, and the engine can ensure the integrity of the bundle. Sure, you can still hack your way in without using the provided API, but as the costs of that path are now higher, chances are greater that you switch to our API instead.

Showing the carrot: At the same time it becomes easier to just add your own (separate) XML-file into the report-definition-container or to add your data as an attribute to the element. All of the hack-file-formats I mentioned before tried to attach some additional information to a report-element to use this data as input later. With the introduction of attributes and the new report-preprocessor-API, these problems can be solved without even touching an XML-parser of your own.
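As a rough illustration of the container idea only: a zip bundle can carry a tool's own XML file next to the report definition without either file knowing about the other. This sketch uses plain `java.util.zip` and does not show the engine's bundle API; the entry names `content.xml` and `my-tool/settings.xml` and the XML snippets are assumptions made up for the example, and real code should go through the provided API so that the engine can keep the bundle consistent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class BundleSketch {
    /** Builds an in-memory zip with a report file and a tool-private file. */
    static byte[] writeBundle() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // Hypothetical entry names, chosen for this example only.
            put(zip, "content.xml", "<report/>");
            put(zip, "my-tool/settings.xml", "<settings wizard=\"true\"/>");
        }
        return bytes.toByteArray();
    }

    /** Writes one UTF-8 encoded XML entry into the zip stream. */
    static void put(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Lists the entry names of a bundle in stored order. */
    static List<String> listEntries(byte[] bundle) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bundle))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // prints [content.xml, my-tool/settings.xml]
        System.out.println(listEntries(writeBundle()));
    }
}
```

The tool-private file lives in its own entry, so the report definition itself stays untouched and no custom parser for a modified report format is needed.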

However, the existing solutions are “but it works” solutions, so without the hard cut, no one would be tempted to touch them. But now, time works against the laziness: How long will it take for users to demand the new features in the tools that utilize the hacked file-formats? Not long, I’m sure, and then they either fork the parsers and try to adapt them, or come home as the naughty children they have been, to adopt the unified-file-format.

This entry was posted in Development.

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.