ODF: More than ‘just’ an Office file format

With our upcoming release of version 0.8.10 of the Pentaho Reporting Classic Engine we finally leave the path of using primitive XML files for holding report-definition information.

From the very beginning, Pentaho Reporting Classic (at that time known as JFreeReport) offered a way to describe report layouts using a more or less complex XML description language. At some point, the simple, human-readable file format was no longer powerful enough to keep up with all the shiny new features we added. The verbosity of the XML grew; 100 KB for a mildly complex report was not uncommon. I guess with all the pain and horror that file format caused, even the devil would be well satisfied with the result.

So it's time to get over that horror and to go back to a somewhat friendlier file format. (After all, hunting bugs is easier if you can spot errors at once, instead of manually parsing the mess.)

With version 0.8.10, we start to use an ODF-based file format for the reporting engine. ODF is a document container format based on ZIP archives and enriched with a lot of meta-data. Contrary to the general perception, ODF can be used independently of OpenOffice.

As said, ODF files are ZIP files with well-defined meta-data files inside them. In addition, all document contents, styles and images are stored in the ZIP file as well.
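
To make the container idea concrete, here is a minimal Python sketch that builds such a bundle in memory and lists what is inside. Only the `mimetype` entry and `META-INF/manifest.xml` follow the ODF packaging conventions; the other entry names (`layout.xml`, `styles.xml`, `images/logo.png`) are hypothetical placeholders and not the actual layout of the new report bundle.

```python
import io
import zipfile

# Mime type announced for the new report format.
MIME = "application/vnd.pentaho.reporting.classic"

# Entry names below are illustrative assumptions, not the real bundle layout.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
  <manifest:file-entry manifest:full-path="/" manifest:media-type="application/vnd.pentaho.reporting.classic"/>
  <manifest:file-entry manifest:full-path="layout.xml" manifest:media-type="text/xml"/>
  <manifest:file-entry manifest:full-path="styles.xml" manifest:media-type="text/xml"/>
  <manifest:file-entry manifest:full-path="images/logo.png" manifest:media-type="image/png"/>
</manifest:manifest>
"""


def build_bundle():
    """Return the bytes of a tiny ODF-style ZIP bundle."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # ODF packaging: "mimetype" is the first entry and is stored uncompressed.
        zf.writestr("mimetype", MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST)
        zf.writestr("layout.xml", "<layout/>")
        zf.writestr("styles.xml", "<styles/>")
        # Placeholder bytes (just the PNG signature), not a real image.
        zf.writestr("images/logo.png", b"\x89PNG\r\n\x1a\n")
    return buf.getvalue()


def list_entries(bundle):
    """Return the entry names of a bundle in archive order."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    for name in list_entries(build_bundle()):
        print(name)
```

Because the result is an ordinary ZIP archive, any unzip tool can open it as well; nothing here depends on OpenOffice.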

Using ODF as our new report-fileformat immediately solves a couple of problems and adds some instant value:

  1. Easier distribution: With all the report-definition content in one place, distributing reports becomes a lot easier. There is exactly one file that contains all the images, datasources, reports and subreports, stylesheets and whatever else might be needed to execute the report. Moving reports to new locations is equally easy now, and there is no longer any chance of forgetting to copy images to the new location.
  2. More meta-data to manage report definitions: With the OpenOffice document meta-data specification, we instantly have a sane way to define document-wide meta-data (like the author, last-modified date etc.). Especially in large installations with thousands of report definitions, meta-data can be a real life-saver. As we stick to the well-known OpenOffice meta-data attributes, there are plenty of tools out there that assist in the task of managing these files.
  3. Modularity: The new ODF-based file format consists of many small files that can be found in well-defined locations (or can be easily enumerated using the manifest of the document bundle). It is easier to write parsers and tools for small, well-defined XML schemas than to write a parser for the hellish file formats we had before.

    Each aspect of the report-definition is contained in separate files. Styles, datasources, parameters, functions – everything is now easily accessible to small tools which do not need to know about the complexity of the whole report-definition system.

  4. Completeness: For the first time in the history of the report engine, we now have a file format that covers the whole lifecycle of a report. The report file format contains parameter definitions, which contain all information that is needed to generate standalone parameter prompts or to generate XActions (for those reports running in the Pentaho Platform).

    The ability to embed data-sources into the report-definition format has now outgrown its infancy. Along with some supporting new data-source implementations, the new file format carries enough information to allow the direct execution of reports from any report definition without a single line of hand-written Java code.

  5. Produced by machines for humans and machines alike: By adding a large-scale mandatory meta-data layer that provides mappings for each and every element, attribute, stylekey and expression found in the reporting engine, we were able to combine the advantages of the Extended-Fileformat (completeness and the ability to serialize any report-definition object into XML) with the ease of use of the Simple-XML format.

    The meta-data layer itself also serves as a knowledge base for the report-designer. Future versions of the report-designer will be able to query the engine for a list of supported element types, expressions or styles. The days when the report-designer did not expose functionality of the reporting engine are numbered now.
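
The points about meta-data and modularity can be sketched with a small tool that never touches the report layout at all: it only enumerates the bundle through its manifest and reads the document-wide meta-data. This is a rough Python illustration, assuming the bundle follows the usual ODF conventions (`META-INF/manifest.xml` and a `meta.xml` with Dublin Core fields). The sample bundle, its entry names and the author and date values are made-up placeholders.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as defined by the OpenDocument and Dublin Core specifications.
NS = {
    "manifest": "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE_MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
  <manifest:file-entry manifest:full-path="meta.xml" manifest:media-type="text/xml"/>
  <manifest:file-entry manifest:full-path="styles.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

SAMPLE_META = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:meta>
    <dc:creator>Example Author</dc:creator>
    <dc:date>2000-01-01T00:00:00</dc:date>
  </office:meta>
</office:document-meta>
"""


def sample_bundle():
    """Build a placeholder bundle in memory (entry names are assumptions)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/manifest.xml", SAMPLE_MANIFEST)
        zf.writestr("meta.xml", SAMPLE_META)
        zf.writestr("styles.xml", "<styles/>")
    return buf.getvalue()


def manifest_entries(zf):
    """Map each path listed in the manifest to its declared media type."""
    root = ET.fromstring(zf.read("META-INF/manifest.xml"))
    path_key = "{%s}full-path" % NS["manifest"]
    type_key = "{%s}media-type" % NS["manifest"]
    return {
        entry.get(path_key): entry.get(type_key)
        for entry in root.findall("manifest:file-entry", NS)
    }


def document_meta(zf):
    """Read author and date from meta.xml; missing fields come back as None."""
    root = ET.fromstring(zf.read("meta.xml"))
    return {
        "creator": root.findtext("office:meta/dc:creator", namespaces=NS),
        "date": root.findtext("office:meta/dc:date", namespaces=NS),
    }


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(sample_bundle())) as zf:
        print(manifest_entries(zf))
        print(document_meta(zf))
```

A tool like this needs two tiny schemas and nothing else, which is the kind of small, focused helper the modular layout is meant to enable.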

The new fileformat will use the mime-type “application/vnd.pentaho.reporting.classic”. As we stick to the ODF standard, creating rules for the MimeMagic database is easy. As usual, Unix is easy.
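
Why is that easy? In an ODF-style package the `mimetype` entry comes first and is stored uncompressed, so the mime type appears as plain text near the very start of the file and a magic rule can match it without unpacking anything. The following Python sketch illustrates the idea; the function names and the in-memory sample bundle are made up for this example.

```python
import io
import struct
import zipfile

MIME = "application/vnd.pentaho.reporting.classic"


def make_bundle():
    """Build a placeholder bundle with 'mimetype' as first, uncompressed entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("mimetype", MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", "<report/>")  # hypothetical entry name
    return buf.getvalue()


def sniff_mime(data):
    """Return the mime type declared at the start of the bytes, or None."""
    # A ZIP local file header is 30 bytes long and starts with "PK\x03\x04".
    if len(data) < 30 or data[0:4] != b"PK\x03\x04":
        return None
    (method,) = struct.unpack_from("<H", data, 8)       # 0 means "stored"
    (stored_size,) = struct.unpack_from("<I", data, 18)  # size of entry data
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    name = data[30:30 + name_len]
    if method != 0 or name != b"mimetype":
        return None
    # With no extra field, the mime string starts at offset 30 + 8 = 38.
    start = 30 + name_len + extra_len
    raw = data[start:start + stored_size]
    if len(raw) != stored_size:
        return None  # truncated input
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(sniff_mime(make_bundle()))
```

A file-type database rule does essentially the same comparison, only with fixed byte offsets instead of parsing the header fields.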

But for Windows users a question remains open: How shall we name the new file-format? And what file-extension shall we give to these report files?

This entry was posted in Development by Thomas.

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.