Monthly Archives: February 2008

Unified Fileformat Fallout: Report Element Attributes

One of the few classes that has not changed much in ages is the “Element” class. The element defines the smallest building block of a report definition. Elements carry style-information and data. If the element is a band, it carries other elements instead of data.

The data-source system of the elements, which was used to compute and transform raw data into presentable data, was always a weak part of our system. It was organized as a set of filters that could be composed into a transformation chain. While this bought us great flexibility, we paid for it with a nearly unmanageable object structure that was hard to handle at both run-time and design-time.

With the addition of the unified fileformat and a strong carry-over of ideas from the hibernating Flow-Engine, the classic-engine finally leaves this old system behind.

Elements now carry attributes and a thing called ElementType. The ElementType uses the current state and the element’s attributes to compute the presentation value. Under the hood, there is not much magic going on anymore: the new code simply copies the old data-source code and transfers the parameters of the old data-source classes into the element’s attribute set.

What a small change, but what a huge effect.

The ElementType is now a simple, immutable class. No expensive cloning is needed at runtime. All of a sudden, all of the element’s parameters are well-defined and accessible to the design-time tools.
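
To make the idea concrete, here is a minimal sketch of a stateless element type. It is not the engine’s actual API: the class names, the method signature and the attribute names “field” and “null-value” are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ElementTypeSketch {
    /** Hypothetical element: a plain bag of attributes (style is left out). */
    public static final class Element {
        private final Map<String, Object> attributes = new HashMap<>();

        public void setAttribute(String name, Object value) {
            attributes.put(name, value);
        }

        public Object getAttribute(String name) {
            return attributes.get(name);
        }
    }

    /** Hypothetical element type: it holds no state, all input arrives as arguments. */
    public interface ElementType {
        Object getValue(Map<String, Object> dataRow, Element element);
    }

    /** Reads the column named by the "field" attribute, falling back to "null-value". */
    public static final class TextFieldType implements ElementType {
        // One shared instance is enough, because the type has no fields to clone.
        public static final TextFieldType INSTANCE = new TextFieldType();

        private TextFieldType() {
        }

        @Override
        public Object getValue(Map<String, Object> dataRow, Element element) {
            Object field = element.getAttribute("field");
            Object raw = (field == null) ? null : dataRow.get(field);
            return (raw != null) ? String.valueOf(raw) : element.getAttribute("null-value");
        }
    }

    public static void main(String[] args) {
        Element element = new Element();
        element.setAttribute("field", "customer");
        element.setAttribute("null-value", "-");

        Map<String, Object> row = new HashMap<>();
        row.put("customer", "ACME");
        System.out.println(TextFieldType.INSTANCE.getValue(row, element)); // prints "ACME"

        row.clear();
        System.out.println(TextFieldType.INSTANCE.getValue(row, element)); // prints "-"
    }
}
```

Because all parameters live in the element’s attribute map, a design-time tool only has to list that map to see everything that can be configured.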

Now that everything in an element is either part of the style-information or part of the attribute-collection, we can fire up the magic bag of dirty tricks. Style-expressions are used to compute style-information at runtime. Now, with attributes in a similar structure, attribute-expressions do the same for the attributes.
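
A rough sketch of how an attribute-expression could work is shown here. The names are hypothetical and a plain java.util.function.Function stands in for the engine’s expression classes: the expression is evaluated against the current data row and its result overwrites the static attribute value.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class AttributeExpressionSketch {
    /** Hypothetical element with static attributes plus expressions that recompute them. */
    public static final class Element {
        private final Map<String, Object> attributes = new HashMap<>();
        private final Map<String, Function<Map<String, Object>, Object>> attributeExpressions =
            new LinkedHashMap<>();

        public void setAttribute(String name, Object value) {
            attributes.put(name, value);
        }

        public Object getAttribute(String name) {
            return attributes.get(name);
        }

        public void setAttributeExpression(String name, Function<Map<String, Object>, Object> expression) {
            attributeExpressions.put(name, expression);
        }

        /** Evaluates every attribute-expression for the given row and stores the results. */
        public void evaluate(Map<String, Object> dataRow) {
            attributeExpressions.forEach(
                (name, expression) -> attributes.put(name, expression.apply(dataRow)));
        }
    }

    public static void main(String[] args) {
        Element label = new Element();
        label.setAttribute("href", "details.html"); // static default
        label.setAttributeExpression("href", row -> "details.html?id=" + row.get("id"));

        Map<String, Object> row = new HashMap<>();
        row.put("id", 42);
        label.evaluate(row);
        System.out.println(label.getAttribute("href")); // prints "details.html?id=42"
    }
}
```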

But that’s not all: the attribute-collection is not limited to data-processing at all. We can store anything in there, and so we do.

Every element can now carry raw HTML fragments and bits of JavaScript that are inserted into the produced HTML output. Web-Applications can and will make use of this feature to bind the generated reports closer to the generating web-application. Elements carry Excel-Page-Header and Page-Footer strings which can be embedded into the Excel output. The Swing-Preview-Components look out for special attributes on all elements to create Hyperlinks and Clickable areas. Applications embedding the report engine can transport any data through the report into the print-preview to create their own Drill-Through systems.

We finally reached the age of full interactivity.

ODF: More than ‘just’ an Office file format

With our upcoming release of version 0.8.10 of the Pentaho Reporting Classic Engine we finally leave the path of using primitive XML files for holding report-definition information.

From the very beginning, Pentaho Reporting Classic (at that time known as JFreeReport) has offered a way to describe report-layouts using a more or less complex XML-description language. At some point, the simple, human-readable fileformat was no longer powerful enough to keep up with all the shiny new features we added. The verbosity of the XML grew; 100 KB for a mildly complex report was not uncommon. I guess with all the pain and horror that fileformat caused, even the devil would be well-satisfied with the result.

So it’s time to get over that horror and to go back to a somewhat more friendly file-format. (After all, hunting bugs is easier if you can spot errors at once, instead of manually parsing the mess.)

With version 0.8.10, we start to use an ODF-based fileformat for the reporting engine. ODF is a document container format based on ZIP archives and enriched with a lot of meta-data. Contrary to common perception, ODF can be used independently of OpenOffice.

As said, ODF files are ZIP files with well-defined meta-data files inside them. In addition, all document contents, styles and images are stored in the ZIP file as well.
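
Since the container is an ordinary ZIP archive, standard tooling is enough to look inside. The sketch below uses only java.util.zip: it builds a tiny bundle in memory and lists its entries. The entry names follow general ODF package conventions and are an assumption here, not the final layout of the report bundle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class BundleListing {
    /** Builds a small sample bundle in memory; entry names and contents are placeholders. */
    public static byte[] createSampleBundle() {
        String[] names = {"mimetype", "META-INF/manifest.xml", "meta.xml", "content.xml", "styles.xml"};
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("placeholder for " + name).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the entry names of a ZIP archive in the order they are stored. */
    public static List<String> listEntries(byte[] bundle) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bundle))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        listEntries(createSampleBundle()).forEach(System.out::println);
    }
}
```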

Using ODF as our new report-fileformat immediately solves a couple of problems and adds some instant value:

  1. Easier distribution

    With all the report-definition content in one place, distributing reports becomes a lot easier. There is exactly one file that contains all the images, datasources, reports and subreports, stylesheets and whatever else might be needed to execute the report. Moving reports to new locations is equally easy now, and there is no longer any chance of forgetting to copy images to the new location.

  2. More meta-data to manage report definitions

    With the OpenOffice document meta-data specification, we instantly have a sane way to define document-wide meta-data (like the author, last-modified date etc.). Especially in large installations with thousands of report-definitions, meta-data can be a real life-saver. As we stick to the well-known OpenOffice meta-data attributes, there are plenty of tools out there that assist in the task of managing these files.

  3. Modularity

    The new ODF-based fileformat consists of many small files that can be found in well-defined locations (or can be easily enumerated using the manifest of the document bundle). It is easier to write parsers and tools for small, well-defined XML schemas than to write a parser for the hellish fileformats we had before.

    Each aspect of the report-definition is contained in separate files. Styles, datasources, parameters, functions – everything is now easily accessible to small tools which do not need to know about the complexity of the whole report-definition system.

  4. Completeness

    For the first time in the history of the report-engine, we now have a fileformat that covers the whole lifecycle of a report. The report file format contains parameter definitions, which contain all information that is needed to generate standalone parameter prompts or to generate XActions (for those reports running in the Pentaho-Platform).

    The ability to embed data-sources into the report-definition format has now left its childhood days behind. Along with some supporting new data-source implementations, the new file format carries enough information to allow the direct execution of reports from any report-definition without a single line of manual Java code.

  5. Produced by machines for humans and machines alike

    By adding a large-scale, mandatory meta-data layer that provides mappings for each and every element, attribute, stylekey and expression found in the reporting engine, we were able to combine the advantages of the Extended-Fileformat (completeness and the ability to serialize any report-definition object into XML) with the ease of use of the Simple-XML format.

    The meta-data layer itself also serves as a knowledge base for the report-designer. Future versions of the report-designer will be able to query the engine for a list of supported element types, expressions or styles. The days when the report-designer did not expose functionality of the reporting engine are numbered.
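
As an illustration of the kind of small tool the modularity point has in mind, the following sketch enumerates the files of a bundle from its manifest. It assumes the manifest follows the standard ODF manifest schema (file-entry elements in the ODF manifest namespace); the sample paths are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ManifestReader {
    /** Namespace of the ODF manifest schema. */
    public static final String MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0";

    /** Invented sample; a real manifest would be read from META-INF/manifest.xml in the bundle. */
    public static final String SAMPLE_MANIFEST =
        "<manifest:manifest xmlns:manifest=\"" + MANIFEST_NS + "\">"
        + "<manifest:file-entry manifest:full-path=\"/\""
        + " manifest:media-type=\"application/vnd.pentaho.reporting.classic\"/>"
        + "<manifest:file-entry manifest:full-path=\"content.xml\" manifest:media-type=\"text/xml\"/>"
        + "<manifest:file-entry manifest:full-path=\"styles.xml\" manifest:media-type=\"text/xml\"/>"
        + "</manifest:manifest>";

    /** Returns the full-path of every file-entry, in document order. */
    public static List<String> listPaths(String manifestXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document document = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = document.getElementsByTagNameNS(MANIFEST_NS, "file-entry");
            List<String> paths = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                paths.add(entry.getAttributeNS(MANIFEST_NS, "full-path"));
            }
            return paths;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse manifest", e);
        }
    }

    public static void main(String[] args) {
        listPaths(SAMPLE_MANIFEST).forEach(System.out::println);
    }
}
```

A tool like this never has to understand the report-definition itself; it only needs the manifest schema.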

The new fileformat will use the mime-type “application/vnd.pentaho.reporting.classic”. As we stick to the ODF standard, creating rules for the MimeMagic database is easy. As usual, Unix is easy.
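
The reason magic rules are easy to write is an ODF packaging convention: the first ZIP entry is named “mimetype” and is stored uncompressed, so the entry name appears at byte offset 30 and the mime-type string at offset 38 of the file. The sketch below writes such an entry with java.util.zip and then checks those offsets. The class and method names are made up for this example, and the offsets assume a local file header without an extra field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MimeTypeSniffer {
    public static final String MIME_TYPE = "application/vnd.pentaho.reporting.classic";

    /** Creates a bundle whose first entry is an uncompressed "mimetype" file. */
    public static byte[] createBundle(String mimeType) {
        byte[] payload = mimeType.getBytes(StandardCharsets.US_ASCII);
        CRC32 crc = new CRC32();
        crc.update(payload);

        // STORED entries need size and CRC up front, because no compression pass computes them.
        ZipEntry entry = new ZipEntry("mimetype");
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(payload.length);
        entry.setCompressedSize(payload.length);
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(entry);
            zip.write(payload);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Checks for "PK" at offset 0 and "mimetype" followed by the mime-type at offset 30. */
    public static boolean looksLike(byte[] file, String mimeType) {
        byte[] expected = ("mimetype" + mimeType).getBytes(StandardCharsets.US_ASCII);
        int offset = 30; // length of the fixed part of a ZIP local file header
        if (file.length < offset + expected.length || file[0] != 'P' || file[1] != 'K') {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (file[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLike(createBundle(MIME_TYPE), MIME_TYPE));
    }
}
```

A MimeMagic rule is essentially the same comparison, expressed declaratively as a byte string to match at a fixed offset.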

But for Windows users a question remains open: What shall we call the new file-format? And what file-extension shall we give to these report files?