A zoology of layouting systems

Reporting is easy, but layouting is hell. In the field of reporting, there are three major output systems:

1. Painting

The generated output is placed freely on a canvas. Content generation is entirely graphical: the result has to look good, and only a few constraints are placed on the output. The results of such a painting operation may look good on paper, but are seldom useful for further editing.

2. Documents

The generated output is written as an ordered flow of data. A text document or an HTML page is an example of such an output. The generated document does not directly define where and how the content is rendered. Interpreting the document’s content and styling information is left to a client application (OpenOffice Writer, for instance, or a web browser). In a document flow, order matters. Content is printed in the same order as it appears in the document flow, and in most cases the content is positioned relative to the last element printed. (Paragraph comes after paragraph, line after line, and each word is printed after the preceding word.)
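To make the "positioned relative to the last element" idea concrete, here is a minimal sketch of a flow layout. It is purely illustrative: the function name `flow_layout` and the character-cell coordinates are assumptions for this example and do not come from any real document processor.

```python
def flow_layout(words, line_width):
    """Place each word directly after the previous one, wrapping at line_width.

    Hypothetical helper: positions are (word, column, line) in character cells.
    """
    positions = []
    x, y = 0, 0
    for word in words:
        # Start a new line when the word no longer fits on the current one.
        if x > 0 and x + len(word) > line_width:
            x, y = 0, y + 1
        positions.append((word, x, y))
        # The next word starts one space after this one.
        x += len(word) + 1
    return positions


layout = flow_layout(["line", "after", "line", "and", "word"], 10)
```

Note that no word carries its own coordinates. Each position is derived from whatever was placed before it, so changing one word can shift everything that follows.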

3. Tables

Spreadsheet documents organize their content as a huge table. Tables have been (and in the case of some browsers: still are) used for the layouting of HTML documents. A table is a grid of cells organized in rows and columns, and each cell can hold exactly one piece of content. Cells cannot overlap each other, but in almost all systems, cells can be merged into larger rectangular areas.

The Pentaho Reporting Classic Engine is one of the many reporting engines that use absolutely positioned elements as their layouting paradigm. Report elements are placed freely on a canvas (called a Band). There are no limitations on how and where an element can be placed within the canvas. The element’s definition order also defines the painting order: if elements overlap each other, the first element that is painted serves as background for the next elements.
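The painting order can be illustrated with a small sketch. This is not the Classic Engine's code: the `paint` function, the character-grid canvas, and the tuple layout `(symbol, x, y, width, height)` are assumptions made up for this example.

```python
def paint(band_width, band_height, elements):
    """Paint elements onto a character canvas in definition order.

    Hypothetical model of a band: each element is (symbol, x, y, width, height).
    Later elements overwrite earlier ones where they overlap.
    """
    canvas = [[" "] * band_width for _ in range(band_height)]
    for symbol, x, y, width, height in elements:
        # Clip the element to the band so out-of-range parts are ignored.
        for row in range(y, min(y + height, band_height)):
            for col in range(x, min(x + width, band_width)):
                canvas[row][col] = symbol
    return ["".join(row) for row in canvas]


# "a" is defined first, so it ends up as the background behind "b".
rows = paint(6, 3, [("a", 0, 0, 4, 3), ("b", 2, 1, 4, 2)])
```

Swapping the two elements in the list would let "a" cover part of "b" instead, which is all that "definition order defines painting order" means here.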

As long as the desired output type also uses such a free-form canvas to describe the output, everything is fine. Printing and the PDF export are examples of such painting output types.

But the freedom of positioning elements anywhere on the canvas backfires whenever we try to export reports into documents or tables. Exporting documents is horrible. Converting the painted report into flow text does not really work. Although the content may be preserved, all the formatting gets lost, and most document systems are not able to express the complex layouting rules required to render the report as if it were painted. A document processor was not and is not designed for such abuse.

In the Classic Engine, we therefore do not even attempt to export into flow-text documents. Our way to generate documents is simple: generate a table that looks as similar as possible to the painted content. Tables may not be ideal for layouting, but they make it relatively easy to position content within the table’s grid. In the early days of the web, most HTML pages used tables for layouting, and most browsers can display them reasonably well. And as most document formats also support tables, we can solve two problems with a single implementation. The table export therefore gives us the ability to export to Excel workbooks, HTML pages and RTF documents.
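One way to picture the step from a painted band to a table is to turn every element edge into a row or column boundary and then express each element as a merged cell. The sketch below shows that idea only. It is not the Classic Engine's actual implementation, the names `Element` and `build_table` are hypothetical, and it assumes the elements do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    # Hypothetical absolutely positioned report element.
    name: str
    x: int
    y: int
    width: int
    height: int


def build_table(elements):
    """Map absolutely positioned elements onto a table grid.

    Returns the column boundaries, the row boundaries, and one
    (name, row, col, row_span, col_span) tuple per element.
    """
    # Every left/right edge becomes a column boundary,
    # every top/bottom edge becomes a row boundary.
    xs = sorted({e.x for e in elements} | {e.x + e.width for e in elements})
    ys = sorted({e.y for e in elements} | {e.y + e.height for e in elements})
    cells = []
    for e in elements:
        col = xs.index(e.x)
        row = ys.index(e.y)
        # An element covering several grid columns or rows becomes a merged cell.
        col_span = xs.index(e.x + e.width) - col
        row_span = ys.index(e.y + e.height) - row
        cells.append((e.name, row, col, row_span, col_span))
    return xs, ys, cells


elements = [
    Element("title", 0, 0, 300, 20),
    Element("label", 0, 20, 100, 15),
    Element("value", 100, 20, 200, 15),
]
xs, ys, cells = build_table(elements)
```

In this example the title spans both columns, much like a `colspan` of 2 in an HTML table, while the label and the value each occupy a single cell in the second row. Elements whose edges are slightly misaligned would each add their own boundaries, which is one reason such generated tables become dense quickly.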

The table-infested text documents are by no means user friendly or editable, but at least the result looks good.


About Thomas

After working as an all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.