Monthly Archives: March 2007

Things that should never have existed in the first place

Pentaho Reporting never had real charting support. Although JFreeChart, the reference project for Java-based charting, was developed next door, we never got to the point where users could simply add charts to a report.

The engine itself is able to render Drawable objects. Drawables are objects that know how to render themselves to a Graphics2D context. This interface was defined by JFreeChart as a simple abstraction layer for its components. So the only way to get charts into the report was to print them as drawables. If you wanted charts based on the data from the report, you had to create a function that collects the data and finally creates the JFreeChart instance, which could then be printed in the report.
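To make the Drawable idea concrete, here is a minimal sketch. The `Drawable` interface below is a hypothetical stand-in declared locally so the example compiles on its own; the real interface ships with the JFreeChart libraries and may differ in detail. `FilledBox` and `DrawableDemo` are made-up names for illustration only.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Hypothetical stand-in for the Drawable contract described above:
// an object that knows how to render itself into a given area.
interface Drawable {
    void draw(Graphics2D g2, Rectangle2D area);
}

// A trivial drawable that fills the area it is given with one color.
// A chart would do the same thing, just with a lot more painting code.
final class FilledBox implements Drawable {
    private final Color color;

    FilledBox(Color color) {
        this.color = color;
    }

    @Override
    public void draw(Graphics2D g2, Rectangle2D area) {
        g2.setColor(color);
        g2.fill(area);
    }
}

final class DrawableDemo {
    // Renders a drawable into an off-screen image, roughly the way an
    // engine could render it into the bounds of a report element.
    static BufferedImage render(Drawable drawable, int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2 = image.createGraphics();
        try {
            drawable.draw(g2, new Rectangle2D.Double(0, 0, width, height));
        } finally {
            g2.dispose();
        }
        return image;
    }
}
```

The engine never needs to know what it is painting; it only hands over a graphics context and a rectangle, which is exactly why a chart could be smuggled in this way.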

This story is sick, I know. But we haven’t even reached the end yet.

When Pentaho elected JFreeReport to be the primary reporting engine for their platform, they also asked about charting. Naive as I was, I mentioned that sick way of getting charts into the report. Even worse, I created a working sample report for them so that they could see it in action.

Oh, I should have known better. Every sin gets punished sooner or later, and my punishment is called ‘ChartExpression’. The prototype actually evolved into a couple of function implementations. The Pentaho folks still think that this was a natural evolution born out of the need for at least some charting capabilities. But I know better: The gods of good software architecture were really pissed off at me, just for creating that abomination. If there were some deserts nearby, you would now see me heading off for a 40-year trip to nowhere, just to end my life directly in front of the promised land.

So all you developers out there: If someone asks you for a quick solution to a problem, and all you can deliver is an architectural nightmare, turn them down. Tell ’em: ‘No way! This cannot be done right now.’

If you feel lucky, you might even ask them for a project budget to develop something sane.

But never ever say the unholy sentence: “Sure, we can build something quick and dirty. We’ll create a better solution as soon as we find more time for it.”

That time will never come …

Converting Paintings into Tables

In Pentaho Reporting Classic, all report elements are positioned somewhere on a canvas. Whenever a band is printed, the layouting system jumps in and computes the final layout for the band and all its child elements. After the layouting is complete, each element has valid ‘bounds’, which describe where the painter would have to place the element on the canvas.

The table-generator’s work starts after all elements have a valid layout. For each visible element in the band, the generator takes the bounds and creates a column or row break for each edge position. All bands of the report are added to a single table. Therefore the table’s columns are based on the column breaks generated by all bands.

Pentaho Reporting Classic has two table-export modes. In the ‘simple’ export mode, the table-generator ignores the right and bottom edges of the child elements (but not those of the root-level band). If a ‘complex’ layout is requested, all boundary information is taken into account.

Theory is dry, so let’s take a look at some numbers:

Let’s assume we have a root-level band with a width of 500 points and a height of 200 points. The band has two children, a label and a text-field. I’ll specify the bounds as absolute positions: (X1,Y1) denotes the upper-left corner, and (X2,Y2) denotes the lower-right corner.

The bounds of all elements involved are:

  • Band: X1=0, Y1=0, X2=500, Y2=200
  • Label: X1=100, Y1=0, X2=300, Y2=100
  • Textfield: X1=100, Y1=100, X2=250, Y2=200

Let’s follow the table-generator’s steps. We assume that the complex layouting is used.

  1. The Band gets added: As there are no column breaks yet, a column break is added for X1=0 and for X2=500. A row break is added at Y1=0 and at Y2=200. The first break always marks the start of the table, and the last break marks the end (and total width) of the table. The table now consists of a single cell that has a width of 500 points and a height of 200 points.
     
  2. The Label gets added: As there is no column break for X1=100, a new column break is inserted. The table’s only cell splits into two columns.
      X:        | 0-100   | 100-500 |
      Y 0-200   | (empty) | Label   |

    A column break for X2 gets inserted at position 300. The table now contains 3 columns.

      X:        | 0-100   | 100-300 | 300-500 |
      Y 0-200   | (empty) | Label   | (empty) |

    The Label’s Y1 does not cause a row-break, as the band already caused one at this position. A row break for Y2 gets inserted at position 100. The table now consists of two rows.

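      X:        | 0-100   | 100-300 | 300-500 |
      Y 0-100   | (empty) | Label   | (empty) |
      Y 100-200 | (empty) | (empty) | (empty) |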
     
  3. The text field is added to the table. X1 does not cause a column break, as there is already one at this position. X2 causes a new column break at position 250. Note that the label already occupies the cell from X=100 to X=300. This cell will now span two columns. There is already a row break for the text-field’s Y1 position (at Y=100, caused by the label’s bottom edge) and for the Y2 position (at Y=200, caused by the band’s bottom edge).
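      X:        | 0-100   | 100-250   | 250-300 | 300-500 |
      Y 0-100   | (empty) | Label (spans 2 columns) | (empty) |
      Y 100-200 | (empty) | TextField | (empty) | (empty) |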
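The steps above can be sketched in a few lines of Java. This is only an illustration of the complex mode under simplifying assumptions (integer coordinates, no rounding, no overlapping elements); `TableBreaks` and `Box` are hypothetical names and not the engine’s actual classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class TableBreaks {
    // Absolute bounds in points: (x1, y1) is the upper-left corner,
    // (x2, y2) the lower-right corner.
    static final class Box {
        final int x1, y1, x2, y2;

        Box(int x1, int y1, int x2, int y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    // Complex mode: every left and right edge yields a column break.
    // The TreeSet keeps the breaks sorted and ignores duplicates.
    static TreeSet<Integer> columnBreaks(List<Box> elements) {
        TreeSet<Integer> breaks = new TreeSet<>();
        for (Box b : elements) {
            breaks.add(b.x1);
            breaks.add(b.x2);
        }
        return breaks;
    }

    // Complex mode: every top and bottom edge yields a row break.
    static TreeSet<Integer> rowBreaks(List<Box> elements) {
        TreeSet<Integer> breaks = new TreeSet<>();
        for (Box b : elements) {
            breaks.add(b.y1);
            breaks.add(b.y2);
        }
        return breaks;
    }

    // Number of table columns an element covers: the breaks that lie
    // between its left and right edge (inclusive), minus one.
    static int columnSpan(TreeSet<Integer> columnBreaks, Box b) {
        return columnBreaks.subSet(b.x1, true, b.x2, true).size() - 1;
    }

    public static void main(String[] args) {
        Box band = new Box(0, 0, 500, 200);
        Box label = new Box(100, 0, 300, 100);
        Box textField = new Box(100, 100, 250, 200);
        List<Box> all = Arrays.asList(band, label, textField);

        TreeSet<Integer> columns = columnBreaks(all);
        System.out.println("column breaks: " + columns);
        System.out.println("row breaks:    " + rowBreaks(all));
        System.out.println("label spans " + columnSpan(columns, label) + " columns");
    }
}
```

For the example bounds this yields column breaks at 0, 100, 250, 300 and 500 and row breaks at 0, 100 and 200, with the label spanning two columns, which matches the walk-through.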

If the table-generator uses the simple algorithm, the resulting table gets simplified in a second step. The column breaks at positions 250 and 300 were caused by the right edge of a report element. These breaks now get removed, so that the resulting table looks like this:

  X:        | 0-100   | 100-500   |
  Y 0-100   | (empty) | Label     |
  Y 100-200 | (empty) | TextField |
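A minimal sketch of the simple mode, with the same caveats as before: `SimpleTableBreaks` and `Box` are hypothetical names. Instead of removing breaks in a second step, this sketch never records the right and bottom edges of child elements in the first place. For this example the result is the same; the real engine may well differ in the details.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class SimpleTableBreaks {
    // Absolute bounds in points, upper-left (x1, y1) to lower-right (x2, y2).
    static final class Box {
        final int x1, y1, x2, y2;

        Box(int x1, int y1, int x2, int y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    // Simple mode: the root band contributes both vertical edges,
    // child elements contribute only their left edge.
    static TreeSet<Integer> columnBreaks(Box rootBand, List<Box> children) {
        TreeSet<Integer> breaks = new TreeSet<>();
        breaks.add(rootBand.x1);
        breaks.add(rootBand.x2);
        for (Box child : children) {
            breaks.add(child.x1);
        }
        return breaks;
    }

    // Simple mode: the root band contributes both horizontal edges,
    // child elements contribute only their top edge.
    static TreeSet<Integer> rowBreaks(Box rootBand, List<Box> children) {
        TreeSet<Integer> breaks = new TreeSet<>();
        breaks.add(rootBand.y1);
        breaks.add(rootBand.y2);
        for (Box child : children) {
            breaks.add(child.y1);
        }
        return breaks;
    }

    public static void main(String[] args) {
        Box band = new Box(0, 0, 500, 200);
        List<Box> children = Arrays.asList(
                new Box(100, 0, 300, 100),     // label
                new Box(100, 100, 250, 200));  // text field
        System.out.println("column breaks: " + columnBreaks(band, children));
        System.out.println("row breaks:    " + rowBreaks(band, children));
    }
}
```

Note that the row break at 100 survives: the label’s bottom edge is ignored, but the text-field’s top edge sits at the same position.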

Now it should be clear that the table-generator works best if all elements are properly aligned. All elements that should go into one row or column have to start at the same X and Y positions. If the complex (strict) layouting mode is used, they also must end at the same position. Elements that should go into neighbouring cells must share a common edge. And finally: Elements that do not start at position zero will cause an empty column or row.

In the next post, I’ll cover how Pentaho Reporting Classic computes cell backgrounds and borders.

A zoology of layouting systems

Reporting is easy, but layouting is hell. In the field of reporting, there are three major output systems:

1. Painting

The generated output is placed freely on a canvas. The whole content generation is graphical, the result has to look good, and only a few constraints are placed on the output. The results of such a painting operation may look good on paper, but are seldom useful for further editing.

2. Documents

The generated output is written as an ordered flow of data. A text document or an HTML page is an example of such an output. The generated document does not directly define where and how the content is rendered. Interpreting the document’s content and styling information is left to a client application (OpenOffice Writer, for instance, or a web browser). In a document flow, order matters: content is printed in the same order as it appears in the document flow, and in most cases it is positioned relative to the last element printed. (Paragraph comes after paragraph, line after line, and each word is printed after the preceding word.)

3. Tables

Spreadsheet documents organize their content as a huge table. Tables have been (and in the case of some browsers: still are) used for the layouting of HTML documents. A table is a grid of cells organized in rows and columns, and each cell can hold exactly one piece of content. Cells cannot overlap each other, but in almost all systems, cells can be merged into larger rectangular areas.

The Pentaho Reporting Classic Engine is one of the many reporting engines that use absolutely positioned elements as their layouting paradigm. Report elements are placed freely on a canvas (called a Band). There are no limitations on how and where an element can be placed within the canvas. The element’s definition order also defines the painting order: if elements overlap each other, the first element that is painted serves as background for the next elements.

As long as the desired output type also uses such a free-form canvas to describe the output, everything is fine. Printing and the PDF export are examples of such painting output types.

But the freedom to position elements anywhere on the canvas backfires whenever we try to export reports into documents or tables. Exporting documents is horrible. Converting the painted report into a flow-text does not really work. Although the content may be preserved, all the formatting gets lost, and most document systems are not able to express the complex layouting rules required to render the report as if it were painted. A document processor was not and is not designed for such abuse.

In the Classic Engine, we therefore do not even attempt to export into flow-text documents. Our way to generate documents is simple: Generate a table that looks as similar as possible to the painted content. Tables may not be ideal for layouting, but they make it relatively easy to position content within the table’s grid. In the early days of the web, most HTML pages used tables for layouting, and most browsers can display them reasonably well. And as most document formats also support tables, we can solve two problems with a single implementation. Therefore the table-export gives us the ability to export to Excel workbooks, HTML pages and RTF documents.

The table-infested text documents are by no means user friendly or editable, but at least the result looks good.

Pagination and Content Generation Strategies

In the reporting world (and maybe in other heavy-duty content generators as well) one of the most common (and most difficult) problems is to lay out the generated content into pages. It’s not just a matter of describing where to put an element; the real problem starts when you have to insert page breaks, page headers and page footers.

The easy approach would be to simply ignore the pagination while the content is generated. This allows the content generator to be as simple as possible, as that implementation does not have to know anything about layouts, pages, headers or footers. In a second step, a layouter jumps in, processes the generated content stream and cuts it into pieces that fit on a page.
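As a rough sketch of that second step, assume the content generator has already produced a flat list of items and all the layouter needs is their heights. `TwoPassPaginator` is a hypothetical name, and real layouters also have to deal with splittable items, widows, orphans and the like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TwoPassPaginator {
    // Second pass: cuts an already generated stream of item heights into
    // pages. A new page is started whenever the next item no longer fits.
    static List<List<Integer>> paginate(List<Integer> itemHeights, int pageHeight) {
        List<List<Integer>> pages = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int used = 0;
        for (int height : itemHeights) {
            if (height > pageHeight) {
                // This sketch cannot split a single item across pages.
                throw new IllegalArgumentException("item taller than a page: " + height);
            }
            if (used + height > pageHeight) {
                pages.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(height);
            used += height;
        }
        if (!current.isEmpty()) {
            pages.add(current);
        }
        return pages;
    }

    public static void main(String[] args) {
        // The content generator ran first and knew nothing about pages.
        List<Integer> generated = Arrays.asList(40, 40, 40, 30, 60);
        System.out.println(paginate(generated, 100));
    }
}
```

The generator and the paginator never talk to each other here, which is what keeps both of them simple.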

This strategy is usually found in document-oriented reporting engines, like BIRT, Windward Reports, the Pentaho Reporting Flow Engine or the truly horrible reporting system of OpenOffice 2.0. In their approach to report generation, they are very close to the Mail Merge functionality of today’s word processors. After the content has been generated, a second engine jumps in to perform the layouting and pagination. In the case of OpenOffice, the layouting is outsourced to the word processor itself, which makes the reporting engine extra lightweight.

BIRT, on the other hand, simply defines that a page-header or footer cannot reference the normal-flow content in any way. Although this makes it very easy to implement the layouting afterwards, it severely limits the usefulness of these headers and footers.

The classic Office documents have a separation between page-sections and the so-called normal-flow content. The content that is distributed over the pages is contained in the normal-flow. The page header and footer are defined outside the normal-flow in a page-layout (sometimes called a master-page), and during the layouting, the master-page and the normal-flow are merged together. Page headers and footers behave like templates: they are defined once and applied multiple times. Some systems allow the header and footer to reference properties from the normal-flow (like the current section title).

The more complicated approach performs the pagination while the content is generated. The separation-of-concerns principle clearly says that such behavior is stupid, if not evil, as it leads to overly complicated, tightly coupled systems. Maintenance of such an architecture is a nightmare.

But there are some advantages which justify this. Coupling the content generation with the layouting and pagination allows the content generator to use the feedback from the pagination process. A reporting function can now perform page-local computations based on the pagination events.

In the reporting field, there are a few cases where the pagination has to be coupled with the content generation. One such requirement is to perform page-based calculations (for instance, the count and/or sum of all items printed on this page). But agreed, these special requirements are very seldom encountered.
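A page-based sum is a good illustration of why the two concerns get entangled. The sketch below is a toy model with made-up names (`PageSumReport`, fixed row and footer heights); it only shows that the generator has to know when a page ends in order to print the right number into the footer.

```java
import java.util.ArrayList;
import java.util.List;

final class PageSumReport {
    // Generates rows and paginates at the same time, so that each page
    // footer can print the sum of the amounts shown on that page.
    static List<String> run(int[] amounts, int rowHeight, int footerHeight, int pageHeight) {
        if (rowHeight + footerHeight > pageHeight) {
            throw new IllegalArgumentException("a row plus the footer must fit on one page");
        }
        List<String> output = new ArrayList<>();
        int used = 0;     // height consumed on the current page
        int pageSum = 0;  // page-local running total
        int page = 1;
        for (int amount : amounts) {
            // The footer needs space too, so it takes part in the break decision.
            if (used + rowHeight + footerHeight > pageHeight) {
                output.add("footer page " + page + " sum=" + pageSum);
                page++;
                used = 0;
                pageSum = 0;
            }
            output.add("row " + amount);
            used += rowHeight;
            pageSum += amount;
        }
        output.add("footer page " + page + " sum=" + pageSum);
        return output;
    }

    public static void main(String[] args) {
        for (String line : run(new int[] {10, 20, 30, 40, 50}, 20, 10, 60)) {
            System.out.println(line);
        }
    }
}
```

With a post-processing paginator, the generator would have no idea which rows ended up on which page, so there would be nothing to sum.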

The mix-up approach is common among the classic reporting engines, like CrystalReports, JasperReports and Pentaho Reporting Classic. All these reporting engines allow their users to put any content into the page header and footer, and generally treat these page-sections as dynamic content that can be changed by the user while the report is being processed. Although this offers the maximum flexibility possible, it is also the reason for most of the complexity encountered in these engines.

Eat your own dog-food ..

One of the most effective ways to force developers to create sane code and usable programs is to make them use the results of their own work. Almost immediately, non-critical but very annoying bugs get fixed, and at least the UI gets cleaned up.

And right now I’m swallowing hard on my bad-tasting dog-food.

With release 0.8.8 of the Classic Engine of Pentaho Reporting, we added master-detail reporting capabilities to the reporting engine. It looked very sane at the beginning, with unmatched flexibility in the parameter and query passing, and a clean design. In practice, the whole beast was bug-infested like a dead dog in the middle of the road on a hot summer day. So we went through the usual cycle of fixing the implementation flaws (and we still do).

But very soon we hit a wall where we never would have expected it: in the handling of page headers and page footers. Page headers and footers are special sections that live outside the normal content flow. Those tricky beasts can appear everywhere, and they really don’t care about grouping structures or the general report layout. There are several ways to deal with that problem. Actuate’s BIRT solved it by making the page headers static; page headers that cannot contain data from the normal flow are a little bit dull, but they are easy to implement and they are predictable. Pentaho Reporting and JasperReports have chosen to do it the hard way. Here page headers and footers can contain data from the current report. Sure, it doesn’t sound that complicated at first; it’s just printing data after all. But suddenly, as the contents (and thus the height) of the page headers and footers become dynamic, you have created some feedback loops in your report processing.

With master-detail reports, things get even worse. Suddenly the detail report has the ability to define its own page header or footer. While this is OK sometimes, there are equally important cases where one wants to preserve the master report’s header all the time. And we haven’t even thought about repeating group headers or footers (which behave like additional page headers or footers, but are only printed as long as the group is active).

As a result, we’ve introduced ‘sticky’ flags for all of these header and footer sections. Once a band is marked ‘sticky’, it will be printed on all detail reports. If the detail report adds its own header or footer, these new sections will be printed below the master’s sticky sections.
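The rule can be sketched like this. It is only a model of the behavior described above, not the engine’s implementation; `StickyHeaders`, `Band` and `headersForDetailPage` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StickyHeaders {
    // A header or footer section together with its 'sticky' flag.
    static final class Band {
        final String name;
        final boolean sticky;

        Band(String name, boolean sticky) {
            this.name = name;
            this.sticky = sticky;
        }
    }

    // Headers printed at the top of a page while a detail report is active:
    // first the master's sticky sections, then the detail report's own ones.
    static List<String> headersForDetailPage(List<Band> masterHeaders, List<Band> detailHeaders) {
        List<String> printed = new ArrayList<>();
        for (Band band : masterHeaders) {
            if (band.sticky) {
                printed.add(band.name);
            }
        }
        for (Band band : detailHeaders) {
            printed.add(band.name);
        }
        return printed;
    }

    public static void main(String[] args) {
        List<Band> master = Arrays.asList(
                new Band("master page header", true),
                new Band("master group header", false));
        List<Band> detail = Arrays.asList(new Band("detail page header", false));
        System.out.println(headersForDetailPage(master, detail));
    }
}
```

In this example the non-sticky master group header disappears as soon as the detail report takes over the page, while the sticky master page header stays on top.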

But the really funny thing is: Before starting the task of ‘Adding the master-detail capabilities’ we did a lot more planning than for any other addition to that development branch of the reporting engine. And still: We did not realize that such things could happen until we got hit by them. Praise the open-source model, where users immediately send you a friendly flame for messing up. Praise the fast release cycles, where one month later the flaw gets fixed or at least improved (which gives more friendly flaming until everyone is satisfied).

Damn, I really love flaming. Like a good steak, just the right amount of flames turns a raw piece of meat (or software) into a tasty experience.