Pagination and Content Generation Strategies

In the reporting world (and maybe in other heavy-duty content generators as well), one of the most common (and most difficult) problems is laying out the generated content into pages. It's not just a matter of describing where to put an element; the real problem starts when you have to insert page breaks and page headers and footers.

The easy approach would be to simply ignore the pagination while the content is generated. This keeps the content generator as simple as possible: it does not have to know anything about layouts, pages, headers or footers. In a second step, a layouter processes the generated content stream and cuts it into pieces that fit on a page.
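The two-step idea can be illustrated with a small Python sketch. It is not taken from any of the engines discussed here; the names `Element`, `generate_content` and `paginate` are hypothetical, and it assumes every element has a fixed, known height.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Element:
    text: str
    height: int  # hypothetical layout unit, e.g. points


def generate_content(rows: Iterable[str], row_height: int = 10) -> Iterator[Element]:
    """Step 1: emit content without any knowledge of pages."""
    for row in rows:
        yield Element(text=row, height=row_height)


def paginate(elements: Iterable[Element], page_height: int) -> List[List[Element]]:
    """Step 2: cut the finished content stream into pieces that fit on a page."""
    pages: List[List[Element]] = []
    current: List[Element] = []
    used = 0
    for element in elements:
        # Start a new page when the element no longer fits on the current one.
        if current and used + element.height > page_height:
            pages.append(current)
            current, used = [], 0
        current.append(element)
        used += element.height
    if current:
        pages.append(current)
    return pages


# Seven rows of height 10 on pages of height 30.
pages = paginate(generate_content(f"row {i}" for i in range(7)), page_height=30)
```

Note that the generator never sees a page: all layout knowledge lives in `paginate`, which is what keeps the first step so simple.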

This strategy is usually found in document-oriented reporting engines, like BIRT, Windward Reports, the Pentaho Reporting Flow Engine or the truly horrible reporting system of OpenOffice 2.0. In their approach to report generation, they are very close to the Mail Merge functionality of today’s word processors. After the content has been generated, a second engine jumps in to perform the layouting and pagination. In the case of OpenOffice, the layouting is outsourced to the word processor itself, which makes the reporting engine itself extremely lightweight.

BIRT, on the other hand, simply defines that a page-header or footer cannot reference the normal-flow content in any way. Although this makes it very easy to implement the layouting afterwards, it severely limits the usefulness of these headers and footers.

The classic office documents separate the page-sections from the so-called normal-flow content. The content that is distributed over the pages is contained in the normal-flow. The page header and footer are defined outside the normal-flow in a page-layout (sometimes called a master-page), and during the layouting the master-page and the normal-flow are merged together. Page headers and footers behave like templates: they are defined once and applied multiple times. Some systems allow the header and footer to reference properties from the normal-flow (like the current section title).
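As a rough illustration of the master-page idea, the following Python sketch merges a header and footer template with already paginated normal-flow content. The names `FlowItem`, `MasterPage` and `merge` are hypothetical, and the sketch assumes the header may reference exactly one normal-flow property: the section of the first item on the page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FlowItem:
    text: str
    section: str  # normal-flow property that the header is allowed to reference


@dataclass
class MasterPage:
    # Templates: defined once, applied to every page.
    header: Callable[[int, str], str]  # (page number, section title) -> text
    footer: Callable[[int, int], str]  # (page number, total pages) -> text


def merge(master: MasterPage, pages: List[List[FlowItem]]) -> List[List[str]]:
    """Merge the master-page with the paginated normal-flow content."""
    total = len(pages)
    result = []
    for number, items in enumerate(pages, start=1):
        section = items[0].section if items else ""
        lines = [master.header(number, section)]
        lines += [item.text for item in items]
        lines.append(master.footer(number, total))
        result.append(lines)
    return result


master = MasterPage(
    header=lambda number, section: f"{section} (page {number})",
    footer=lambda number, total: f"{number} / {total}",
)
pages = [
    [FlowItem("row 1", "Sales"), FlowItem("row 2", "Sales")],
    [FlowItem("row 3", "Costs")],
]
merged = merge(master, pages)
```

Because the merge runs after pagination, the total page count is already known when the footer template is applied.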

The more complicated approach performs the pagination while the content is generated. The separation-of-concerns principle clearly says that such behavior is stupid, if not evil, as it leads to overly complicated, tightly coupled systems. Maintaining such an architecture is a nightmare.

But there are some advantages that justify this. Coupling the content generation with the layouting and pagination allows the content generator to use feedback from the pagination process, such as events signalling that a page has started or finished. A reporting function can then perform page-local computations based on these events.

In the reporting field, there are a few cases where pagination has to be coupled with content generation. One such requirement is to perform page-based calculations (for instance, the count and/or sum of all items printed on the current page). Admittedly, these special requirements are rarely encountered.
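The page-based sum mentioned above could look roughly like this in Python. This is only a sketch under simplifying assumptions: a page holds a fixed number of items, and the class `PageSum` with its `on_page_started` and `on_item_printed` callbacks is a hypothetical stand-in for a report function, not the API of any real engine.

```python
from typing import Iterable, List


class PageSum:
    """Hypothetical report function: sums the amounts printed on the current page."""

    def __init__(self) -> None:
        self.total = 0

    def on_page_started(self) -> None:
        # Feedback from the pagination process: a new page resets the sum.
        self.total = 0

    def on_item_printed(self, amount: int) -> None:
        self.total += amount


def run_report(amounts: Iterable[int], items_per_page: int) -> List[str]:
    """Generate content and paginate in a single, coupled pass."""
    page_sum = PageSum()
    output: List[str] = []
    printed = 0
    page_sum.on_page_started()
    for amount in amounts:
        if printed == items_per_page:
            # The page is full: print the page footer, then start a new page.
            output.append(f"-- page total: {page_sum.total} --")
            page_sum.on_page_started()
            printed = 0
        output.append(f"item {amount}")
        page_sum.on_item_printed(amount)
        printed += 1
    if printed:
        output.append(f"-- page total: {page_sum.total} --")
    return output


report = run_report([5, 10, 20, 1, 2], items_per_page=2)
```

The footer text depends on where the page breaks fall, so the content generator cannot finish its work without knowing the pagination. That dependency is exactly the coupling described above.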

The mix-up approach is common among the classic reporting engines, like CrystalReports, JasperReports and Pentaho Reporting Classic. All these reporting engines allow their users to put any content into the page header and footer, and generally treat these page-sections as dynamic content that can be changed by the user while the report is being processed. Although this offers the maximum flexibility possible, it is also the reason for most of the complexity encountered in these engines.


About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.