New module: Table-of-Contents for Pentaho Reporting

Gunther asked me: “Thomas, how can we create a table of contents in our reports?”

This is not a new question. Generating a table-of-contents has been requested since the old days, years before Pentaho came to life. Back then, a gross lack of capabilities in the layouter and the engine itself prevented any serious attempt. But as the years passed, the engine matured, subreports became a reality, the layouter was replaced (several times), and with the Citrus release element-attributes, report-bundles and report-preprocessors arrived. So it’s time to revisit this feature request one more time.

So how do I get the new feature?

The table-of-contents feature is implemented as two extension modules – one for the reporting engine and one for the report-designer. In addition, I had to extend the parser and the report data processor in the engine-core to adapt them to the new requirements.

The feature requires at least Pentaho Reporting 3.6.1. It therefore needs either a BI-Server 3.6.0 or a BI-Server 3.5.2 with an upgraded reporting-engine. Simply follow the instructions on how to replace the reporting engine in BI-Server 3.5.2, and you shall be ready to go.

Now download the updated classic-engine-core, extensions-toc and report-designer-extensions-toc. Place the classic-engine-core and the extensions-toc into “pentaho/WEB-INF/lib” on your server and into “report-designer/lib” in your report-designer. Put the “report-designer-extensions-toc” into the “report-designer/lib” directory as well. To update PRD, you could also just grab a CI build of the PRD assembly to get the full product.

How do I use the new Table-of-Contents feature?

When you start the Pentaho Report Designer the next time, you will see a second sub-report icon in the list of available elements. The icon will be labelled “table-of-contents”. Just drag it into your report. It behaves like any other subreport, so both banded and inline-mode are available.

However, I do recommend that you use it in banded-mode, as a table-of-contents usually spans the full page-width anyway and banded subreports perform better than inline subreports.

You can start designing your table-of-contents like any other subreport by double-clicking on it. The data-source is already preconfigured to show the fields that will be available.

  • “item-title” – an object/text computed by the title-field or title-formula property of the ToC-element.
  • “item-page” – the page on which the current item was found. If the item spans several pages, the first page is returned.
  • “item-index” – the item-index as text. The index counts the number of group-starts for each defined group. The string looks somewhat like this: “2.5.9”.
  • “item-index-array” – the item-index as array of Integer values. Perfectly suitable to be printed via the CSVText function or any other expression.

In addition to these static columns, a set of extra columns following the name pattern “group-value-X” is added, where X is a zero-based index into the defined group-values. Each column contains the group’s field-value that was read at the time the data was generated.
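To make the shape of that data-source more tangible, here is a minimal sketch of a table with the four static columns. It is not the engine's code: the class name `TocTableSketch`, the sample rows and the dotted-leader rendering are invented for illustration, and a plain Swing `DefaultTableModel` stands in for whatever table-model the ToC function really builds.

```java
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableModel;

// Hypothetical illustration of the ToC data-source layout; not Pentaho code.
final class TocTableSketch {
    // Column names as listed in the post.
    static final String[] COLUMNS = {"item-title", "item-page", "item-index", "item-index-array"};
    static final int TITLE = 0;
    static final int PAGE = 1;
    static final int INDEX = 2;

    // Builds a table with invented sample rows.
    static TableModel sampleToc() {
        DefaultTableModel model = new DefaultTableModel(COLUMNS, 0);
        model.addRow(new Object[] {"Europe", 1, "1", new Integer[] {1}});
        model.addRow(new Object[] {"Germany", 1, "1.1", new Integer[] {1, 1}});
        model.addRow(new Object[] {"Asia", 4, "2", new Integer[] {2}});
        return model;
    }

    // Renders one line per row: index, title, dotted leader, page.
    static String render(TableModel model) {
        StringBuilder out = new StringBuilder();
        for (int row = 0; row < model.getRowCount(); row++) {
            out.append(model.getValueAt(row, INDEX)).append(' ')
               .append(model.getValueAt(row, TITLE)).append(" ..... ")
               .append(model.getValueAt(row, PAGE)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sampleToc()));
    }
}
```

In the real subreport you would simply drag these fields into the item-band; the `render` method only mimics what such a band might print.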

How do I control what contents are included in the ToC-generation?

The actual generation is carried out by a custom function, which receives its configuration from a bunch of attributes on the “table-of-contents” element. The fields and formulas defined in there are evaluated in the context of the report that contains the table-of-contents element.

  • “group-fields” – Defines both the depth of the data-collection and the fields from which the “group-value-X” values are read. If an entry in the group-fields array is empty, the field value is read from the current relational group; during details-processing, that value will be null. If the “group-fields” list itself is empty, an automatic mode is activated that collects all groups and extracts each group-value from the relational group.
  • “collect-details” – Defines whether detail items should be included in the data-collection. Be aware that this can easily blow up your memory consumption, as we have to hold the collected items in memory. I would not use this on a “million-rows” report.
  • “title-formula” – Defines a formula that is evaluated when a new item has been collected. The formula will only be evaluated if the title-field is not set.
  • “title-field” – Defines a field in the master-report that will be read for a valid item-title.
  • “index-separator” – Defines the separator text that is used between the index-elements. It defaults to “.”.
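The interplay of “title-field”, “title-formula” and “index-separator” can be sketched in a few lines. This is only a model of the rules described above, not the function's real code: the class `TocTitleSketch` and its methods are hypothetical, the current data-row is represented by a plain `Map`, and the formula is represented by a Java `Function` instead of a real report formula.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of the attribute rules; not Pentaho code.
final class TocTitleSketch {
    // The title-field wins when it is set; the formula is only consulted otherwise.
    static Object resolveTitle(String titleField,
                               Function<Map<String, Object>, Object> titleFormula,
                               Map<String, Object> row) {
        if (titleField != null && !titleField.isEmpty()) {
            return row.get(titleField);
        }
        return titleFormula == null ? null : titleFormula.apply(row);
    }

    // Joins the index elements with the configured separator.
    static String joinIndex(Integer[] index, String separator) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < index.length; i++) {
            if (i > 0) {
                text.append(separator);
            }
            text.append(index[i]);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("country", "Germany"); // invented sample data
        System.out.println(resolveTitle("country", r -> "ignored", row));
        System.out.println(resolveTitle(null, r -> "Country: " + r.get("country"), row));
        System.out.println(joinIndex(new Integer[] {2, 5, 9}, "."));
    }
}
```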

OK, now I have a table-of-contents and it has a pretty set of index-numbers. But how can I have the same numbers on the items of the real report?

Along with the table-of-contents element, the Pentaho Report Designer now has two new functions to generate index-numbers on the fly whenever needed. The “index-number-generator” produces an Integer-array that can be used in conjunction with formulas and other expressions. The result is the same as the one found in the “item-index-array”, with the slight difference that on group-start and group-finished events, the index reflects the level of the current group instead of always showing the full index. So in a one-group report, if used on a field in the group-header, it produces “1”, while on the item-band it would produce “1.1”.

The “index-text-generator” produces the same data as the “index-number-generator”, but defaults to a pretty text instead.
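The counting behaviour can be modelled with one counter per level. The following is a simplified sketch under my own assumptions, not the implementation of the two functions: the class `IndexCounterSketch` and its method names are hypothetical stand-ins for the report events, and numbering below a group is assumed to restart when that group finishes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of hierarchical index counting; not Pentaho code.
final class IndexCounterSketch {
    private final List<Integer> counters = new ArrayList<>();
    private int depth; // number of currently open groups

    // A group starts: bump the counter at the group's own level.
    Integer[] groupStarted() {
        depth++;
        bump(depth);
        return counters.subList(0, depth).toArray(new Integer[0]);
    }

    // A detail row is processed: bump the counter one level below the open group.
    Integer[] itemAdvanced() {
        bump(depth + 1);
        return counters.subList(0, depth + 1).toArray(new Integer[0]);
    }

    // A group finishes: forget all counters below it, so numbering restarts there.
    void groupFinished() {
        while (counters.size() > depth) {
            counters.remove(counters.size() - 1);
        }
        depth--;
    }

    private void bump(int level) {
        while (counters.size() < level) {
            counters.add(0);
        }
        counters.set(level - 1, counters.get(level - 1) + 1);
    }

    // Text form, comparable to what a text generator would print.
    static String format(Integer[] index, String separator) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < index.length; i++) {
            if (i > 0) {
                text.append(separator);
            }
            text.append(index[i]);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        IndexCounterSketch counter = new IndexCounterSketch();
        System.out.println(format(counter.groupStarted(), ".")); // group-header
        System.out.println(format(counter.itemAdvanced(), ".")); // item-band
    }
}
```

In this model the group-header of the first group sees a one-element index and the first item-band sees a two-element index, which matches the “1” versus “1.1” example above.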

How does this magic work?

When the toc-module is active, a new report-preprocessor gets added to the reporting-engine. This processor checks all reports for occurrences of the “table-of-contents” element. If it finds one, it adds a “TocDataGeneratorFunction” to the report, configures the datasources of the “table-of-contents” element by adding an “external-datasource”, and sets the query of the report to the name of the data-generator. Finally, it sets an import parameter to pass the function result (a table-model) into the table-of-contents subreport.

During the data-processing stage, the data-generator will build up the table-model with all the rows it encounters. As the function is marked as deep-traversing, it will pass through all subreports to collect its data (make sure the export-parameters are defined properly, so that it can see the subreport data down there). At that stage it will put a 9999 into the page-number column, so that there is at least some data when doing the pagination run later on.

During the pagination, the data-generator then replaces the page-number with the real value, so that the content-generator-stage prints the correct values.
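The placeholder-then-patch idea looks roughly like this. It is a sketch of the principle only, not the data-generator's code: the class `TwoPassPageSketch`, its method names and the sample titles and pages are invented, and a Swing `DefaultTableModel` again stands in for the real table-model.

```java
import javax.swing.table.DefaultTableModel;

// Hypothetical illustration of the two-stage page-number handling; not Pentaho code.
final class TwoPassPageSketch {
    static final int PLACEHOLDER_PAGE = 9999; // value named in the post
    static final int PAGE_COLUMN = 1;

    // Data-processing stage: collect the rows, page numbers are not known yet.
    static DefaultTableModel collect(String[] titles) {
        DefaultTableModel model =
            new DefaultTableModel(new Object[] {"item-title", "item-page"}, 0);
        for (String title : titles) {
            model.addRow(new Object[] {title, PLACEHOLDER_PAGE});
        }
        return model;
    }

    // Pagination stage: overwrite each placeholder with the page that was found.
    static void applyPages(DefaultTableModel model, int[] pages) {
        for (int row = 0; row < model.getRowCount(); row++) {
            model.setValueAt(pages[row], row, PAGE_COLUMN);
        }
    }

    public static void main(String[] args) {
        DefaultTableModel toc = collect(new String[] {"Europe", "Asia"});
        System.out.println(toc.getValueAt(0, PAGE_COLUMN)); // placeholder
        applyPages(toc, new int[] {1, 4});                  // invented page numbers
        System.out.println(toc.getValueAt(1, PAGE_COLUMN)); // real page
    }
}
```

Note that the number of rows never changes between the two stages; only the page values do. That is exactly the property the table-of-contents layout relies on.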

Warning: Changing values between the pagination and content-generation stages is always dangerous. Always make sure that your table-of-contents report does not change its layout so that it generates more (or less) pages in the content-generation than it did in the pagination stage. Such behaviour will be rewarded with random crashes and other nasty things.


About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.

3 thoughts on “New module: Table-of-Contents for Pentaho Reporting”

  1. Sean

    Great feature! I’m having some trouble figuring out how exactly to use it though. Do you have any sample reports with a table of contents?

