Gunther asked me: “Thomas, how can we create a table of contents in our reports?”
This is not a new question at all. Generating a table-of-contents has been requested since the old jfree.org days, years before Pentaho came to life. But a severe lack of capabilities in the layouter and in the engine itself prevented any serious attempt at that time. As the years passed, the engine matured, subreports became a reality, the layouter was replaced (several times), and the Citrus release brought element-attributes, report-bundles and report-preprocessors. So it’s time to revisit this feature request one more time.
So how do I get the new feature?
The table-of-contents feature is implemented as two extension modules – one for the reporting engine and one for the report-designer. In addition, I had to extend the parser and the report data processor of the engine-core to meet the new requirements.
The feature requires at least Pentaho Reporting 3.6.1. It therefore needs either a BI-Server 3.6.0 or a BI-Server 3.5.2 with an upgraded reporting-engine. Simply follow the instructions on how to replace the reporting engine in BI-Server 3.5.2, and you should be ready to go.
Now download the updated classic-engine-core, extensions-toc and report-designer-extensions-toc. Place the classic-engine-core and the extensions-toc into “pentaho/WEB-INF/lib” on your server and into “report-designer/lib” in your report-designer installation. Put the “report-designer-extensions-toc” into the “report-designer/lib” directory as well. Alternatively, to update PRD you can simply grab a CI build of the PRD assembly, which contains the full product.
How do I use the new Table-of-Contents feature?
When you start the Pentaho Report Designer the next time, you will see a second subreport icon in the list of available elements, labelled “table-of-contents”. Just drag it into your report. It behaves like any other subreport, so both banded and inline mode are available.
However, I recommend using it in banded mode, as a table-of-contents usually spans the full page width anyway and banded subreports perform better than inline subreports.
You can start designing your table-of-contents like any other subreport by double-clicking on it. The data-source is already preconfigured with the fields that will be available:
- “item-title” – an object or text computed from the title-field or title-formula property of the ToC element.
- “item-page” – the page on which the current item was found. If the item spans several pages, the first page is returned.
- “item-index” – the item-index as text. The index counts the number of group-starts for each defined group. The string looks somewhat like this: “2.5.9”.
- “item-index-array” – the item-index as an array of Integer values, perfectly suitable for printing via the CSVText function or any other expression.
In addition to these static columns, a set of extra columns following the name pattern “column-value-X” is added, where X is the zero-based index of the defined group-values. These columns contain the group’s field value as it was read at the time the data was generated.
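For reference, here is a small Java sketch that lists the columns a table-of-contents subreport can expect for a given number of group values. The class `TocColumns` is a hypothetical helper for illustration only, not part of the Pentaho API; only the column names are taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Pentaho class: lists the columns a ToC
// subreport can expect, using the column names described above.
final class TocColumns {
    private TocColumns() {}

    static List<String> columnNames(int groupValueCount) {
        List<String> names = new ArrayList<>();
        names.add("item-title");       // object/text from title-field or title-formula
        names.add("item-page");        // first page on which the item appears
        names.add("item-index");       // index as text, e.g. "2.5.9"
        names.add("item-index-array"); // index as Integer values, e.g. {2, 5, 9}
        // One extra column per defined group value; X is zero-based.
        for (int x = 0; x < groupValueCount; x++) {
            names.add("column-value-" + x);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columnNames(2));
    }
}
```

For a report with two defined group values, this yields the four static columns followed by “column-value-0” and “column-value-1”.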
How do I control what contents are included in the ToC-generation?
The actual generation is carried out by a custom function, which receives its configuration from a set of attributes on the “table-of-contents” element. The fields and formulas defined there are evaluated in the context of the report that contains the table-of-contents element.
- “group-fields” – Defines both the depth of the data-collection and the fields from which the “group-value-X” values are read. If an entry in the array is empty, the value is read from the current relational group; during details-processing, that value will be null. If the “group-fields” list itself is empty, an automatic mode is activated that collects all groups and extracts each group-value from its relational group.
- “collect-details” – Defines whether detail items should be included in the data-collection. Be aware that this can easily blow up your memory consumption, as the collected items have to be held in memory. I would not use this on a “million-rows” report.
- “title-formula” – Defines a formula that is evaluated when a new item has been collected. The formula is only evaluated if the title-field is not set.
- “title-field” – Defines a field in the master-report from which the item-title is read.
- “index-separator” – Defines the separator text that is used between the index-elements. It defaults to “.”.
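The precedence rules of these attributes can be summarized in a minimal Java sketch. The class `TocAttributes`, its methods, and the use of a plain `Map` as the data row are hypothetical stand-ins, not the actual TocDataGeneratorFunction code. The sketch only encodes the rules stated in the list: “title-field” wins over “title-formula”, and an empty group-field entry falls back to the relational group, yielding null during details-processing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the ToC attribute rules; not the Pentaho implementation.
// A plain Map stands in for the master report's current data row.
final class TocAttributes {
    private TocAttributes() {}

    // "title-field" takes precedence; "title-formula" is evaluated only
    // when no title-field is set.
    static Object title(Map<String, Object> row, String titleField,
                        Function<Map<String, Object>, Object> titleFormula) {
        if (titleField != null && !titleField.isEmpty()) {
            return row.get(titleField);
        }
        return titleFormula == null ? null : titleFormula.apply(row);
    }

    // Group value for one level: a configured field is read from the row;
    // an empty entry falls back to the relational group's value, which is
    // null while details are being processed.
    static Object groupValue(Map<String, Object> row, String groupField,
                             Object relationalGroupValue, boolean inDetails) {
        if (groupField != null && !groupField.isEmpty()) {
            return row.get(groupField);
        }
        return inDetails ? null : relationalGroupValue;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("category", "Cars");
        // title-field set: the formula is not used.
        System.out.println(title(row, "category", r -> "Formula title")); // Cars
        // No title-field: the formula provides the title.
        System.out.println(title(row, null, r -> "Formula title"));       // Formula title
        // Empty group-field entry inside the details: no value.
        System.out.println(groupValue(row, "", "Cars", true));            // null
    }
}
```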
OK, now I have a table-of-contents and it has a pretty set of index-numbers. But how can I have the same numbers on the items of the real report?
Along with the table-of-contents element, the Pentaho Report Designer now has two new functions that generate index-numbers on the fly whenever needed. The “index-number-generator” produces an Integer-array that can be used in conjunction with formulas and other expressions. The result is the same as the one found in the “item-index-array” column, with the slight difference that on group-start and group-finished events, the index reflects the level of the current group instead of always showing the full index. So in a one-group report, if used on a field in the group-header, it produces “1”, while on the item-band it produces “1.1”.
The “index-text-generator” produces the same data as the “index-number-generator”, but outputs a pretty text by default instead of an array.
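The difference between group events and item bands can be illustrated with a minimal Java sketch. It assumes that each group level keeps a counter that restarts whenever its parent group starts, and that the item band adds one more counter for the row inside the innermost group. `IndexNumberSketch` is a hypothetical class, and the outputs beyond the “1” and “1.1” mentioned above follow from that assumption rather than from the engine itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of index-number generation; not the Pentaho code.
// Assumption: one counter per group level plus one for items; a group start
// counts itself and restarts all deeper counters.
final class IndexNumberSketch {
    private final int[] counters; // group levels, then one level for items

    IndexNumberSketch(int groupCount) {
        counters = new int[groupCount + 1];
    }

    // A group at the given zero-based level starts. Returns the index
    // truncated to that group's level, e.g. {1} for the outermost group.
    Integer[] groupStarted(int level) {
        counters[level]++;
        for (int i = level + 1; i < counters.length; i++) {
            counters[i] = 0;
        }
        return prefix(level + 1);
    }

    // An item band is printed: count the row and return the full index, e.g. {1, 1}.
    Integer[] itemAdvanced() {
        counters[counters.length - 1]++;
        return prefix(counters.length);
    }

    private Integer[] prefix(int length) {
        Integer[] result = new Integer[length];
        for (int i = 0; i < length; i++) {
            result[i] = counters[i];
        }
        return result;
    }

    // Text form of an index, joined with the given separator.
    static String toText(Integer[] index, String separator) {
        List<String> parts = new ArrayList<>();
        for (Integer n : index) {
            parts.add(String.valueOf(n));
        }
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        IndexNumberSketch gen = new IndexNumberSketch(1);      // one-group report
        System.out.println(toText(gen.groupStarted(0), "."));  // group header: 1
        System.out.println(toText(gen.itemAdvanced(), "."));   // item band: 1.1
        System.out.println(toText(gen.itemAdvanced(), "."));   // item band: 1.2
        System.out.println(toText(gen.groupStarted(0), "."));  // next group header: 2
        System.out.println(toText(gen.itemAdvanced(), "."));   // item band: 2.1
    }
}
```

Here `toText` joins the numbers with “.”, which mirrors the default “index-separator” of the ToC element.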
How does this magic work?
When the toc-module is active, a new report-preprocessor is added to the reporting-engine. This preprocessor checks all reports for occurrences of the “table-of-contents” element. If it finds one, it adds a “TocDataGeneratorFunction” to the report and configures the data source of the “table-of-contents” element by adding an “external-datasource” and setting its query to the name of the data-generator. Finally, it adds an import parameter to pass the function result (a table-model) into the table-of-contents subreport.
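The wiring can be modelled with plain Java maps to show why setting the query to the function name is enough: the external data source just looks up a parameter with that name, and the import parameter puts the function’s table-model there. Everything in this sketch (`TocWiringSketch`, its methods, the “toc-data” name, the list-of-lists table-model) is a hypothetical stand-in; it neither uses nor mirrors the real Pentaho classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the wiring the ToC preprocessor performs; these
// methods are illustrations only and do not exist in Pentaho Reporting.
final class TocWiringSketch {
    private TocWiringSketch() {}

    // Stand-in for an external data source: the "query" is the name of a
    // parameter whose value already is the finished table-model.
    static Object runExternalQuery(String query, Map<String, Object> parameters) {
        if (!parameters.containsKey(query)) {
            throw new IllegalStateException("No parameter named " + query);
        }
        return parameters.get(query);
    }

    // Stand-in for an import parameter: copies one value from the master
    // report's data-row into the subreport's parameter set.
    static Map<String, Object> importParameter(Map<String, Object> masterDataRow, String name) {
        Map<String, Object> subreportParameters = new HashMap<>();
        subreportParameters.put(name, masterDataRow.get(name));
        return subreportParameters;
    }

    public static void main(String[] args) {
        String functionName = "toc-data"; // hypothetical name of the generator function
        // Table-model modelled as rows of values: item-title, item-page, item-index.
        List<Object> firstRow = List.of("Chapter 1", 1, "1");
        List<List<Object>> tableModel = List.of(firstRow);

        Map<String, Object> masterDataRow = new HashMap<>();
        masterDataRow.put(functionName, tableModel); // function result in the master report

        Map<String, Object> subreportParameters = importParameter(masterDataRow, functionName);
        Object data = runExternalQuery(functionName, subreportParameters); // query = function name
        System.out.println(data);
    }
}
```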
During the data-processing stage, the data-generator builds up the table-model with all the rows it encounters. As the function is marked as deep-traversing, it passes through all subreports to collect its data (make sure the export-parameters are defined properly, so that it can see the subreport data down there). At that stage it puts a 9999 into the page-number column, so that there is at least some data when the pagination run happens later on.
During pagination, the data-generator then replaces the placeholder page-number with the real value, so that the content-generation stage prints the correct values.
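The placeholder-then-patch approach of the data-generator can be sketched in Java as follows. `TocPagePatching` and its methods are hypothetical; the sketch assumes the pagination stage reports each page on which an item appears and keeps only the first one, in line with the “item-page” rule that an item spanning several pages reports its first page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the placeholder-then-patch approach; not Pentaho code.
final class TocPagePatching {
    // Placeholder used during data-processing, before real pages are known.
    static final int PLACEHOLDER_PAGE = 9999;

    private final List<Integer> pages = new ArrayList<>();

    // Data-processing stage: collect an item with the placeholder page.
    // Returns the row index of the new item.
    int collect() {
        pages.add(PLACEHOLDER_PAGE);
        return pages.size() - 1;
    }

    // Pagination stage: called for each page the item appears on. Only the
    // first call replaces the placeholder, so a multi-page item keeps its first page.
    void paginated(int row, int page) {
        if (pages.get(row) == PLACEHOLDER_PAGE) {
            pages.set(row, page);
        }
    }

    int pageOf(int row) {
        return pages.get(row);
    }

    public static void main(String[] args) {
        TocPagePatching toc = new TocPagePatching();
        int intro = toc.collect();
        int chapter = toc.collect();
        System.out.println(toc.pageOf(chapter)); // 9999 before pagination
        toc.paginated(intro, 1);
        toc.paginated(chapter, 3);
        toc.paginated(chapter, 4);               // chapter continues on page 4
        System.out.println(toc.pageOf(chapter)); // 3
    }
}
```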
Warning: Changing values between the pagination and content-generation stages is always dangerous. Always make sure that your table-of-contents report does not change its layout in a way that makes it generate more (or fewer) pages during content-generation than it did during pagination. Such behaviour will be rewarded with random crashes and other nasty things.
Great feature! I’m having some trouble figuring out how exactly to use it though. Do you have any sample reports with a table of contents?
The “Operational Reports/inventory list” sample report in later PRD versions comes with a table-of-contents definition.
Thanks Thomas! I was able to figure it out – the sample was very helpful!