PRD-2087: Widows, Orphans and we are all keeping together, aren’t we?

Thomas_kennington_orphans_1885One long-standing, never resourced, never fixed issue we had was the case of managing orphans and widows in reports. Well, with the cold wind of austerity blowing over Europe, we can’t forget the widows and orphans, can’t we?

What are Widows and Orphans?

In typography, an orphan is the single line left of a paragraph left on the previous page. A widow is a lonely line that did not fit on the previous page and now sits alone on the next page. With texts, these rules are somewhat easy to solve, as paragraphs are a flat list and not nested into each other.

In the field of reporting, we usually care less about lines of text, we care about the greater unit of sections. When you create a report, you don’t want a group-header being all alone at the bottom of the page, without at least one more details band to go with it. Likewise, a group-footer should not be the only thing on the last page for that group. The trouble starts when you consider these rules in a deeply hierarchical structure as we see in reports.

Like so many layouting concepts, orphans and widows are easy to explain, but usually a pain to resolve. Orphan and Widow rules are cumulative. When you have nested groups, then the orphan declarations of the outer group cannot be solved in isolation.

Lets take the simple example of a two-level report , where each group declares that it wants at least two sections as orphan area. Assuming the group-headers are filled, it means that group’s header and at least the next section must be kept together. For the outer group, that is the outer group-header and the inner group-header. For the inner group it is the group-header and the first details section.

The inner group’s header is covered by two orphan rules now. It is both part of the first group’s unbreakable section, as well as part of the second group’s section. When rules partially overlap each other both rules must be merged.

Last but not least, in the light of these rules, we now can redefine the ‘keep-together’ (or in PRD speech: Avoid-page-break-inside) as a infinitely large number of orphans in the break-restricted area.

How to use this feature

The Orphans, Widows and Keep-together properties can be defined on any section or band. By default, all root-level bands (details, group-header,footer etc) have a default value for ‘keep-together’ of ‘true’.

The Orphan and Widow style settings take a positive integer as value. Negative values are ignored.

A orphan or widow constraint controls how pagebreaks within that element are handled. A widow or orphan constraint only affects child nodes of the element that has the constraint defined. So if you want to keep a group-header together with the next few detail sections, you have to define the orphan-constraint on the group-element. Defining it on the group-header will not have the desired effect.

The reporting engine treats all root-level bands as elements that count as content in the keep-together calculations. All other elements are ignored for the purpose of the widow-orphan calculations. If you explicitly want an element to be used for these calculations, you can set the style-key ‘widow-orphan-opt-out’ to false on that element.

If a element that counts for the widow-orphan calculation contains other widow-orphan enabled elements, the parent element will be ignored for the widow-orphan calculations.

Elements with an canvas or row-layout form a leaf node for the widow-orphan calculation. Their child elements cannot take part in any of the parent’s widow- and orphan calculations. However, they can establish their own widow-orphan context. Therefore, all subreports, even inline-subreports, can declare widow-orphan rules.

The defaults built into the reporting engine ensure that each section on the report is treated as an element for the widow-orphan calculations, even across subreports.

Performance

Solving widow and orphan rules is a costly exercise. Our reporting engine allows user-calculations and user-defined formatting to react to page break events. This allows you, for instance, to reset row-banding at the beginning of the page, or to format pages differently for odd and even page numbers. And finally, it allows you to update the page-header and page-footer on a page break so that you can show data from the current page on the headers.

If an section is finished (for instance a group has been fully processed), we can safely evaluate widows and orphans for that group.

For ongoing content generation: When an orphan value greater than zero is declared on a section, the engine suspends the layout calculation until enough content has been generated to fulfill all orphan rules currently on the report. Likewise, for widow-calculations the report processing is suspended until more than the number of widow elements have been generated as content – and only those elements that are not marked as covered by the widow-rule will be considered for layouting.

Suspending the layout processing can have a severe negative impact on the report processing time. When the engine suspends the layout-calculation, it keeps the unfinished layout in memory until it reaches a point where the layout can be safely calculated again. In the worst case, this suspends the layouting until the report generation finishes.

Keeping the unfinished layout in memory does consume more memory than the normal streaming report processing. When the engine finally detects a page break that fulfills all orphan and widow rules that are active on the report, it has to roll-back to the state that generated the last visible element on the current page to inform any page-event listener about the page break in the right context. Every rollback is expensive and the reporting engine has to discard any content that had already been generated after that page break, as functions may have reconfigured the report state in preparation or response of the page break.

Orphan calculations are usually less expensive as Widow or Keep-together rules.

However, if you export large amounts of data, try to avoid widow- or orphan-rules on your report. Your report will finish up to 100% faster that way.

 

Finally: This major fix is available for both Pentaho Reporting 3.9 and Pentaho Reporting 4.0. The fix did not make it into this month’s roll-up release for the Pentaho Suite 4.8.1 release, but will be available for the general public in the next roll-up release in July. In the mean time, the fix is in the source code repositories, ready to be checked out and built locally 😉

 

This entry was posted in Development, Report Designer & Engine on by .
Thomas

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over an decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to a enterprise class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our our steps towards upcoming releases.

3 thoughts on “PRD-2087: Widows, Orphans and we are all keeping together, aren’t we?

  1. Gunter

    Finally it is there, this feature i have been waiting for so long.

    Thank you for fixing it.

  2. Josiah E. Marsh

    So basically you were counting lines. Unfortunately, it is the only option currently open in CR. We are using another reporting tool as well, and counting lines doesn’t even work, at least not in setting values in other sections…very frustrating. SSRS doesn’t have a widow or orphan option.

    1. ThomasThomas Post author

      Well, not exactly counting lines. As you said, that would not necessarily be working. Reports are not pure text documents, like novels. Reports are usually complex layouts forming tables of somewhat related sections. Your details section may be filled with all sorts of layouts in all sorts of fonts (and thus line heights).

      Therefore we count sections. Sections are larger chunks and usually have a better ganularity. After all, when you look at a report, you want all data for one entity together. You don’t want to see the name of your customer on the one page, and his adress on the next. The compexity around the rules for sensible widow and orphan constraints was what took me the most of the time. Its easy to count lines or sections, but doing it in a way that makes sense in the context or business report is the hard part.

Comments are closed.