Doing the performance dance (again)

I just changed another bit of the table-export while integrating a patch for PRD-3631. Although the patch itself did take a few illegal shortcuts, it showed me an easier way of calculating the cell-backgrounds for the HTML, Excel and RTF exports of Pentaho Reporting.

After a bit more digging, I also fixed some redundant calls in the HTML and Excel exports for merged cells and row-styles. Both resulted in repeated calls to the cell-background calculator and were responsible for slowing down the reporting engine more than necessary.

The performance of my test reports improved a bit with those changes. If anything, this case has shown me that clean report design is the major driver of a fast export.

The performance for the reports went up by 15 to 30 percent, with the larger gains on the reports with larger row-counts. However, the reports I test are surely non-representative: all their elements are perfectly aligned, and the reports are designed to avoid merged cells.

The patch specifically claims to address performance problems in the cell-style calculation. Agreed, there were problems, and the patch addressed them. But there was no way I could see a 100% improvement on normal reports. Well, not reports that are well-designed and use the powerful little helpers that the Pentaho Report Designer offers to make reports well-aligned.

When I receive production reports for debugging, the picture is usually more bleak. Fields are placed rather randomly and are usually misaligned by a few points. They start and end at rather random positions, and usually elements are not aligned to element boundaries across different sections.

Let’s take this visually fairly simple report as an example:

The many fine grey lines you can see mark element boundaries. The more lines you see, the more cells your report will have. Each cell not only means larger HTML files, it also means more processing time spent on computing cell-styles and cell-contents. Thick grey lines spanning across the whole section usually indicate elements that are off by less than one pixel.
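
To make the link between boundaries and cells concrete, here is a minimal Python sketch. It assumes a simplified model in which every distinct left or right edge becomes a column boundary and every distinct top or bottom edge becomes a row boundary. The `Element` tuple and the `grid_size` function are hypothetical names made up for this illustration; they are not part of Pentaho Reporting, and the real engine's layout logic is more involved.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """Hypothetical report element: position and size in points."""
    x: float
    y: float
    width: float
    height: float


def grid_size(elements):
    """Return (columns, rows) of the table grid implied by the element edges.

    Simplified model: every distinct x-edge is a column boundary and every
    distinct y-edge is a row boundary; n boundaries enclose n - 1 columns/rows.
    """
    x_edges = set()
    y_edges = set()
    for e in elements:
        x_edges.update((e.x, e.x + e.width))
        y_edges.update((e.y, e.y + e.height))
    return max(len(x_edges) - 1, 0), max(len(y_edges) - 1, 0)


# Two stacked fields, once perfectly aligned and once off by half a point.
aligned = [Element(24, 0, 100, 12), Element(24, 12, 100, 12)]
misaligned = [Element(24, 0, 100, 12), Element(24.5, 12, 100, 12)]

for name, report in (("aligned", aligned), ("misaligned", misaligned)):
    columns, rows = grid_size(report)
    print(f"{name}: {columns} columns x {rows} rows = {columns * rows} cells")
```

In this toy model, a half-point offset on a single field is enough to triple the number of columns, which is the effect the alignment hints make visible.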

These lines are produced by the Report Designer’s “View->Element Alignment Hints” feature. When this menu item is selected, you will get a better idea of how your report will look when exported into a table-export. If you cannot see the details clearly, zoom in. The Report Designer happily magnifies the working area for you.

When exported to HTML, this report here created a whopping 35 columns with another 35 rows. That is potentially 1225 cells. The resulting HTML file has a size of 21,438 bytes. For a report with just a few items of text, this is a lot.

In general you’ll want to avoid having too many of these boundaries. In basic design courses, teachers tell you fairly early on that layouts where element edges are aligned look cleaner and are more pleasing to the eye. When you look at adverts or magazines, you can see this in how articles and images seem to sit along visual boundaries or dividing lines. For a well-designed report this is no different.

To help you keep your reports well-aligned, the Report Designer comes with the “View->Snap to Elements” feature.

To clean up a report, I usually start by aligning elements that are sitting close together. Visually, it makes no difference whether an element starts at x=24 or x=24.548. For the reporting engine, this makes a difference, as a dumb little engine cannot decide whether the user was just lazy or had a very good reason to have a cell at exactly these positions (or whether some visual design would break by attempting to blindly fix it).

With “Snap to Elements” enabled, just select one element and drag the mis-aligned edge until it snaps off its current position. Then move it back into position. This time it will snap to one of the other elements. If the edges are very close together, I drag the current edge towards the top (for the y-axis) or the left (for the x-axis) until it leaves the crowded area. When I return with it, it will snap to the first (top-most or left-most) edge in the group of elements by default. Repeat that with all elements that sit near the edge and very soon you should only see one thin line indicating a perfect alignment.

Additionally, you can also change the element’s position and size to an integer number in the “style” table on the right-hand side of the Report Designer. When you do that for all elements, your alignment task will become a lot easier. Now elements are either aligned or at least one point apart (and the mis-alignment is easier to spot).
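
The effect of whole-number positions can be sketched in a few lines of Python. This is only an illustration under the assumption that each distinct edge position produces its own boundary; the edge values are made up, and `distinct_edges` is a hypothetical helper, not anything the Report Designer exposes.

```python
def distinct_edges(positions):
    """Return the sorted distinct edge positions (one boundary each)."""
    return sorted(set(positions))


# Made-up left edges of five elements, in points.
left_edges = [24.0, 24.548, 24.2, 120.0, 120.4]

before = distinct_edges(left_edges)
after = distinct_edges(round(x) for x in left_edges)

print(len(before), "boundaries before rounding:", before)
print(len(after), "boundaries after rounding:", after)
```

After rounding, the five fractional edges collapse into three, and the two that remain close together (24 and 25) are a full point apart, so the leftover mis-alignment is easy to see and fix by hand.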

The quickly cleaned-up version of my sample report now has only 24 columns and 16 rows, but visually, you cannot tell the difference between the two of them. Theoretically, the resulting table can have 384 cells, less than a third of the 1225 cells of the mis-aligned report. And finally, the generated HTML file shrank to a mere 8,853 bytes, roughly 40% of the original size. In my experience, and with those numbers in mind, the computing time for this optimized report should be roughly 10% to 15% better than for the unoptimized version. In addition to that slight boost, your report will download faster, and rendering the report in the browser will be a lot quicker as well.
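
The numbers quoted for the two versions of the sample report can be rechecked with a bit of arithmetic. The sketch below uses only the column counts, row counts and file sizes given in this post; `potential_cells` is a hypothetical helper name.

```python
def potential_cells(columns, rows):
    """Upper bound on the number of table cells for a given grid."""
    return columns * rows


original_cells = potential_cells(35, 35)
cleaned_cells = potential_cells(24, 16)

original_bytes = 21_438
cleaned_bytes = 8_853

cell_ratio = cleaned_cells / original_cells
size_ratio = cleaned_bytes / original_bytes

print(f"cells: {original_cells} -> {cleaned_cells} ({cell_ratio:.0%} of the original)")
print(f"bytes: {original_bytes} -> {cleaned_bytes} ({size_ratio:.0%} of the original)")
```

The file size shrinks less than the cell count does, which is plausible because the text content and the fixed parts of the HTML page stay the same no matter how many cells the table has.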

So remember, performance optimization starts in your report designer: When you optimize your report designs it instantly pays off with quicker rendering and smaller downloads.

Further optimization

That report uses line-elements to draw borders around the statistic sections. By using sub-bands with border definitions, that report could be simplified further, but I’ll leave that as an exercise for the class.

This entry was posted in Basic Topic, Performance by Thomas.

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.