Me, Captain Ahab, and a White Whale

Nearly two weeks it has been since I last saw a safe harbor. Long since it has been that we’ve tasted the bitter taste of ale or the sweet kisses of a gentle maid. Yet still, I cannot rest. Not as long as I know that this nemesis is out there. Oh, ye old evil, ye attacked me and my ship. Every time I venture out you sit there, waiting for the right moment to strike again. Nobody in those shabby harbor inns believed the stories I told. Oh, I warned them, and all they gave was laughter. The souls of those poor sailors who were slaughtered by that ravenous beast now have a hard time laughing, slain as they were when they met what they did not want to believe in. And so night after night I stand here, watching, waiting, hunting. Know, ye old beast of the deep, I come for you.

You might have guessed it from the gloomy wording: it’s bug hunting season.

One of my most primitive list-reports showed an anomaly. The report itself is simple – a static page header and a couple of elements in the itemband. No groups. No images. No expressions. No conditional formatting. Nothing.

The report works fine. It does not crash. It does not slow down. Everything’s fine. Don’t worry. Move on. There’s nothing to see here.

Yet, when executed with a couple of thousand rows resulting in a few hundred pages, one or two pages will show ‘rendering artifacts’. Sometimes the bottom of the page contains text that is supposed to be on the next page (where it appears as well, so no data loss – that would be too easy). Or the partial content appears at the top of the page, duplicating the last line of the previous page. And sometimes the whole layout is shifted by just one pixel (which you won’t notice until you flip through the pages).
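Artifacts like these are hard to spot by eye in a few hundred pages, so a mechanical check helps. The following is only a minimal sketch, assuming the paginated output can be reduced to a list of pages, each holding the indices of the data rows printed on it. The class `PageCheck` and its method are hypothetical and not part of Pentaho Reporting.

```java
import java.util.List;

// Hypothetical helper, invented for illustration only.
final class PageCheck {
    private PageCheck() {
    }

    /**
     * Walks all pages in order and expects the row indices 0, 1, 2, ...
     * to appear exactly once and in sequence.
     *
     * @return the index of the first page on which the sequence breaks
     *         (a duplicated or skipped row), or -1 if every page is consistent
     */
    static int firstBrokenPage(List<List<Integer>> pages) {
        int expected = 0;
        for (int p = 0; p < pages.size(); p++) {
            for (int row : pages.get(p)) {
                if (row != expected) {
                    return p;
                }
                expected++;
            }
        }
        return -1;
    }
}
```

A row that is printed at the bottom of one page and again at the top of the next makes the check fail on the second of the two pages. A one-pixel shift would slip through, since this sketch only looks at which rows land where.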

So there is a bug in the layouter – surprise, surprise.

Layouting is the most complex part of the reporting process. Messing around with data is easy, especially in the reporting field. You have a table, you walk over it. Simple, predictable, fast. But layouting is different. It starts with building a layout model that contains all the visual information, and continues with all kinds of transformations while merging the incoming text with the font metrics available from the system. Then that stuff is dropped on a large canvas and rearranged, shifted, squeezed, cut, glued together, and finally positioned so that it can be printed. The layout model is a (theoretically) predictable state machine. But it contains a great many variables, which more or less all depend on each other, and it keeps evolving as new nodes get added and finished nodes get removed. Together, that makes it complex enough to drive grown men insane.

This is the ocean where the white whale hides.

Sometimes innocent actions have severe results. Two weeks ago I noticed a wrong boolean check that prevented nodes from being recognized as empty. Empty nodes have no effect on the final layout, so they can be removed without affecting the output. That’s the theory. Fixing that bug immediately enabled some caches that had never fully worked before, and those caches then exploded and caused OutOfMemoryErrors. That explosion drowned out the crashing sound of the alignment and line-breaking code, which had an unrelated bug of its own that only became obvious once the caching started working. In the meantime, a sneaky cache key sat on a helper layout box where it was not supposed to be. This was the reason, or at least a reason, for the layout bug that started this journey. While we traveled the sea of code, each day a new wave of smaller monsters crawled onto the deck, trying to slow down our approach.
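To make the "empty nodes can be removed" idea concrete, here is a minimal sketch. It assumes a toy layout tree in which a node counts as empty when it prints no text, reserves no height, and has only empty children. The `LayoutNode` class, its fields, and that definition of "empty" are all assumptions made for illustration; the real render model has far more state than this.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical layout node, invented for illustration only.
final class LayoutNode {
    private final String text;   // null or "" means the node prints nothing
    private final int height;    // vertical space the node reserves
    final List<LayoutNode> children = new ArrayList<>();

    LayoutNode(String text, int height) {
        this.text = text;
        this.height = height;
    }

    LayoutNode add(LayoutNode child) {
        children.add(child);
        return this;
    }

    /** Empty means: no text, no reserved space, and no non-empty child. */
    boolean isEmpty() {
        if (text != null && !text.isEmpty()) {
            return false;
        }
        if (height > 0) {
            return false;
        }
        for (LayoutNode child : children) {
            if (!child.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    /** Number of nodes in this subtree, including this node. */
    int size() {
        int count = 1;
        for (LayoutNode child : children) {
            count += child.size();
        }
        return count;
    }

    /** Removes every empty subtree below this node; returns how many nodes were dropped. */
    int pruneEmptyChildren() {
        int removed = 0;
        Iterator<LayoutNode> it = children.iterator();
        while (it.hasNext()) {
            LayoutNode child = it.next();
            if (child.isEmpty()) {
                removed += child.size();
                it.remove();
            } else {
                removed += child.pruneEmptyChildren();
            }
        }
        return removed;
    }
}
```

In this sketch everything hinges on `isEmpty()`. Get a single condition in it wrong and nothing is ever pruned, and any code that relies on the pruned tree quietly never gets exercised.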

They won’t succeed. Each bug we find makes us stronger and 0.8.11 more stable. The new layouting and processing capabilities added in the latest codeline now demand souls to quell their hunger. Some of the bugs are as old as 0.8.9, so I’m backporting the fixes that apply to 0.8.9 and 0.8.10.

So after two weeks of hunting, I am still standing guard. The monster knows it can’t escape this time. And when we return to the shore, this code will be a safer place.

This entry was posted in Development by Thomas.

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.