Agile without fast tools ain't agile: Tuning our performance ..

From time to time, we get some .. extraordinary .. requests that stem from how our customers use the reporting system. There’s the requirement that we create insanely large reports (400,000 rows resulting in 75,000 pages in a PDF file), or HTML files so large that no browser could ever render them. (Forget about the BORG-virus, just send them one of those files and watch them go down.)

As a general rule, I treat such requests as a nice way to test and optimize the performance of our reporting engine. I focus primarily on making small, well, human-readable reports fast. Making the huge ones faster is OK, and when I get the chance, I happily optimize that as well. But if I would have to make smaller reports slower just to make the insane ones slightly faster, I happily resist the change. After all, whether your CPU burns 5 hours or 4 hours does not matter if you are not going to look at the report until 9 hours later. But waiting 10 seconds instead of 20 seconds for a report during your work day surely makes all the difference.

During the last few weeks, I once again had such a case. The customer needed to produce a large-scale report, probably just to fulfil some ill-thought-out government regulations. But for some reason, the report constantly failed with an OutOfMemoryError.

(Yes, this is the moment where a support contract comes in really handy. 😉 )

Memory management is usually a rather critical issue. For our reporting engine, it is even more critical, as this engine is based on the idea that all reporting problems can be solved in the available memory, without making a mess in your temp-directory. Actually, I’m way too old to believe in the “throw more memory/CPU/disk-space/nodes” myth. If you can solve the problem efficiently in an embedded-systems scenario, you can always scale up. But if you assume everyone has a high-end system, your code probably won’t scale down that nicely.

So OK, we are an all-in-memory engine, and I want to keep things like that for a while. Therefore I work with an assumed limitation of 128MB for normal reports (
After digging through the case and running a sample report, I discovered a couple of conditions where we started to pile up memory during the report processing at a rather unhappy rate. During profiling, I also discovered a bunch of non-optimal (polite for: purely crappy) data-structures I introduced years and years ago, which made the problem even worse. Oh, and the customer uses engine version 0.8.9 – not my favourite place to spend my time either.

After loads of tests, loads of profiling, and loads of just waiting for results (ye olde MacBook ain’t that fast), we are now at the happy spot of reporting success.

In 0.8.9 and 3.6.1, this report now runs within the 512MB barrier. It is not lightning fast, but it completes within 90 minutes here, and thus it is fast enough for a nightly batch processing run. (In 0.8.9, the table exports (HTML, CSV etc.) need a lot more memory and thus require access to a full 2GB heap. Luckily, that condition was fixed in the 3.5 codeline.)

In the 3.7 codeline, I eliminated the last few memory hogs and there the same report runs within the 128MB corset. As these changes required some non-trivial API changes, this is nothing I could sanely add to a bug-fix release.

An updated build for the 0.8.9 reporting engine can be found in our Hudson system. Be aware that you also have to replace libfonts with the version supplied here, as it contains other performance fixes (plus some API changes) we’ve made earlier on for a different customer.

Hudson job: LEGACY_classic_engine_core_089_bugfix

While working on that issue, PRD-2579 came up. This case reports that report processing has been slower in the 3.5-versions than it has been in the 0.8.9-versions. A bit of investigation revealed that this is indeed the case and that we had better fix that before the higher CPU utilization causes more global warming.

The initial tests showed that PDF generation and print(-preview) were about 4 times faster in 3.5 than in 0.8.9-10. But HTML export was slow: 10 seconds in 0.8.9 vs 30 seconds in 3.5. As I tend to work primarily with the Swing-preview or the PDF exports, I never noticed that part. BI-Server users tend to see more HTML exports than anything else, and there the slowdown matters.

Adding smarter caches solved the slowdown – which was originally caused by the fix for the table-export memory consumption problem from 0.8.9. In combination with some other performance fixes, our table export rendering speed is (nearly) back to where it was in the old days, while the PDF speed is faster than ever. (And ya can’t complain about a 4x speedup!)
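To illustrate the general idea of trading a bounded amount of memory for speed, here is a minimal sketch of a size-limited LRU cache. This is not the engine's actual cache code: the class name `LruCache` and the entry limit are made up for this example, and it relies only on the standard `java.util.LinkedHashMap`. The hard limit matters in an all-in-memory engine, because an unbounded cache would simply bring the memory problem back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the reporting engine's API.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: entries are ordered from least to most recently accessed.
        super(16, 0.75f, true);
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every insertion; returning true
    // evicts the least recently accessed entry, which caps memory use.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a limit of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently. The class is not thread-safe; concurrent use would need external synchronization.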

Right now, I’m busy making more bug-fixes for the 3.6.1 release, which is at the moment scheduled for April 22 (.. this year).

Pentaho Reporting 3.7 with the new drill-linking API should be out in the wild within Q2-2010.

As the 3.7-codeline is currently a bit “funny”, you might want to check out the 3.6-branch CI-builds instead.

* Subject to change if I ever get access to the BORG-cluster. You will be assimilated, but I have all the CPU time of the world. 🙂


About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.

2 thoughts on “Agile without fast tools ain't agile: Tuning our performance ..”

  1. dhartford

    Keep your posts like this – ideology with real-world problem and how it was compared with an old version/new version, and real metric comparisons.

    These are the things that turn you into a legend…if you weren’t already 😉
