Monthly Archives: February 2011

What’s New in Pentaho Report Designer 3.8 – Report and Data Caching

When you download the latest build (3.8-branch) of the Pentaho Report Designer and the BI-Server, you may* notice some lovely changes. It's not the look, it's not the feel. It's the speed, baby!

Before 3.8, each time a parameter changed and the report had to re-render, we had to generate the report twice. The first run happened when the parameter-document was requested, because the page-selector in the parameter-viewer needs to display the number of pages. The second run then produced that fancy single-page report output.

If your report needed a minute to run, you had to wait two minutes for your results. Now, unless you are in the business of selling coffee to co-workers waiting for reports, this is not good.

With the upcoming 3.8-release (end of February, I have been told), I introduced two measures to make working with reports faster.

The Server Side Report Cache

From now on the report that has been paginated is held in a per-user cache on the user’s session for some time. This makes sure that the paginated report object survives the time between the parameter request (where pagination happens) and the actual content-rendering request. If we can render from a paginated report, we have the report’s result-set available and have the report pre-processed to already contain the page information. At that point, producing the content is fast and simple.
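To make the idea concrete, here is a minimal sketch of such a per-session cache. It is not the BI-Server's actual code: the class name `SessionReportCache`, the `paginate` callback, the five-minute lifetime and the choice of report path plus parameter values as the key are all assumptions made for illustration.

```python
import time


class SessionReportCache:
    """Hypothetical cache; one instance would live on each user's session."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        # The lifetime is an assumed value; the post only says "for some time".
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry time, paginated report)

    @staticmethod
    def _key(report_path, params):
        # Parameter values must be hashable for this simple key to work.
        return (report_path, tuple(sorted(params.items())))

    def get_or_paginate(self, report_path, params, paginate):
        """Return the cached paginated report, or paginate once and keep it."""
        key = self._key(report_path, params)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        report = paginate(report_path, params)  # the expensive run
        self._entries[key] = (now + self._ttl, report)
        return report
```

With something like this in place, the parameter request and the content request that follows it would both call `get_or_paginate` with the same path and parameters, so the expensive run happens only once.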

Expect to have the same report browsing speed in the BI-Server as you have in the report preview inside the report-designer.

And of course: If you don’t like caching for some or all of your reports, you can disable the cache either globally (via the pentahoObjects.xml file) or on a per-report basis (via an attribute on the master-report object).

I apologize to all upcoming coffee entrepreneurs whom I put out of business with this move. Put the blame for the next recession on me, I can handle it.

The Data Cache

From time to time I get complaints that validating the parameters is slow too.

Usually I hear that in conjunction with list-parameters driven by database queries. Parameter queries are supposed to return fairly small result-sets. In the classical business intelligence world, these parameters are driven by dimension tables. In a data warehouse, dimension tables are reasonably small (compared to the size of the fact table) and hopefully have a reasonable index on them.

But not everyone believes in that. Some do things differently...

And thus I see parameters being driven by SQL statements as verbose as the works of Shakespeare, with nasty constructs and pure evilness dripping out of the query code.

We can rule out educating our users, for one single reason: Reporting is not Business Intelligence.

Reporting is getting your numbers on paper and when people start with that, they do not think about “Business Intelligence”. They think about getting their report done so that they can go home. They surely do “Business Intelligence” in the process, but they don’t see that. Writing Mondrian-Schema files and doing Data Integration is “Business Intelligence”. This is what the incomprehensible geeks from the IT department are for. But a business user, an analyst or an office assistant could not care less about that. If it works, it's fine. If it works badly, it is still fine, as I can go home once it is done.

So I ventured out to find a technical solution to an organizational and people problem.

The Data Cache!

The data-cache is a global cache that holds cached result-sets in memory. Right now, we simply ship with some reasonable default implementations.

Within the report-designer, this is a global cache, holding copies of the result-sets. To make sure we do not run out of memory, this cache is limited to result-sets with fewer than 10,000** rows of data and will cache up to 50** result-sets with a least-recently-used strategy.
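The combination of a row limit and least-recently-used eviction can be sketched in a few lines. This is an illustration only, not the reporting engine's implementation: the class name `ResultSetCache`, its method names and the use of a plain string as the cache key are assumptions, and only the two default limits are taken from the numbers above.

```python
from collections import OrderedDict


class ResultSetCache:
    """Hypothetical in-memory result-set cache with LRU eviction."""

    def __init__(self, max_rows=10_000, max_entries=50):
        self._max_rows = max_rows
        self._max_entries = max_entries
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached rows, or None if the key is not cached."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, rows):
        """Cache a copy of rows; return False if the result-set is too big."""
        if len(rows) >= self._max_rows:
            return False  # only result-sets below the row limit are kept
        self._entries[key] = list(rows)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return True
```

A real key would have to capture everything that influences the result, such as the connection, the query text and the parameter values, which is why a datasource that cannot describe its inputs is hard to cache safely.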

It will not cache your latest 4-million-row full-table scan. If you need that to be faster, talk to one of the many Pentaho Partners about getting a proper data warehouse set up. It will also not cache result-sets coming from a scripting datasource, a Java method call, a table-datasource, an external datasource (results computed in an xaction) or a CDA datasource. In all of those cases, either there is no point in caching (as caching is more expensive than producing the data) or we do not have enough hints about the involved query-parameters.

When the cache kicks in, parameter validation with abusive queries will be a lot faster. For everyone else, working with the data-tab in the Pentaho Report Designer will be smoother now. There is no coffee break penalty for adding expressions anymore. Queries are now fired only at report start-up and when you edit either the report’s active query or the connection information of your currently active data-source.

Now there is a truck-load of good reasons to head for the upcoming 3.8-release. For now: Happy reporting!

*) Unless you are using steel-wheels driven reports from the in-memory database where response times are infinitely fast.

**) These numbers can be changed in the global report configuration (classic-engine.properties)