Category Archives: Tech-Tips

Mondrian, XML/A and the Olap4J datasource

When we started with Pentaho Reporting 3.5.0, one of our goals was to kick out as many hard-coded dependencies as possible. Coding against private APIs is ugly, every upgrade breaks something and you always chase your own tail. In short: It is not fun.

The OLAP4J datasources are among the more promising new APIs that are emerging. Originally, I had hoped to completely kick out every Java source file that contained the words “import mondrian.”. So far I am still waiting to pull the plug on the Mondrian datasource and make OLAP4J the primary OLAP datasource. Hurry up, sticking with 0.x versions for years and years is soooooo lame. Where do you guys pick up such bad behaviour?

Right now, the only viable use case for OLAP4J is to access XML/A cubes. That may not sound exciting, but with it the Pentaho Reporting system can talk to SAP BW, Palo and Microsoft’s Analysis Services.

How do you connect Pentaho Reporting to an XML/A datasource?

  1. Create an OLAP4J datasource in PRD
  2. Create a new connection definition. The Database dialog that pops up is full of JDBC databases. Ignore all of that useless gibberish and select the “Generic database”.

Use the following entries in the fields:
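
The exact values depend on your XML/A provider. As a hedged example (the host, port and path below are placeholders for a Pentaho BI-Server’s XML/A service; the driver class is the olap4j XMLA driver), the “Generic database” fields would look roughly like this:

Custom Connection URL: jdbc:xmla:Server=http://localhost:8080/pentaho/Xmla
Custom Driver Class Name: org.olap4j.driver.xmla.XmlaOlap4jDriver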

Test the connection and once everything is fine, hit OK.

You are now set up to run MDX queries and use the results in a report.
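
For a quick smoke test, any small MDX query will do. The names below assume the SteelWheels sample cube, so substitute the measures and dimensions of your own catalog:

SELECT [Measures].[Sales] ON COLUMNS,
       [Markets].Children ON ROWS
FROM [SteelWheelsSales]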

But Thomas, there are so many OLAP4J datasources! Which one should I use?

Pentaho Reporting comes with three flavours of OLAP datasources.

The Banded Datasource is the one that most reports should use these days. This datasource provides a flattened view over the cube much like the results you see in Analyzer and JPivot. You can easily print that resulting dataset using the classical banded reporting techniques, like relational groups and detail bands.

The Denormalized Datasource provides a denormalized stream of data. It will be used for crosstabs in 4.0, but right now it is not of much use in the majority of cases. It has its merits if you need to do your own aggregation on the data.

The Legacy-Datasource behaves mostly like the Banded datasource. It exists to support PRD-3.0 and older reports. Ignore that, you are fresh and modern, after all. The Legacy-datasource names columns differently and cannot handle ragged hierarchies. Some ancient reports require that kind of weird behaviour to continue to work as before.

So use the Banded Datasource and you cannot go wrong.

Can I use OLAP4J datasources in the BI-Server?

Sure. Just do it. But steer clear of declaring them in JNDI, as connection pooling can easily fail. OLAP4J drivers masquerade as JDBC drivers to make use of connection pools and JNDI systems. But with pool code that was written for JDK 1.5 or earlier, “unwrapping” the driver does not work, and the pooled JDBC objects cannot be used to obtain an OLAP connection.
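
To illustrate the unwrapping step (a minimal sketch, not code from the reporting engine; the JNDI name is made up), OLAP4J relies on the JDBC-4 java.sql.Wrapper contract, which pools built against the older JDK 1.5 interfaces simply do not implement:

import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import org.olap4j.OlapConnection;

public class UnwrapSketch {
  public static OlapConnection connect() throws Exception {
    // The pool hands out a wrapper around the driver's real connection.
    DataSource ds = (DataSource) new InitialContext().lookup("jdbc/myOlapSource"); // hypothetical JNDI name
    Connection pooled = ds.getConnection();
    // JDBC-4 unwrapping: this is the call that fails on pre-JDBC-4 pool implementations,
    // and without it the pooled connection cannot be turned into an OLAP connection.
    return pooled.unwrap(OlapConnection.class);
  }
}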

I read I can use Mondrian via OLAP4J as well. How?

Although it is technically possible to use Mondrian this way, I strongly recommend against it.

Mondrian encodes a system-specific file name – the schema file – into its URL. When Mondrian tries to establish a connection, it tries to read this file. If the file name is an absolute file name, then you will run into trouble when you move your report onto the BI-Server: the referenced file will usually not be available there. If the file name is relative, then Mondrian will resolve the file against the application’s current working directory (for the BI-Server usually Tomcat’s ‘bin’ directory, for PRD either the PRD installation directory or your home directory).

So far I have not seen a safe way to specify the schema file so that it works both locally and when the report is published on a server.
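
For illustration only (the JDBC details and paths are made up), a Mondrian-via-OLAP4J connection would use the driver class mondrian.olap4j.MondrianOlap4jDriver and a connection string along these lines – the Catalog entry is exactly the file reference that causes the trouble described above:

jdbc:mondrian:Jdbc=jdbc:hsqldb:file:./sampledata;JdbcDrivers=org.hsqldb.jdbcDriver;Catalog=/home/user/schemas/steelwheels.mondrian.xml

With an absolute Catalog path the schema file will be missing on the server; with a relative one (Catalog=schemas/steelwheels.mondrian.xml) it is resolved against whatever the current working directory happens to be.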

How to upgrade your old BI-Server to Pentaho Reporting 3.7.0

So once again I find myself writing an upgrade guide for the ancient BI-Server releases. It may be futile to say so, but: Hey, this is the last time I am back-porting anything for these dusty, creaky collections of bits and bytes.

The next release of Pentaho Reporting will be the 4.0 release, with APIs changing, worlds ending and other mind-altering influences happening. And I surely don’t have the patience to wait until the zombies crumble to dust. So be warned, your clock is ticking too.

Enough of the happy talk, let’s talk upgrade.

As a result of the upgrade, your existing BI-Server will run the Pentaho Reporting 3.7.0 engine and the matching reporting plugin.

There are a few limitations and warnings that come with this upgrade:

  • The Drill-Linking-UI will not auto-populate the parameter information for XActions or XAnalyzer content. 
  • The report-viewer will be upgraded. If you are using invalid report parameters, or report-parameter values that are not acceptable to the parameter definitions in your report, those reports will now always fail. Fix the report definition and the error will go away.
  • Any modification made to the report-viewer’s plugin.xml file will be lost.
  • There is no and will be no upgrade for BI-Server 3.5.0 or earlier versions. Upgrade!
  • The Pentaho-Support-Crew may or may not support such upgraded installations.

For the upgrade you will need:

  • A computer with a Java-5 SDK and Apache ANT installed
  • A Subversion client and knowledge of how to check out code from a repository
  • A Pentaho installation on your local disk (and permission to change files there)

This upgrade is relatively easy, and surely easier than the last one. So let’s get started.

Step 0: Back up your Pentaho installation

Don’t skip that part. The upgrade will delete the old reporting files and will install new ones. Always make sure that you can return to the pre-upgrade state. 

Step 1: Create a directory for the source-code and the build files.

mkdir /home/user/pentaho

Step 2: Check out the source code of the reporting plugin that matches your BI-Server installation.

In this manual, I will use a command line client. However, any Subversion client will do the job.

BI-Server 3.5.2:

cd /home/user/pentaho
svn co http://source.pentaho.org/svnroot/platform-plugins/reporting-plugin/branches/3.7-backport-3.5.2/ reporting-plugin

BI-Server 3.6.0:

cd /home/user/pentaho
svn co http://source.pentaho.org/svnroot/platform-plugins/reporting-plugin/branches/3.7-backport-3.6.0/ reporting-plugin

Step 3: Edit the build.properties file to point to your BI-Server installation

There are two possible scenarios, depending on how you installed your Pentaho BI-Server:

(1) BI-server pre-configured installation

This is the standard installation type when you install the Pentaho BI-Server with an EE installer or from a ZIP or tar.gz file from SourceForge. You only need to modify a single setting:

install.target-dir=

Example:

install.target-dir=/opt/pentaho-bi-server

(2) A custom installation in a J2EE Web-application server.

The installation process has to delete some old files and copy a couple of new files into the web application. You have to make sure that the WAR file is extracted somewhere for the upgrade. WAR files are ZIP files, so any unzip tool will do the job.

You will have to set two settings in the build.properties file. Note that there is no entry for “install.lib-dir” in that file. You have to create one on a new line.

install.target-dir=
install.lib-dir=

Example:
For this example I will assume that your pentaho-solutions directory is “/srv/pentaho/pentaho-solutions”. I further assume your unzipped WAR file sits in “/home/user/pentaho-war” so that you can see a directory called “/home/user/pentaho-war/WEB-INF/lib”.

install.target-dir=/srv/pentaho
install.lib-dir=/home/user/pentaho-war/WEB-INF/lib

Step 4: Download the libs and build the jars

Assuming you have Apache Ant installed, you can run

ant resolve dist

Otherwise, shame on you for skipping the requirements – go and read: http://ant.apache.org/manual/install.html

Step 5: Ensure you have made your backup.

Step 6: If you patch a Pre-Configured Installation or another J2EE server’s work directory, then stop your BI-Server now.

Step 7: Install the artifacts

The Ant script contains all the information needed to patch your WAR file or your Pentaho Pre-Configured Installation. Simply run:

ant install 

Step 8: Start your BI-Server. Login and enjoy your new reporting engine.

Demos and other Material on Pentaho Reporting

There are a couple of recordings and documents out there to give you a good start with our reporting tools. The only drawback: you have to know that they exist. So let me list a few of these resources here:

Supporting the old population – BI-Server 3.5.2 and 3.6.0

Now that the end of the traditional calendar year is near and Samhain is fast approaching, it is time to take care of the dead corpses of the old and existing production installations out there.

Personally, I am a big fan of instant upgrades. As soon as a new release is out, upgrade. Agreed, I am biased as every old installation means I have extra trouble with support and bug reports. Every old installation gone means less work for me. Isn’t that cool?

But there are those boring people with arguments like “can’t upgrade a tested installation that is in production” and “never change a system that already works”. As those people also fall into the category of being “honourable Pentaho customers”, I quell the urge to scream “Get over it, what’s life without a bit of risk and fun”.

Pentaho Reporting 3.7.0 release ante portas

So here is my personal take on that problem. Note that my preference does not affect your support plans, nor does it dictate what Pentaho as a company can and will do or can and will support. It just affects what I do in regard to feature and support requests that come to me via the open channels like the forum, e-mail or personally addressed blackmail.

This release will be out sometime this year. Note how carefully I avoid any specific dates. It will be released when it is finished or when our Sales and Marketing teams grow tired of waiting and invade the developer dungeon to get hold of the treasure.

Pentaho Reporting 3.7.0 will resurrect the dead BI-Server corpses

After that release I will branch the 3.7 codeline to retrofit it into the BI-Server 3.5.2 and 3.6.0 codebase. At the end of that process we will be able to upgrade the existing BI-Server 3.5.2 and 3.6 installations with the new reporting engine.

The server will still be mainly 3.5.2 code, as I do not plan to touch any of the BI-Server’s core projects. The “reporting-plugin” and the “pentaho-reporting-classic-engine” will be the only things that change.

The automatic parameter discovery in PRD for Drill-Linking will only be supported for links to reports. XActions and Analyzer do not support the correct parameter format, and I don’t feel tempted to mess with their sources (and potentially break things along the way). However, when you know the parameter names, you can always add them to the “Manual Parameter” section of your drill-down definition.

All other reporting related features will work as if you had a BI-Server 3.7.0 release.

Let the dead corpses rest in peace: No support from me for 0.8.9 and 0.8.10

I will make no effort whatsoever to support BI-Server 3.5.0 or anything that uses the old 0.8.something reporting engines. That code is so old that even my local necromancer would have trouble getting it back to life. I ended all support for that some time ago, and although I happily answer questions from time to time, I will not open up that code in my shiny and clean IDE. If you are still working with that code, consider upgrading, or be prepared to work on that branch alone.

Performance tuning settings for Pentaho Reporting

General thoughts on report processing and performance

Performance of PR is mainly dependent on the amount of content printed.
The more content you generate, the more time we will need to perform
all layout computations.

Use Inline Subreports with care

Large Inline-subreports are the most notorious reason for bad performance.
The layouted output of an inline-subreport is always stored in memory. Layouting
pauses until the subreport is fully generated and then inserted into the layout
model and subsequently printed. Memory consumption for this layouting
model is high, as the full layout model is kept in memory until the
report is finished. If the amount of content of the subreport is huge,
you will run into out-of-memory exceptions in no time.

An inline-subreport that consumes the full width of the root-level band
should be converted into a banded subreport. Banded subreports are
layouted and all output is generated while the subreport is processed.
The memory footprint for that is small as only the active band or the
active page has to be held in memory.

Resource Caching

When images are embedded from servers (HTTP/FTP sources) it is critical
for good performance that the server produces a LastModifiedDate header.
We use that header as part of the caching. A missing header means we
do not cache the resource and will reload the image every time we access it.
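
In HTTP terms that is the Last-Modified response header. A cache-friendly image response looks roughly like this (the timestamp is, of course, just an example):

HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Tue, 12 Oct 2010 08:15:00 GMT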

As a general rule of thumb: Caching must be configured properly via
a valid EHCache file. If caching is disabled or misconfigured, then
we will run into performance trouble when loading reports and resources.

Performance Considerations for Output types


Within PR there are three output types, each with its own memory and CPU
consumption characteristics.

(1) Pageable Outputs

A pageable report generates a stream of pages. Each page has the same
height, even if the page is not fully filled with content. When a page is filled,
the layouted page will be passed over to the output target to render it
in either a Graphics2D context or a streaming output (PDF, Plaintext,
HTML etc).

Prefer “break-after” over “break-before” pagebreaks.

When the content contains a manual pagebreak, the page will be considered
full. If the pagebreak is a “before-print” break, the break will be
converted into an “after-print” break, the internal report states will be rolled
back, and parts of the report processing restart to regenerate the layout
with the new constraints. A similar roll-back happens if the current
band does not fit on the page.

Stored PageStates: Good for Browsing a report, but eat memory

When processing a pageable report, the reporting engine assumes that
the report will be run in interactive mode. To make browsing through
the pages faster, a number of page-states will be stored to allow us
to restart output processing at that point.

Reports that are run to fully export all pages usually do not need
to store those pagestates. A series of settings controls the number
and frequency of the pagestates stored:

org.pentaho.reporting.engine.classic.core.performance.pagestates.PrimaryPoolSize=20
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolFrequency=4
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolSize=100
org.pentaho.reporting.engine.classic.core.performance.pagestates.TertiaryPoolFrequency=10

The reporting engine uses three lists to store the page-states. The default
configuration looks as follows:

The first 20 states (Pages 1 to 20) are stored into the primary pool. All states
are stored with strong references and will not be garbage collected.

The next 400 states (pages 21 to 420) are stored in the secondary pool. Of those,
every fourth state is stored with a strong reference and cannot
be garbage collected as long as the report processor is open.

All subsequent states (pages beyond 420) are stored in the tertiary pool,
and every tenth state is stored as a strong reference.

For a 2000-page report, a total of 278 states (20 + 100 + 158) will be stored with strong references.

In server mode, the settings could be cut down to

org.pentaho.reporting.engine.classic.core.performance.pagestates.PrimaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolFrequency=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.TertiaryPoolFrequency=100

This reduces the number of states stored for a 2000-page report to 22 (1 + 1 + 20),
cutting the memory consumption for the page states to roughly a tenth.

(Note: In PRD 3.7 full exports no longer generate page states and thus these settings
will have no effect on such exports. They still affect the interactive mode.)

(2) Table exports

A table export produces a table output from a fully layouted display model. A
table export cannot handle overlapping elements and therefore has to remove
such elements.

To support the debugging of report layouts, we store a lot of extra information
into the layout model. This increases the memory consumption but makes developing
reporting solutions easier. These debug settings should never be enabled in
production environments. In 3.6 and earlier, the pre-built “classic-engine” has them enabled, as this helps inexperienced developers find their report-definition errors faster.

org.pentaho.reporting.engine.classic.core.modules.output.table.base.ReportCellConflicts=true
org.pentaho.reporting.engine.classic.core.modules.output.table.base.VerboseCellMarkers=true

Note: With PRD-3.7, the defaults for these settings will change to “false” as
we assume that most users use PRD for developing reports now. PRD comes with
its own method to detect overlapping elements and does not rely on these settings.

Special Notes on the HTML export

In HTML exports, there are a few settings that can affect export performance.

org.pentaho.reporting.engine.classic.core.modules.output.table.html.CopyExternalImages=true

This controls whether images linked from HTTP(S) or FTP sources are linked from their
original source or copied (and possibly recoded) into the output directory. The
default is “true” as this ensures that reports always have the same image.

Set to false if the image is dynamically generated and should display the most
recent view.

org.pentaho.reporting.engine.classic.core.modules.output.table.html.InlineStyles=false
org.pentaho.reporting.engine.classic.core.modules.output.table.html.ExternalStyle=true
org.pentaho.reporting.engine.classic.core.modules.output.table.html.ForceBufferedWriting=true

The style settings and the buffered writing settings control how stylesheets are produced and whether the generated HTML output will be held in a buffer until the report processing is finished.

Style information can either be inlined, stored in an external *.css file, or
contained in the <head> element of the generated HTML file
(InlineStyles == false and ExternalStyle == false).

Buffering is forced when styles need to be inserted into the <head> element of a
report. Buffering should be set to true if the resulting report is read by a
browser, as browsers request all resources they find in the HTML stream. If a
browser requests the stylesheet that has not yet been fully generated, the report
cannot display correctly.

It is safe to disable buffering if the styles are inlined, as the browser will
not need to fetch an external stylesheet in that case.

Buffered content will appear slower to the user than non-buffered content, as
browsers render partial HTML pages while data is still received from the server.
Buffering delays that rendering until the report is fully processed on the server.
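
For example, if the report is streamed directly to a browser and you are willing to inline all style information, flipping the same settings shown above avoids the buffering delay (a suggestion only – check it against your own output requirements):

org.pentaho.reporting.engine.classic.core.modules.output.table.html.InlineStyles=true
org.pentaho.reporting.engine.classic.core.modules.output.table.html.ExternalStyle=false
org.pentaho.reporting.engine.classic.core.modules.output.table.html.ForceBufferedWriting=false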

BI-Server parameter input: An authoritative voice

Over the last few release cycles, we have spent a tremendous amount of time making the parameter input easy to use. Driven by community and user feedback, the GWT client received new capabilities for when and how to pass parameters around.

We strive for a release-it-quick-and-release-it-often approach, as we would rather solve pain in a suboptimal way today than wait for a perfect solution that comes next year, if ever. And thus it is a sad (and rather natural) truth that not all attempts hit the spot.

Especially the auto-submit feature is a creepy little bugger and
has seen numerous hacks, tweaks and finally a full make-over in
the upcoming 3.7 release.

Architecture: the inner workings of the reporting plugin in 3.7

The reporting plugin is a GWT application. When first invoked, it initializes itself, inspects all parameters it received from the URL and fires a non-paginating parameter request back to the server.

On the server, the parameter-content-handler handles the request. It reads all values from the URL. URLs only transmit strings, so in a second step those raw values get parsed into Java objects. The parsed values are then passed down into the reporting engine for further validation. The engine validates the parameters and either accepts the values as fully valid or spits out a bunch of error messages.

If all values are valid and the request asked for pagination, we
also paginate the report to return the total number of pages for
the report.

The validation and pagination result, along with a description of
all available parameters and their accepted values are sent back
to the GWT UI.

The first parameter-response now initializes the GWT application
and sets the initial state of the auto-submit feature. If the
GWT UI detects that all parameters have been accepted as
valid, and if auto-submit is enabled, it now fires a full parameter
request that paginates the report.

Subsequent updates will then follow a slightly different protocol:

If auto-submit is disabled, any user input triggers a lightweight parameter validation request. This ensures that list parameters always get selections based on the actual user input. The response only updates the parameter definitions and values; at no point does the report get executed.

When auto-submit is enabled, we always fire heavyweight parameter
requests, which trigger a pagination run on the server. When such
a request returns without error, we do show the report content.
This finally invokes the report-content-generator and creates
the actual report content that is shown below the parameter input
pane.

The BI-Server 3.6.0 Situation

In this version, all parameter validation requests were heavyweight requests, so every parameter validation triggered a pagination.

A pagination run is only triggered if all parameters were validated successfully. To turn the heavyweight parameter requests into 3.7-style lightweight requests, you need to add a parameter that always fails validation until the user explicitly selects it. One simple way of doing that is to add a mandatory check-box parameter with the values “WAIT” and “RUN” and the post-processing formula

=IF([p] = "WAIT"; NA(); [p])

which translates “WAIT” into a <null> value and thus keeps the parameter from validating until the user selects “RUN”.

Fixed rules: More on Parameter processing

In the famous “let’s do it right this time” release of Pentaho Reporting 3.5, we introduced the ability to have parameters on a report. Well, it wasn’t quite right: with parameters you need to pre- and post-process the data to make it sound. That was release 3.6. At that point, so went our theory, you, dear user, should be happy. But apparently the beast we created wasn’t all that pretty.

Well, it’s PRD-3.7 now and guess what we are improving: parameters.

So far, the date parameter processing was not quite right. I still wonder
why after 5 years of XAction no one complained about that. Sure, XActions
only have strings, and any processing or parsing is up to you – and so is the
blame if it does not work.

The various system-level options on the parameter UIs were .. sub-optimal.
(I’m getting better at phrasing it more positively, don’t you think?)
The list of system-level parameters (aka “the list of parameters you shall not
use in your report”) grew with every release. The Swing UI and the server UI
never quite agreed on which settings to accept and how to behave in border
cases, so creating formulas that worked in both environments was a chore.

And last but not least: validating parameters and getting them to run in a
consistent way was difficult. Give an Integer where a Long was expected
and you are screwed – without an error message. Thus even for me, working with
the parameters was more of an Easter-egg hunt than sane designing.

And finally: even the parameter processing order was a bit
funny. It works for simple cases, but behaves rather strangely for the
not-so-simple ones.

How Pentaho Reporting Processes Parameters

Each parameter in Pentaho Reporting carries at least two formulas that
eventually need to be evaluated.

The default-value-formula is used to produce a valid value if the user
provided no value.

The post-processing-formula is used to transform the user’s input into
something more usable or simply to validate arbitrary business rules
(a deposit cannot be negative, for instance).

And last but not least, if you reference another parameter, you expect
it to contain the proper post-processed value.

In PRD-3.6, the order of the validation was largely out of sync with
those expectations. In fact, post-processing was done in blocks, so
parameters were not able to use other post-processed parameter values in
their queries. Now that’s bad, and I guess Gretchen will be able to share
a few unhappy tales in Lisbon about that.

In PRD-3.7, each parameter is now fully processed on its own before
the next defined parameter in the chain is processed.

Let’s be more formal for a while:

For each parameter defined:

If the value for the parameter is <null>, we compute the parameter’s
default value and use that as untrusted parameter value. The default-value
formula only sees the previously validated parameters.

In a second step, we post-process the parameter to get a trusted value.
The post-processing formula sees the previously validated parameters
and the untrusted value. So be careful how you use the untrusted value
here, as you cannot trust users, and SQL injection or cross-site-scripting
troubles are never too far away.

If the post-processing formula fails with an error, the
trusted value of the parameter will be <null>, a warning message will be
issued and, last but not least, we refuse to execute the report. The
parameter processing continues with this value set to <null>.

And finally we check the type of the parameter and compare the parameter
value against the list of valid key-values. If the value passes this test
it becomes a trusted value and will be used in the further parameter
processing and ultimately it will be used in the report.

If the parameter fails the test, we report an error, prevent any report
processing and continue to validate the remaining parameters using the
parameter’s default value.
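
Expressed as code, the per-parameter flow looks roughly like the sketch below. This is illustrative pseudocode only – none of the class or method names belong to the real engine API; they merely mirror the steps described above.

import java.util.*;

public class ParameterFlowSketch {

  public Map<String, Object> validate(List<String> declaredOrder, Map<String, Object> userInput) {
    Map<String, Object> trusted = new LinkedHashMap<>();
    List<String> errors = new ArrayList<>();   // any entry here prevents the report from running
    for (String name : declaredOrder) {
      Object untrusted = userInput.get(name);
      if (untrusted == null) {
        untrusted = computeDefault(name, trusted);      // sees only previously validated values
      }
      Object value;
      try {
        value = postProcess(name, trusted, untrusted);  // sees validated values plus the untrusted input
      } catch (RuntimeException e) {
        errors.add(name + ": post-processing failed");
        value = null;                                   // processing continues with <null>
      }
      if (!isInValidKeyList(name, value)) {
        errors.add(name + ": value not in the list of valid key-values");
        value = computeDefault(name, trusted);          // continue with the default value
      }
      trusted.put(name, value);                         // becomes visible to later parameters
    }
    return trusted;
  }

  // Hypothetical stand-ins for a parameter's default-value formula,
  // post-processing formula and key-value check.
  Object computeDefault(String name, Map<String, Object> validated) { return null; }
  Object postProcess(String name, Map<String, Object> validated, Object untrusted) { return untrusted; }
  boolean isInValidKeyList(String name, Object value) { return true; }
}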

Beginning with this version, the parameter validation also creates the
set of validated values after the validation is complete. For a report without
any parameter values set, this will yield the default values for all parameters.

So what does this mean for you?

The new schema brings a couple of changes to the way the system behaves.
Default values are now context-sensitive and can change when the
selection for the previously declared parameters changes. Our parameter
UIs do not directly use that feature, for usability reasons: automatically changing
the user’s input is not very nice and confuses and/or upsets people. A lot.

The post-processing formulas are now executed in a timely manner, before
the default value or selection for the next parameter is computed. This way, you
are now able to compute the mondrian-role array in a hidden parameter’s
post-processing formula and be sure that your datasource sees it.

And last but not least, your formulas won’t be able to use values that have
not been validated, nor would the report ever include them. Especially with
the SINGLEVALUEQUERY and MULTIVALUEQUERY formula functions, this is mandatory.
Your database is yours and we all want to keep it that way.

How to work with Advanced SQL/MDX Datasources efficiently

When creating complex reports or even a Guided AdHoc Report, you usually need to modify the structure of the report’s query. And this inevitably means that you use an Advanced SQL or Mondrian/OLAP4J data-source along with a calculated query.

What is an Advanced Data-Source

An Advanced data-source is a data-source that only configures the connection. Instead of configuring the query at design-time, it tries to interpret the report’s query as a valid SQL or MDX statement. The value for the query can then be computed at runtime using either a formula or any other expression.

Parameter values are specified via the ${PARAMETERNAME} syntax within the query. Ordinary parametrization in SQL data-sources uses JDBC PreparedStatements for filling in the parameter values. The JDBC standard limits this kind of parametrization to pure value replacement, and makes it impossible to replace column names or any other structural element, like adding or changing an ORDER BY clause.

When using a formula to compute the query, those limitations are lifted. Parametrization now happens by doing string concatenation to calculate the SQL statement. This now allows you to inject any SQL fragments into your query at any place and in any fashion you like.

="SELECT * FROM Customers ORDER BY " & [Sorting]

Warning – SQL injection: With great power comes great responsibility

While this technique is powerful, it is also dangerous: injecting unvalidated user-supplied values into a query is commonly known as SQL injection and is a great way for hackers to enter your server. So make sure that you only concatenate properly validated SQL fragments.

If you do not validate the input, your users can supply dangerous values for Sorting, like this one:

COLUMN; DELETE FROM TABLE;

which would produce this scary SQL statement

SELECT * FROM Customers ORDER BY COLUMN; DELETE FROM TABLE;

which can delete all rows from your table. (Whether this is successful also depends on your database and JDBC driver. But do you want to take the risk?)

Both the Advanced SQL and Advanced Mondrian/OLAP4J data-sources still allow the Prepared-Statement parametrization in addition to the formula computation. So if you have to parametrize user-supplied values, do it via the classical ${PARAMETERNAME} syntax and only use the formula parametrization with validated parameters.
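
A hedged example combining both styles (CustomerCountry and Sorting are hypothetical parameter names): the ${…} part is bound safely through the prepared statement, while the concatenated [Sorting] fragment must come from a validated list of allowed values.

="SELECT * FROM Customers WHERE COUNTRY = ${CustomerCountry} ORDER BY " & [Sorting]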

Designing Reports with Advanced Data-Sources

Inside the Pentaho Report Designer, Advanced SQL data-sources do not show field names in the various field pickers. The query is only calculated when the report is run, and thus the design-time environment does not have enough information to retrieve data from the data-source. This makes it hard to work with such reports – you always have to remember how your columns are named, which is no fun at all.

But there is a trick! There is always a trick ..

A report can have definitions for both the static value and a formula for the query-attribute at the same time. When the report is executed the formula will be evaluated and the static value will be ignored. At design-time, the Report-Designer only uses the static value and ignores the formula.

Add the Advanced SQL datasource to the report and add the formula for the query-attribute to configure the report’s behaviour at runtime. Then add a standard SQL-Datasource (or just a plain Table-Datasource) to the report, configure it with a suitable query that returns columns with the same names and types as the computed query and set the report’s static query-attribute to the name of that query. You should now see the columns of the standard data-source at design-time and see the data from the calculated query via the Advanced Datasource when the report runs.
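
As a sketch of that setup (the query name and columns are made up), the two query definitions on the report would look like this: the static value points at the design-time query of the plain datasource, while the formula produces the real statement at runtime.

query (static value): design-time-preview
query (formula): ="SELECT CUSTOMERNAME, SALES FROM Customers ORDER BY " & [Sorting]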

Warning: You have to make sure that the Advanced data-source is positioned after the standard data-source or this trick will fail.

How to upgrade BI-Server 3.5.2 to the latest reporting release

One of the big fat questions lingering in the room with the latest release is (as usual):

Can I upgrade my existing BI-Server installation with the new reporting release?

Short answer: Yes.

With the latest bugfix release of Pentaho Reporting, we also had to upgrade both Kettle and Mondrian to their latest versions to make it run in the BI-Server 3.6.0 release. Due to the massive amount of work that went into Kettle 4.0, many of their APIs changed thus making it impossible to maintain backward compatibility.

Although we would love to see everyone migrate to BI-Server 3.6 immediately, the chances of that happening are fairly slim. Businesses seem to be a bit reluctant to change, once they have everything up and running. Heck, some people still run a 1.x release, which was released three days after the last dinosaurs died.

So how can I upgrade then? What will be the impact of this upgrade?

The upgrade is straightforward and can be done by deleting the following jar files from the pentaho/WEB-INF/lib directory:

libbase-1.1.5.jar
libdocbundle-1.1.6.jar
libfonts-1.1.5.jar
libformat-1.1.5.jar
libformula-1.1.5.jar
libloader-1.1.5.jar
libpixie-1.1.5.jar
librepository-1.1.5.jar
libserializer-1.1.5.jar
libswing-1.1.5.jar
libxml-1.1.5.jar
pentaho-reporting-engine-classic-core-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-hibernate-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-mondrian-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-olap4j-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-pmd-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-reportdesigner-parser-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-sampledata-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-scripting-3.6.0-GA.jar
pentaho-reporting-engine-classic-extensions-xpath-3.6.0-GA.jar
pentaho-reporting-engine-legacy-charts-3.6.0-GA.jar
pentaho-reporting-engine-legacy-functions-3.6.0-GA.jar
pentaho-reporting-engine-wizard-core-3.6.0-GA.jar

and replacing them with

libbase-1.1.6.jar
libdocbundle-1.1.8.jar
libfonts-1.1.6.jar
libformat-1.1.6.jar
libformula-1.1.7.jar
libformula-ui-1.1.7.jar
libloader-1.1.6.jar
libpixie-1.1.6.jar
librepository-1.1.6.jar
libserializer-1.1.6.jar
libswing-1.1.7.jar
libxml-1.1.7.jar
pentaho-reporting-engine-classic-core-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-hibernate-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-mondrian-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-olap4j-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-pmd-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-reportdesigner-parser-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-sampledata-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-scripting-3.6.1-GA.jar
pentaho-reporting-engine-classic-extensions-xpath-3.6.1-GA.jar
pentaho-reporting-engine-legacy-charts-3.6.1-GA.jar
pentaho-reporting-engine-legacy-functions-3.6.1-GA.jar
pentaho-reporting-engine-wizard-core-3.6.1-GA.jar

Note that the “pentaho-reporting-engine-classic-extensions-kettle” jar remains at version 3.6.0-GA. This ensures that the older Kettle 3.2 is used when running reports with a Kettle datasource.

To make the most of this upgrade, I also recommend adding a few new settings to the “pentaho/WEB-INF/classes/classic-engine.properties” file:

#
# These settings control how pagination states are retained in the reporting
# engine. For the server, it is safe to scale down the number of states to a
# bare minimum. This reduces the memory footprint of reports considerably. 
org.pentaho.reporting.engine.classic.core.performance.pagestates.PrimaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolFrequency=4
org.pentaho.reporting.engine.classic.core.performance.pagestates.SecondaryPoolSize=1
org.pentaho.reporting.engine.classic.core.performance.pagestates.TertiaryPoolFrequency=1000

#
# Disable several assertions and debug-messages, which are cool for testing reports
# but slow down the report processing. You may want to enable them in your test system
# but want to make sure that they are disabled in your production environment.
org.pentaho.reporting.engine.classic.core.layout.ParanoidChecks=false
org.pentaho.reporting.engine.classic.core.modules.output.table.base.DebugReportLayout=false
org.pentaho.reporting.engine.classic.core.modules.output.table.base.ReportCellConflicts=false
org.pentaho.reporting.engine.classic.core.modules.output.table.base.VerboseCellMarkers=false

# Performance monitoring entries. Can generate quite a lot of text in the logs, so 
# keep them disabled for production environments 
org.pentaho.reporting.engine.classic.core.ProfileReportProcessing=false
org.pentaho.reporting.engine.classic.core.performance.LogPageProgress=false
org.pentaho.reporting.engine.classic.core.performance.LogLevelProgress=false
org.pentaho.reporting.engine.classic.core.performance.LogRowProgress=false

With both the upgrade of the libraries and the new configuration settings, you should see a good performance boost.

What is that ‘drill-linking’?

On Pentaho’s roadmap shown earlier this year, the first item for Pentaho Reporting was a mysterious bullet point called “Drill-Linking”, with no further explanation on the slides of what this actually means.

Drill-linking is, very generally speaking, the ability to connect reports with other reports or, more generally, with any web-based system via hyperlinks.

At the moment, drill-linking can be done by adding a style-expression to the href style-key of any element. There is no big magic to it, and it is (in most cases) reasonably simple as long as you know where you are linking to.

As this is the only roadmap feature for the upcoming 3.7 release, there must be something more to it. (After all, you don’t make a release out of nothing, do you? And it would be Dilbert-style weasel work if we did. But then again, it’s product management and marketing – and thus everything that is not outright illegal is probably allowed (and has been done somewhere in the past).)

So what is our idea of drill-linking?

Foremost, it is a new method of computing URLs for use in the href style-key. For that I created a new formula function called “DRILLDOWN” – but more on that later.

It also includes work on the platform components, so that we can ask published reports, XActions, dashboards and so on for their expected parameter sets, which lets us provide a sensible UI for the editor.

And last but not least, I’m keeping an age-old promise and extending the JFreeChart expressions to generate image maps (and thus allowing you to put links on elements in the chart plot, so that you can click on a pie slice to jump to a more detailed report that tells you all about that specific country/product/whatever).

DRILLDOWN function? What’s that?

The DRILLDOWN function is a declarative way to define URLs. Whenever we want to build a user interface for a feature, it is most helpful if the values we want to edit exist in a well-defined data structure instead of being hidden like Easter eggs in a programming language.

(This is the same argumentation we had about XActions vs. PRPT, so if you know the reasons for having a cleanly editable model, you can skip the next parts. But as I still – from time to time – get those questions on why we had to use our own datasources when the XAction system already worked in the platform, I will go through the same arguments for the DRILLDOWN as well. At least I can point everyone to this post and won’t have to waste my breath repeating myself again and again.)

Up until now, URLs are computed via a formula function that most of the time looks something like this:

="http://localhost/pentaho/content/reporting?solution=steel-wheels&path=&name=report.prpt&parameter=" & [value]

or it looks like one of these

= [::env::baseURL] & "/content/reporting?solution=steel-wheels&path=&name=report.prpt&parameter=" & [value]

= [::env::baseURL] & "/content/reporting?" & MESSAGE("solution={0}&path={1}&name={2}&parameter={3}"; "steel-wheels"; ""; "report.prpt"; [value])

This stuff is easy and obviously works. But the problem is – as so often – in the details.

I can come up with a million ways of generating the same URL with different formulas. These formulas can be simple or complex, but they won’t be easily parseable, so I cannot extract the values for the UI in an automated way. And whenever we fail to extract the values correctly, we will be beaten up by the users for providing a buggy system, or a bad user experience.

We _could_ work around that by saying: Dear user, we only parse a very specific format. If you are not following that rule, you are on your own. (In the XAction world, this “workaround” was suggested for our PRD-3.5 as well – only accept XActions that look like the ones generated by PRD. And anyone with non-standard problems would be left out in the rain.)

Nah, such a solution is not a solution at all. First, no one reads manuals, where that limitation and the valid format would be documented. I don’t, and I guess you don’t either. So the first thing I would receive is bug reports on why this formula is not working in the drill-down UI. Bah! I want to spend the rest of my life lazily sitting on the beach, not answering these reports with “RTFM”.

Parsing a program – and a formula or an XAction is a program – and predicting what it does and how to change it to do something else is a very interesting research area. But the only research I want to do has to do with beaches at the Irish Sea, not with geeky problems.

(Welcome back to everyone who skipped the previous paragraphs.)

So we need something that splits the various parts of a URL into separate fields. So let’s look at what fields we have:

  1. Protocol, Host, port, web-application-base-url:

    This tells us where the web-application sits.

  2. Content-path:

    This points to the reporting plugin, the XAction-servlet and so on. It can also point to a PHP page, if we want to link there.

  3. Parameter:

    In the Pentaho world, we have three standard parameters that are always there: solution, path and name, which (for historical reasons) contain parts of the solution path to which we link. When linking to a non-Pentaho service, these can be omitted.

    And then of course we have loads of user-defined parameters that the report, XAction or PHP file could accept.

    All parameters are name-value pairs, so a 2D array seems a natural solution.

So our DRILLDOWN function would look somewhat like this:

DRILLDOWN("http://localhost/pentaho"; "/content/reporting";
{{"solution" | "steel-wheels"} |
{"path" | ""} |
{"name" | "report.prpt" } |
{"parameter" | [value] }})

Not bad for a first throw. Now even a brain-dead monkey could write a UI without having to worry about how to parse complex formulas into a usable editor model.

But there is more to it. First, we know that for reports linking to the same server, we can replace the hardcoded server address with the “env::baseURL” variable to make that report more maintainable.

Next, the “/content/reporting” part is only valid for links to PRPT files. For XActions that part looks different (heck, I never can remember the real value!), as it does for dashboard links.

Boring stuff like that screams to be abstracted away from the unsuspecting user.

Every link can be classified into one or more groups. We can ask questions like:

  • Do you want to link to a local report or a report on another server?
  • Do you want to link to a PRPT file, a XAction or something else?

Let me call the combined set of answers to these questions “profiles”. And hey, let’s give our profiles some speaking names and let’s hide the technical gibberish that so confuses the poor user. Then our drill-linking function can look somewhat like this:

DRILLDOWN ("remote-prpt"; "http://localhost/pentaho"; {...  parameter-array here ..})
DRILLDOWN ("local-prpt"; <not used>; {...  parameter-array here ..})

The various profiles, along with the ugly technical stuff, are then safely tucked away in a configuration file. Now, if we make these profiles (admin-)user configurable, we actually open up a totally new playground. Now we can do stuff like

DRILLDOWN ("accounting-prpt"; <not used>; {...  parameter-array here ..})

and link to the almost forgotten server in the accounting department – and best of all, we don’t have to remember the IP or DNS name. And if they ever move the server, one change in the profile configuration fixes all the reports.

The profiles would be defined in a configuration file in the reporting engine and (as usual) can be overridden by a local configuration. This way, “local” can take up different meanings depending on where the report runs. Now, if the profile configuration also defines how to produce the URL, it is easy to let knowledgeable admins create their own link-profiles to 3rd party systems.

DRILLDOWN ("legacy-erp"; <not used>; {...  parameter-array here ..})

could easily produce a link to your ERP system with the customer number or product id passed down. And again your users would not have to worry about the ten-foot-long URL that is needed to make that happen.

How would the UI know what parameters and fields to offer?

Well, this is a problem I prefer to solve by not solving it. OK, that was too much Zen. So let’s talk about the UI for the drill-down system.

In its most simple form, we can present it as a set of input fields, each one corresponding to one slot in the DRILLDOWN function, and a plain table representing the parameter name/value array.

It’s generic, it’s simple, but it’s just not sufficient.

For Pentaho links, we have to maintain the three special parameters (solution, path and name) internally. I prefer to give users a file-system-like view with a single path to enter. After all, we are long past the DOS drive-letter days (OK, most of us are). So let’s make that thing one input and let the UI code figure out how to split it into the three parameters.

For third party links, we have to keep this a single path.

For remote links, we need to show the host-URL inputs, while for local ones this would be dead-confusing.

For PRPTs we probably want to show the system-level properties as well. Not everyone wants to paginate, and not everyone wants HTML as output.

So in short – thinking of a single unified UI makes my brain hurt. And whenever that happens, I delegate (in the worst sense of the word) my problems to someone else. The someone else happens to be you.

In my book, a pluggable profile system needs a pluggable UI as well. We provide sensible UIs for the profiles we define, and a rather generic one for everything else (some sort of fall-back, in case you or your admin is as lazy as I am).

If you have a complex URL-building requirement, then at least you can provide your own UI. If you’re happy with the ones we provide – fine, then we can all spend our time drinking beer instead of coding. And if the generic solution is good enough to cover common linking problems well, then only system integrators will have to worry about actually writing UI plugins. But if your users start to nag, and after playing World of Warcraft on the servers gets boring, then you have an easy way to provide them with the slick, made-to-measure UI they always begged for.