How to work with Advanced SQL/MDX Datasources efficiently

When creating complex reports or even a Guided AdHoc Report you usually need to modify the structure of the report’s query. And this inevitably means that you use a Advanced SQL or Mondrian/OLAP4J data-source along with a calculated query.

What is a Advanced Data-Source

An Advanced data-source is a data-source that only configures the connection. Instead of configuring the query at design-time, it tries to interpret the report’s query as valid SQL or MDX statement. The value for the query can then be computed at runtime using either a formula or any other expression.

Parameter values are specified via the ${PARAMETERNAME} syntax within the query. Ordinary parametrization in SQL data-sources uses JDBC-PreparedStatements for filling in the parameter values. The JDBC-standard limits this kind of parametrization to pure value replacement, and makes it impossible to replace column names or any other structural element, like adding or changing a ORDER BY clause.

When using a formula to compute the query, those limitations are lifted. Parametrization now happens by doing string concatenation to calculate the SQL statement. This now allows you to inject any SQL fragments into your query at any place and in any fashion you like.

="SELECT * FROM Customers ORDER BY " & [Sorting]

Warning – SQL-Injection: With great powers comes great responsibilities

While this technique is powerful, it is also dangerous: Injecting user-supplied values is commonly known as SQL-Injection and a great way for hackers to enter your server. So make sure that you only use properly validated SQL-fragments.

If you do not validate the input, your users can supply dangerous values for Sorting, like this one:

COLUMN; DELETE FROM TABLE;

which would produce this scary SQL statement

SELECT * FROM Customers ORDER BY COLUMN; DELETE FROM TABLE;

which can delete all rows from your table. (Whether this is successful also depends on your database and JDBC driver. But do you want to take the risk?)

Both the Advanced SQL and Advanced Mondrian/OLAP4J data-sources still allow the Prepared-Statement parametrization in addition to the formula computation. So if you have to parametrize user-supplied values, do it via the classical ${PARAMETERNAME} syntax and only use the formula parametrization with validated parameters.

Designing Reports with Advanced Data-Sources

Inside the Pentaho Report Designer, Advanced SQL-Data-sources do not show field names in the various field pickers. The query is only calculated when the report is run, and thus the design-time environment has not enough information to retrieve data from the data-source. This makes it hard to work with such reports – you always have to remember how your columns are named, which is no fun at all.

But there is a trick! There is always a trick ..

A report can have definitions for both the static value and a formula for the query-attribute at the same time. When the report is executed the formula will be evaluated and the static value will be ignored. At design-time, the Report-Designer only uses the static value and ignores the formula.

Add the Advanced SQL datasource to the report and add the formula for the query-attribute to configure the report’s behaviour at runtime. Then add a standard SQL-Datasource (or just a plain Table-Datasource) to the report, configure it with a suitable query that returns columns with the same names and types as the computed query and set the report’s static query-attribute to the name of that query. You should now see the columns of the standard data-source at design-time and see the data from the calculated query via the Advanced Datasource when the report runs.

Warning: You have to make sure that the Advanced data-source is positioned after the standard data-source or this trick will fail.

This entry was posted in Advanced Topic, Tech-Tips on by .
Thomas

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over an decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to a enterprise class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our our steps towards upcoming releases.