Using Queries in Formulas in Pentaho Report Designer

One of the most powerful features of Pentaho Reporting is our ability to combine data from a multitude of datasources into a single report.

A report can be driven by a a query that reads an XML file. The results then parametrize a Web-Service. The web-service response then gets fed into a standard SQL query. And that finally drives the queries to a OLAP cube.

Sure, that example is a bit extreme, but possible out-of-the-box with Pentaho Reporting downloaded from SourceForge.

But we can do more: In Pentaho Reporting 3.8, you can also use two formula functions to query report datasources to incorporate the query-result into your report.

The SINGLEVALUEQUERY function retrieves a single values from the resultset. It always reads it from the first row of the result-set, either from the given column or simply taking the first column it finds.

SINGLEVALUEQUERY()
SINGLEVALUEQUERY(; )
SINGLEVALUEQUERY(; ; )

As you can see, both column and query-timeout are optional parameter. The “query” given must be a valid report query – but more on that later.

The MULTIVALUEQUERY function works similar to the SINGLEVALUEQUERY formula function. But instead of retrieving just one value, it returns all values as an list.

In addition to the parameters of the SINGLEVALUEQUERY formula function, it also accepts a “query-limit” Integer-parameter, that restricts the number of values returned.

MULTIVALUEQUERY()
MULTIVALUEQUERY(; )
MULTIVALUEQUERY(; ; )
MULTIVALUEQUERY(; ; ;  )

So what is that ominous “query” parameter about?

In Pentaho Reporting, most data sources use logical names (or handles) to reference queries. Queries can be quite complex. SQL and MDX queries can be huge, and seeing queries printed out that cover several pages is rather normal. They are complex beasts that were difficult to write and even more difficult to keep free of errors. You don’t want to seem them being spread over throughout the reports.

All except for the “Advanced/Custom” data-sources hide these monsters behind easy to remember names. This allows us to encapsulate the complexities of the actual queries while at the same time documenting the purpose of the query itself. It is just easier to use “Sales-Persons-By-Region” in a report than to repeat the SQL statement everywhere.

Each report holds the logical name in its query attribute. When the query needs to be executed the reporting engine checks all declared data-sources to find one that recognizes the handle and that then returns a result-set for the assigned query.

For the formula system, I followed the same approach. It is easier to maintain formulas that use

SINGLEVALUEQUERY("sales-top-performer")

than to use

SINGLEVALUEQUERY("SELECT name, Sum(Sales) AS 'Sales' FROM SalesData ORDER BY 'Sales' LIMIT 1")

especially if that formula is going to be used several times in the report.

As a side note: As with calculated queries in the report itself, you can add a Advanced/Custom data source to compute a query at runtime. At that point, the second query would be valid.

These are the steps needed to use a SINGLEVALUEQUERY or MULTIVALUEQUERY formula function in a report for a given query.

(1) Define the data-source
(2) Add a query, test that it gets your expected result back.
(3) Reference the query in the formula

How do I parametrize the SINGLEVALUEQUERY or MULTIVALUEQUERY formula function?

For moth formula functions, parametrization happens implicitly. At runtime the reporting engine passes the current data-row, including all report parameter, the values from the report’s current result-set and the calculated function/expression values to the data-factory that executes the query.

Queries need to get parametrized as usual. For instance for SQL use the ${param} syntax. The underlying data-source itself specifies the names of the data-row columns it wants to use as parameters.

Example:

Lets assume that your report contains a field called “CUSTOMERNUMBER” and you want to retrieve the accumulated sales for the user via a SINGLEVALUEQUERY function.

(1) Define the SQL datasource on your report (or reuse one you already have defined).
(2) Add the query, name it “Sales-For-Customer” and provide the following SQL:

SELECT
     SUM("ORDERDETAILS"."QUANTITYORDERED" * 
     "ORDERDETAILS"."PRICEEACH") AS "Sales"
FROM
     "ORDERS" INNER JOIN "ORDERDETAILS" ON "ORDERS"."ORDERNUMBER" = "ORDERDETAILS"."ORDERNUMBER"
     INNER JOIN "CUSTOMERS" ON "ORDERS"."CUSTOMERNUMBER" = "CUSTOMERS"."CUSTOMERNUMBER"
WHERE 
     "CUSTOMERS"."CUSTOMERNUMBER" = ${CUSTOMERNUMBER}
GROUP BY 
     "CUSTOMERS"."CUSTOMERNUMBER"     

(3) Add a label, and edit the “value” attribute-expression. Set the following formula

=SINGLEVALUEQUERY("Sales-For-Customer")

Download the sample report

As you can see, there is not much magic in using the SINGLEVALUEQUERY and MULTIVALUEQUERY formula functions, but the effect they can have is powerful.

PS: Multiple queries with the same parameter values will be cached by the Report Data Cache

7 thoughts on “Using Queries in Formulas in Pentaho Report Designer

  1. Thomas Morgner

    It does (in 3.8, not in the dreaded 3.7) – but the default formula usually sees for all referenced parameter values that have no default value and thus easily fails with an NA-error.

    And of course, a parameter can only reference other parameters and only those parameters that have been defined before the current parameter. You cannot see parameters that come after your current parameter.

    Reply
  2. Madhu

    Hi, If i need to use singlevaluequery similar to an inline subreport, i will have to pass some keys from a field of the main query. Is this possible?

    Reply
  3. Thomas Morgner

    The parameter passing happens implicitly. The queries just have to reference any of the fields currently active in the report. The SINGLEVALUEQUERY function passes the whole report context along when firing the query.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>