Using Queries in Formulas in Pentaho Report Designer

One of the most powerful features of Pentaho Reporting is our ability to combine data from a multitude of datasources into a single report.

A report can be driven by a a query that reads an XML file. The results then parametrize a Web-Service. The web-service response then gets fed into a standard SQL query. And that finally drives the queries to a OLAP cube.

Sure, that example is a bit extreme, but possible out-of-the-box with Pentaho Reporting downloaded from SourceForge.

But we can do more: In Pentaho Reporting 3.8, you can also use two formula functions to query report datasources to incorporate the query-result into your report.

The SINGLEVALUEQUERY function retrieves a single values from the resultset. It always reads it from the first row of the result-set, either from the given column or simply taking the first column it finds.

SINGLEVALUEQUERY()
SINGLEVALUEQUERY([query:string]; [column:string])
SINGLEVALUEQUERY([query:string]; [column:string]; [querytimeout:integer])

As you can see, both column and query-timeout are optional parameter. The “query” given must be a valid report query – but more on that later.

The MULTIVALUEQUERY function works similar to the SINGLEVALUEQUERY formula function. But instead of retrieving just one value, it returns all values as an list.

In addition to the parameters of the SINGLEVALUEQUERY formula function, it also accepts a “query-limit” Integer-parameter, that restricts the number of values returned.

MULTIVALUEQUERY([query:string])
MULTIVALUEQUERY([query:string]; [column:string])
MULTIVALUEQUERY([query:string]; [column:string]; [querytimeout:integer])
MULTIVALUEQUERY([query:string]; [column:string]; [querytimeout:integer]; [limit:integer] )

So what is that ominous “query” parameter about?

In Pentaho Reporting, most data sources use logical names (or handles) to reference queries. Queries can be quite complex. SQL and MDX queries can be huge, and seeing queries printed out that cover several pages is rather normal. They are complex beasts that were difficult to write and even more difficult to keep free of errors. You don’t want to seem them being spread over throughout the reports.

All except for the “Advanced/Custom” data-sources hide these monsters behind easy to remember names. This allows us to encapsulate the complexities of the actual queries while at the same time documenting the purpose of the query itself. It is just easier to use “Sales-Persons-By-Region” in a report than to repeat the SQL statement everywhere.

Each report holds the logical name in its query attribute. When the query needs to be executed the reporting engine checks all declared data-sources to find one that recognizes the handle and that then returns a result-set for the assigned query.

For the formula system, I followed the same approach. It is easier to maintain formulas that use

SINGLEVALUEQUERY("sales-top-performer")

than to use

SINGLEVALUEQUERY("SELECT name, Sum(Sales) AS 'Sales' FROM SalesData ORDER BY 'Sales' LIMIT 1")

especially if that formula is going to be used several times in the report.

As a side note: As with calculated queries in the report itself, you can add a Advanced/Custom data source to compute a query at runtime. At that point, the second query would be valid.

These are the steps needed to use a SINGLEVALUEQUERY or MULTIVALUEQUERY formula function in a report for a given query.

(1) Define the data-source
(2) Add a query, test that it gets your expected result back.
(3) Reference the query in the formula

How do I parametrize the SINGLEVALUEQUERY or MULTIVALUEQUERY formula function?

For moth formula functions, parametrization happens implicitly. At runtime the reporting engine passes the current data-row, including all report parameter, the values from the report’s current result-set and the calculated function/expression values to the data-factory that executes the query.

Queries need to get parametrized as usual. For instance for SQL use the ${param} syntax. The underlying data-source itself specifies the names of the data-row columns it wants to use as parameters.

Example:

Lets assume that your report contains a field called “CUSTOMERNUMBER” and you want to retrieve the accumulated sales for the user via a SINGLEVALUEQUERY function.

(1) Define the SQL datasource on your report (or reuse one you already have defined).
(2) Add the query, name it “Sales-For-Customer” and provide the following SQL:

SELECT
     SUM("ORDERDETAILS"."QUANTITYORDERED" * 
     "ORDERDETAILS"."PRICEEACH") AS "Sales"
FROM
     "ORDERS" INNER JOIN "ORDERDETAILS" ON "ORDERS"."ORDERNUMBER" = "ORDERDETAILS"."ORDERNUMBER"
     INNER JOIN "CUSTOMERS" ON "ORDERS"."CUSTOMERNUMBER" = "CUSTOMERS"."CUSTOMERNUMBER"
WHERE 
     "CUSTOMERS"."CUSTOMERNUMBER" = ${CUSTOMERNUMBER}
GROUP BY 
     "CUSTOMERS"."CUSTOMERNUMBER"     

(3) Add a label, and edit the “value” attribute-expression. Set the following formula

=SINGLEVALUEQUERY("Sales-For-Customer")

Download the sample report

As you can see, there is not much magic in using the SINGLEVALUEQUERY and MULTIVALUEQUERY formula functions, but the effect they can have is powerful.

PS: Multiple queries with the same parameter values will be cached by the Report Data Cache

This entry was posted in Advanced Topic, BI-Server, Parameter, Tech-Tips on by .
Thomas

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over an decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to a enterprise class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our our steps towards upcoming releases.

9 thoughts on “Using Queries in Formulas in Pentaho Report Designer

  1. Thomas Morgner

    It does (in 3.8, not in the dreaded 3.7) – but the default formula usually sees for all referenced parameter values that have no default value and thus easily fails with an NA-error.

    And of course, a parameter can only reference other parameters and only those parameters that have been defined before the current parameter. You cannot see parameters that come after your current parameter.

  2. Madhu

    Hi, If i need to use singlevaluequery similar to an inline subreport, i will have to pass some keys from a field of the main query. Is this possible?

  3. Thomas Morgner

    The parameter passing happens implicitly. The queries just have to reference any of the fields currently active in the report. The SINGLEVALUEQUERY function passes the whole report context along when firing the query.

  4. David

    Hi!

    I used SINGLEVALUEQUERY in Pentaho Report 5.2 to get record count for a few conditions
    and it works well.

    But when I ran this report in BI-Server, all those reporting fields (Label fields) that uses SINGLEVALUEQUERY becomes blank.

    What did I do wrong. The BI-Server is version 5.01

    Thank you.

    Dave

  5. Dario Bernabeu

    Hi Thomas

    In PRD 5.3: all the parameters of the MULTIVALUEQUERY formula function, should receive values. Now, the parameters are not optional.

    Regards

Comments are closed.