Category Archives: BI-Server

Blazing fast data exports with Pentaho Reporting 5.1

Ever felt that getting the Pentaho BI-server to spit out CSV or Excel files should be faster? As unbelievable as it sounds, with case PRD-4921 closed, we have up to 5 (in words: five!) times faster exports now.

Its one of those occasions where talking about customer problems creates a wacky idea that falls on fruitful ground.

Many customers use the Interactive Reporting tools (Pentaho Interactive Reporting, Saiku Reporting or the ugly Adhoc Reporting (WAQR)) to get data out of the data-warehouse into Excel or CSV files. Such reports are usually rather simple list reports, with no fancy charting or complex layout structures. However, the reporting engine does not know that, it’s pessimistic nature always assumes the worst.

The Pentaho Reporting engine allows an insane degree of freedom, and via custom report functions, it allows to reconfigure the report on the fly while the report is running. But with that freedom, we no longer can make any assumptions about how a report will look like in the next row. Thus the engine, and the layout subsystem, assume nothing and (apart from a bit of caching of reusable bits) recalculate everything from scratch.

Which takes time.

With PRD-4921, I added a fast-mode to some of the export types (Stream CSV, Stream HTML and XLS/XLSX). The new exporters check whether the report uses only ‘safe’ features, and if so, switches to a template based output instead of using the full layouting.

A report is safe if it does not contain any of the following items:

inline subreports. They are evil, as they can appear anywhere and can be of any complexity.
crosstabs. They are a complex layout and can’t be easily condensed into templated sections.
functions that listen for page-events, other than the standard page-function and page-of-pages functions. During fast-mode, we don’t generate page events and thus these wont output correct values. I am willing to ignore page functions, as data exports are less concerned about page numbers.
any layout-processor-function other than the Row-Banding function. They exist to rewrite the report, which stops us from making assumptions about the report’s structure.

If a report is not safe, the engine falls back to the normal, slow mode. You now just have to wait a bit longer to get your data, but you wont get sudden service interruptions.

For fast reports, the engine produce a template of each root-level band. If the style of a band changes over time (as a result of having Style-expressions), we produce a template for each unique combinations of styles the reporting engine encounters.

Once the engine has a valid template, it can skip all layouting on all subsequent rows of data and can just fill in the data into the template’s place holders. The resulting output is exactly the same as the slow output – minus the waiting time.

So how does this system perform? Here is what my system produces using a 65k rows report (to stay within the limits imposed by Excel97) with 35 columns of data exported. The report has no groups, it is just one big fat stream of data. All times are given in seconds.

Export	5.1 with fix	5.0 with fix	5.0 GA
Fast-CSV	4.5	5.4	–
CSV	25.8	24.5	24.8
Fast-XLS	11.7	11.3	–
XLS	53.2	51.3	213.4
Fast-XLSX	31.3	37.7	–
XLSX	86.0	82.8	232.4
Fast-HTML	10.0	11.1	–
HTML	42.9	43.5	44.9
PDF	66.7	69.2	66.4

As you can see from the data, the fix gave a 4 to 5 times speed up for HTML and CSV exports. The Excel exports were extra slow in 5.0 (and 4.x), and a few fixes in the layout handling and Excel specific exports gave the ‘normal’ mode a speed-up of 3 to 4 times. On top of that, we now have the fast mode, that gives another 2-3 times more raw speed.

Not bad for one week of frantic coding, I guess.

Go grab the 5.1 CI builds to give it a go. You will need an updated BI-Server reporting plugin to make the BI-server (and thus the Adhoc reporting tools) pick up that change.

The 5.0 branch does not have those changes in, so don’t even try the CI builds for it. As the 5.0 codeline is in lock-down for bug-fixes only, these performance improvements will take a while to go in, as we don’t want to introduce regressions that break systems in production.

A CDF based parameter viewer

At our community conference in Frascati yesterday I gave a talk on how to replace the old GWT report viewer with a slim CDF based report viewer.

Giving Granny a Face-Lifting

This is the slightly edited full-text version of this talk.

Are you tired of our trusted GWT report viewer

When we introduced Pentaho Reporting 3.5, one of the major new features we added was the ability to run Pentaho Reports directly in the BI-Server without the need for writing or generating XActions. This feature instantly removed the number one headache our users had with reports on the server – the need for an additional runtime file, the XAction. The file contained the same information they already specified in the report designer. But to edit the file later, they would need to go into a totally different editor to do some sort of magical programming. Ordinary business users could and would not do that.

We created the report viewer with Google’s Webtoolkit, which promised an easy way of creating rich JavaScript UIs without having to resort to homegrown libraries.

Where has all the love gone?

There are no simple solutions. GWT turned out to create monolithic code that lived on its own island. GWT applications were nearly impossible to extend for normal (non-GWT-using) web-developers. The code was hard to debug (as normal debugger like Firebug cannot help much to make sense out of the autogenerated code). And GWT was slow. Slow to compile and slow to run (compared to other JavaScript alternatives).

Our partners had no particular love for GWT, and bit by bit we grew tired of it as well. Over time we realized that GWT would not be the silver bullet.

Our report viewer implementation also suffered from a few deficiencies. There is no easy way to create alternative layouts for the parameter UI, the date parameter input is simple and limited and long parameter texts can cause problems.

In the mean time, our consultants and partners worked around these limitations by using CDF to build custom parameter pages for reports.

And then replace the GWT Viewer
… with a normal CDF-Dashboard?

CDF instantly solved several of their problems. In CDF you are free to design your parameter page the way you like it. CDF is by far more flexible than the GWT viewer (as you have simple code with loads of extension points available). CDF is made to create interactive dashboards with a rich user experience.

And CDF comes with a PRPT component, so it already knows how to drive a report.

… but you know you pay the price

But once again: There is no magic silver bullet. Writing an extra CDF file suffers from the same problem that XActions have. Suddenly you have to duplicate the parameter information from the report designer into the dashboard. You have to replicate all parameter dependencies.

CDF requires some technical skill to create a dashboard. Similar to XActions, the report designer cannot read a CDF file for editing (and given that CDF is JavaScript that you can program in any way you want, there is no way we could ever hope to build such an editor). CDFs duplicate the information that is already in the report, and any change to the report parameters must also be applied to the dashboard.

Creating CDF files is not free – your IT department or an external consultant has to do it for the ordinary business user. So from a business point of view, that sort of “premium parameter viewer” would only be feasible for critical reports where you can justify the high development and maintenance costs.

To make CDF work for all reports, we need to solve the “duplication of parameter information” problem.

A simple solution:
Let PRPT’s information drive CDF

Mike D’Amour architected the GWT reporting plugin as a RESTful service. We do intentionally avoided all of the GWT server side libraries to communicate with the server. The report viewer uses standard HTTP-GET or PUT calls to query the server, which responds either with content or XML files.

The report viewer only uses the server’s public URLs to get information about the report’s current parameters. In fact, anyone can call these URLs to get the same information. We do not have any limits on what kind of client you use to interact with the reporting plugin.

Flow, Report Viewer, Flow

The GWT report viewer uses a very simple algorithm to communicate with the server.

First we query the parameter XML by passing all known parameter to the server.
We parse the XML and render the UI
We check whether the server found any problems with the parameter we given. If everything is OK, we ask the server for the report content.
We wait for input from the user.
On any new input or if the user hits submit, we go back to the start and query the parameter-XML again.

You can find more information about this cycle in one of my previous postings.

Action time

Click here to see a simple form-based parameter page. This demonstrates how to communicate with a Pentaho BI-server and shows the basic steps to parametrize an existing report. If you see a login window in the lower frame, then login and restart the demo.

The form itself is simple:

<html>
<head>
  <title>Report Viewer

<body>
<div style="border: 1px solid black; margin-bottom: 20px">
  <h1>Parameter Input
  <hr />
  <form action="http://demo.pentaho.com/pentaho/content/reporting" method="GET" target="viewer">
    <div>
      Report to load:
      <input name="solution" value="steel-wheels"/>
      <input name="path" value="reports"/>
      <input name="name" value="Invoice Statements.prpt"/>
    </div> 
    <h2>System parameter</h2>
  <label for="renderMode">Render Mode</label> 
  <select id="renderMode" name="renderMode" size="1"> 
       <option value="REPORT">REPORT</option> 
       <option value="XML">XML</option> 
  </select>
  <label for="output-target">Output Target</label> 
  <select id="output-target" name="output-target" size="1"> 
     <option value="table/html;page-mode=stream">Single page HTML (table/html;page-mode=stream)</option> 
     <option value="table/html;page-mode=page">Paginated page HTML (table/html;page-mode=stream)</option>
   </select>
  <h2>User parameter</h2>
  <label for="Customer">Customer</label>
  <input type="text" id="Customer" name="CustomerNo" value="242"/> 
  <label for="ReportStamp">Report Stamp</label> 
  <input type="text" id="ReportStamp" name="Report Stamp" value="Review"/>
  <div/> 
  <input type="submit"value="Go!"/>
</form>
  <h1>Report</h1>
  <iframe name="viewer" width="100%" height="50%"/>
</div>
</body>
</html>

In the first section we setup a few system level parameter. The form contains the path to the report we want to render (expressed as Pentaho Standard Triple – solution, path, name), the renderMode (that defines whether we query parameter information (XML) or whether we render the report (REPORT)) and finally the output target that defines what output the server should generate.

This form is already a valid method to supply parameter to the reporting plugin and shows that there is no magic involved.

Now, lets do the same again … in CDF

Jordan Ganoff wrote a prototype of a CDF dashboard reads the parameter information from the Pentaho reporting plugin to construct a dashboard.

Due to some security restriction in the JavaScript execution in browsers (same origin rule) I cannot provide a one-click example. Download the zip file and copy the contents into your BI-Server’s solution directory.

You can then switch to the new parameter viewer by replacing the “reportviewer/report.html” part with “web/reportviewer.html”.

So the original URL for your GWT report viewer

http://localhost:8080/pentaho/content/reporting/reportviewer/report.html?solution=steel-wheels&path=%2Freports&name=Invoice+Statements.prpt&locale=en_US

becomes this URL

http://localhost:8080/pentaho/content/reporting/resources/web/reportviewer.html?solution=steel-wheels&path=%2Freports&name=Invoice+Statements.prpt&locale=en_US

(Download the CDF based report viewer)

Lets hand over the microphone to Jordan to explain the architecture of this implementation.

Here’s a quick introduction:

The new report viewer is a collection of CDF components. You can follow
the logic starting in reportviewer.html’s load() function. We set up a
div to inject the prompt panel into and then call:

pentaho.common.prompting.createPromptPanel({
          destinationId: "promptPanel", 
          paramDefn: reportViewerParameterLookup(),
          refreshParamDefnCallback: reportViewerParameterLookup,
          extraComponents: [{type: "SubmitReportComponent", htmlObject: 'report-div'}]
});

destinationId: the element Id where the prompt panel will be injected into.

paramDefn: this is the parameter definition (parsed Parameter XML into an object)

refreshParamDefnCallback: function called whenever a parameter has changed its value. For now we will hit the ParameterXmlContentGenerator for a new parameter xml and parse it to a parameter definition every time a parameter value has changed.

extraComponents: Any additional cdf components you’d like initialized. I have a quick prototype component defined in reportviewer.html called “SubmitReportComponent” that will listen for the parameter “submit” to change (which is fired by the submit button on the prompt panel). When this parameter changes the update() method of the SubmitReportComponent is called. We build a valid reporting url and set the iframe’s src to that url to load the report. Pretty straight forward and is exactly how the existing report viewer works today.

Core architecture: the parameter panel itself is a CDF component which defines a layout for all widgets provided. The submit button widget is configured to listen to all CDF components created from any non-hidden parameter. Any time the submit component receives a parameter change event its update() function is called and we check if all parameters are valid. If they are all valid we fire a change event for the parameter “submit” which someone else can listen to and do what they need to do.

Once all components are created we pass them to CDF to initialize them. This will register them CDF and eventually calls update() on all components. It’s this update() that will inject the CDF components into the page.

That being said, the thinking here is that in the future anyone can create a prompting panel from a parameter xml and provide their own callback when the submit button is clicked (or any parameter is changed for that matter).

Extension points:

The mapping for widget types to parameter types is done in: parameter-prompting-builders.js in the object: pentaho.common.prompting.builders.ParameterWidgetBuilder.paramTypeToWidgetMapping. That’s the internal widget mapping that we use when looking up a widget type in pentaho.common.prompting.builders.ParameterWidgetBuilder.lookupCDFWidgetBuilder.

Most of the javascript is structured so it can be extended at any point. Hopefully I provided enough functions that can be overridden to make it easy for a hacker to tear it apart!

One of the items I’m expecting to change is the delegation of widget creation to layout panels. I’d like to pass the parameter definition to the layout panel and let it create any widgets it needs instead of creating them ahead of time for each non-hidden parameter. This should be a bit more extensible.

Thank you Jordan for creating this amazing marriage of dashboards and reporting!

Even though this is a prototype build, it proves the point that we can tweak CDF to become the new report viewer.

At this point, Gretchen asked me whether this will have any impact on any existing integration with the reporting plugin. Changing server side URLs is always a bad thing and Gretchen voiced the concerns of all our OEMs and partners who built an existing solution for the reporting plugin.

The changes we propose are purely client side changes. There is no need nor any plan to change server side APIs or change URLs or returned formats or any server side behaviour. You are safe.

As far as I know, this new report viewer will be part of the upcoming Pentaho BI-Server 4.5.

A world of new possibilities

This new report viewer adds a bunch of new possibilities to our reporting system. We can easily extend it, we can add new components and new ways of parametrizing reports. Parameters can look sexy and can be visually rich.

With CDF’s flexibility and ability to style the dashboard in any way you want, we can produce more flexible layouts that match existing corporate style guidelines of our customers. With the ability to quickly integrate new components we solve more business cases.

On top of my head, I can imagine a Google-Maps widget to select locations, clickable charts to select customers or product lines. Our Drill-linking can be used in new ways. Why not use a report or analyzer view to present your selection?

With CDF and the power of JavaScript in our hands we can also easily show or hide parameter as needed, or even produce selective parameter input paths based on the user’s selection.

With better parametrization, our reporting system will look more sexy, which means more people will be willing to use it. More customers is always a good thing.

But to make this bird fly, we need all the help that we can get.

How can you help to get it right from the start?

First and foremost: Give us your requirements. For us from engineering it is hard to know what obstacles you hit in the fields or what your clients ask you to solve. So instead of producing a system that is based on our limited “Steel-wheels” world, I would see an open discussion that comes up with a set of true requirements that match what you see from your customers every day.

If you already use CDF to drive parametrized reports, tell us all about it? What is the problem you are solving here? What extra work did you have to do to make it work for your use case? If you wrote some parameter input that might be useful for other: Would you share it with us?

Help us to expand the parameter definition dialog in the report designer so that we can easily add additional attributes to the report parameters. This way we can prototype faster and you can use this to pass additional configuration settings to the CDF parameter viewer.

And if you cannot give anything, then at least test what we write with your data. The earlier you test, the better we will be able to react to the results. And please, please: Test the early builds as well. If the product is already in RC (Release-Candidate) state then it is very hard to make major architectural changes. Your tests of the early builds help us to know whether we move in the right direction and allow us to correct our course when we are not.

So lets start the discussion here and now

What would you need from a parameter viewer? What requirements did you meet that forced you to implement your own dashboard-parameter-viewer?

Creating your own Parameter UI for the Pentaho BI-Server

Our BI-Server ships with a default GWT parameter UI for the parameter defined on a report. If you had been around for a while, then you will remember the sigh of relieve when we freed everyone from the tyranny of XActions for running simple reports. Since then the parameter capabilities of the reporting module grew and grew with every release making these parameters easier to use than the XActions’s original design.

GWT is nice for a old and grumpy Java developer like me, as I do not have to worry about JavaScript (untyped, for heaven’s sake, untyped!). But the hardcore nerds like Pedro “JavaScript is my life” Alves do not like the monolithic garbage the GWT compiler spits out. To slow, to heavy, and foremost: Not really extensible unless you recompile the beast for each change. And worst of all: I agree to these complaints.

However, there is a silver lining on the horizon. Our architecture is open, so you are able to replace the GWT code with your own magic with no problems at all. And here is how you would do it:

Basics: The parameter UI architecture

The parameter UI works as a REST service. The server receives calls and sends responses based on the parameters given, without holding any server side state elsewhere.

(1) The browser loads the report.html page and initializes the GWT parameter application (GWT from now on).

(2) GWT calls the server’s report handler with “renderMode=PARAMETER” or “renderMode=XML”. If there are values for any of the parameters known, then these values are given on the URL using ordinary HTTP-GET or POST requests.

The URL that is called is something like this:

http://demo.pentaho.com/pentaho/content/reporting?renderMode=XML&solution=steel-wheels&path=%2Freports&name=Product+Sales.prpt

(3) The server responds with the parameter XML document. If the renderMode is XML, this document also contains the number of pages in the report. The server only returns page numbers if the parameter validate correctly and if the pagination does not cause any other errors.

(If you are logged into the demo server, call the URL from step 2 in your browser to see the XML document the server returns.)

(4) GWT creates a UI for all parameters based on the Parameter XML document. All information is given as attributes on the parameter. The parameter’s possible values and current value are given in that document as well. These sets of values can change if a other parameter changes.

(5) If all parameters validated correctly (according to the Parameter XML document), it now sends a request to retrieve the rendered report. Again, this is a ordinary HTTP-GET call with all parameters attached onto the URL.

http://demo.pentaho.com/pentaho/content/reporting?renderMode=REPORT&solution=steel-wheels&path=%2Freports&name=Product+Sales.prpt

(6) The Browser displays the report content in the IFrame below the GWT parameter UI.

(7) Paging through the report jumps back to step (6) and updates the report frame.

(8) Changing a parameter value jumps back to step (2) and updates the parameter information.

The mystical Parameter XML

The parameter XML document is a description of all known parameters that the reporting plugin understands. The same format is also used by the Analyzer component and you can even get parameter information in this format out of XActions.

<?xml version="1.0" encoding="UTF-8"?>
  <parameters accepted-page="-1" autoSubmitUI="true" is-prompt-needed="false" layout="vertical" page-count="1" paginate="true" subscribe="false">
    <parameter is-list="true" is-mandatory="false" is-multi-select="false" is-strict="true" name="PROD_LINE" type="java.lang.String">
      <attribute name="role" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="user"/>
      <attribute name="parameter-layout" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="horizontal"/>
      <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="togglebutton"/>
      <attribute name="label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Line"/>
      <attribute name="mandatory" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="true"/>
      <values>
        <value label="Classic Cars" null="false" selected="true" type="java.lang.String" value="Classic Cars"/>
        <value label="Motorcycles" null="false" selected="false" type="java.lang.String" value="Motorcycles"/>
        <value label="Ships" null="false" selected="false" type="java.lang.String" value="Ships"/>
        <value label="Planes" null="false" selected="false" type="java.lang.String" value="Planes"/>
      </values>
    </parameter>
    <parameter is-list="true" is-mandatory="false" is-multi-select="false" is-strict="true" name="PROD_CODE" type="java.lang.String">
      <attribute name="role" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="user"/>
      <attribute name="parameter-visible-items" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="6"/>
      <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="list"/>
      <attribute name="label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Product"/>
      <attribute name="mandatory" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="true"/>
      <values>
        <value label="1952 Alpine Renault 1300" null="false" selected="true" type="java.lang.String" value="S10_1949"/>
        <value label="1972 Alfa Romeo GTA" null="false" selected="false" type="java.lang.String" value="S10_4757"/>
        <value label="1962 LanciaA Delta 16V" null="false" selected="false" type="java.lang.String" value="S10_4962"/>
        <!-- loads of parameter values removed -->
        <value label="1961 Chevrolet Impala" null="false" selected="false" type="java.lang.String" value="S24_4620"/>
        <value label="1982 Camaro Z28" null="false" selected="false" type="java.lang.String" value="S700_2824"/>
      </values>
    </parameter>
    <parameter is-list="true" is-mandatory="true" is-multi-select="false" is-strict="true" name="output-target" type="java.lang.String">
      <attribute name="role" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="system"/>
      <attribute name="preferred" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="true"/>
      <attribute name="parameter-group" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="parameters"/>
      <attribute name="parameter-group-label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Report Parameters"/>
      <attribute name="label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Output Type"/>
      <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="dropdown"/>
      <attribute name="hidden" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="false"/>
      <values>
       <value label="HTML (Paginated)" null="false" selected="true" type="java.lang.String" value="table/html;page-mode=page"/>
       <value label="HTML (Single Page)" null="false" selected="false" type="java.lang.String" value="table/html;page-mode=stream"/>
       <value label="PDF" null="false" selected="false" type="java.lang.String" value="pageable/pdf"/>
       <value label="Excel" null="false" selected="false" type="java.lang.String" value="table/excel;page-mode=flow"/>
       <value label="Excel 2007" null="false" selected="false" type="java.lang.String" value="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;page-mode=flow"/> 
       <value label="Comma Separated Value" null="false" selected="false" type="java.lang.String" value="table/csv;page-mode=stream"/> 
       <value label="Rich-Text-Format" null="false" selected="false" type="java.lang.String" value="table/rtf;page-mode=flow"/>
       <value label="Text" null="false" selected="false" type="java.lang.String" value="pageable/text"/>
     </values>
   </parameter>
   <!-- many many more parameter -->
 </parameters>

To create a usable UI, you need to look at some critical information in the parameter UI.

The parameter element itself contain the most critical information for a parameter.

<parameter is-list="true" is-mandatory="true" is-multi-select="false" is-strict="true" name="output-target" type="java.lang.String">

The parameter name gives the parameters internal name. Usually each parameter also has a “label” defined as attribute. The “label” is shown to the user, the “name” is sent to the server. The parameter element also indicates whether a parameter is a list or plain parameter. List parameter have a predefined list of values from which the user can choose from. Some list parameter allow the user to select multiple values (“is-multi-select”) or control whether the user can specify additional values (“is-strict”) or whether the parameter can be empty (“is-mandatory”). You can use this to perform client-side validity checks if you want to. But no matter what you do, the server checks the input anyway, as client side input is untrusted input. Not all parameter that are listed in the parameter document are shown to the user. Some are system level parameter that are only useful in certain situations (like all subscription parameter). A parameter is hidden if the hidden attribute is set to true.

 <attribute name="hidden" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="false"/>

How a parameter is presented to the user is defined by the “parameter-render-type”. Right now, this attribute can have the following values:

dropdown
list
radio
checkbox
togglebutton
textbox
datepicker
multi-line

 <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="dropdown"/>

Data formats for sending and receiving data

Sending parameter information over the internet can be a funny exercise. HTTP only allows to send text, so objects like dates and numbers need to be encoded properly. The Pentaho Reporting plugin expects all parameters in a locale-independent standard format. Numbers must be encoded as decimal numbers in plain english format (point as decimal separator, no thousands separators) using the format string “#0.#########”. Dates and Times must be given in the standard ISO format using the format string “yyyy-MM-dd’T’HH:mm:ss.SSS” or (if encoded with timezone information “yyyy-MM-dd’T’HH:mm:ss.SSSZZZZ”). (Also see: “Its about time – better Date parameter handling in Pentaho Report Designer“) Multi-selection parameter must be given by repeating the parameter name for each value. The order of multi-selection values is taken into account when processing the parameters on the server side. Example:

../report?multi-select=Value1&multi-select=Value2

Now all we need is a bit of JavaScript magic on the client side to replace the grumpy old GWT parameter application with a lightweight and smooth pure JavaScript/AJAX solution. Stay tuned ..

Additional Reading Material

The full specification for the parameter-xml file in the Pentaho Reporting Wiki.
Wiki Documentation for the Pentaho Reporting Plugin for the Pentaho BA Server

Integrating PHP/ASP/Ruby-Web-Applications with Pentaho Reporting

From time to time we get the same question on the Pentaho Reporting Forum:
“How do I integrate the Pentaho Bi-Server and reporting capabilities with my own (PHP/ASP/JSP/Ruby/whatever-) web-application?” Thrilled by the BI-Server’s blazing features, many users want to enrich their applications with rich reporting and analysis capabilities, and unlike the dinosaurs of the BI-Market, our Open-Source solution makes this easy and cost effective.

The techniques I am going to describe in this blog-post are no magic and can be applied to other web-application (like our Data-Integration server) as well.

First, you need to get a Pentaho system up and running. For now, and for the sake of easiness, lets stick with the standard Pre-Configured Installation (PCI). The Pentaho PCI is a Apache Tomcat server that runs the Pentaho BI-Server as its only web-application.

Installing the Pentaho BI-Server

Download Pentaho
Unpack the ZIP or tar.gz archive into a directory. On Linux or Unix systems, I would recommend something like “/opt” or “/usr/local/” as target. On Windows .. well, in that case you are just experimenting and not planing for production, I assume. So any folder will do.
Setup a publish password
You need that later for publishing reports or other files to the server. This is the major obstacle for most new users when trying to get their own content up and running.
Edit the file “bi-server-ce/tomcat/webapps/pentaho/WEB-INF/web.xml”
Change the parameter “fully-qualified-server-url” to the public URL of your server.
```
  fully-qualified-server-url
  http://localhost:8080/pentaho/
```
Optional: If some other tomcat installation runs on your system, your port 8080 may be in use already.
You can change the port in the file “bi-server-ce/tomcat/conf/server.xml”:
Start the Hypersonic database
Start the Pentaho BI-server
Optional: Install the BI-Server as a system service.

You should now be able to direct your web-browser to the BI-Server and should see the login screen when you type http://localhost:8080/ into your browser’s address bar.

Integration by Direct Linking

Integrating a BI-Server into a existing PHP application is easy now. All you need is a running BI-Server instance that is accessible by your users. From now on, I will simply use the Pentaho Demo Server for our little integration project.

http://demo.pentaho.com

The Pentaho Demo Server is a slightly tuned down installation of the Pentaho BI-Server. Anything I am going to show will work in the same way with the standard community edition.

When refering to the two applications, I will use “application” for the PHP, ASP or Ruby application that wants to utilize the Pentaho Platform. I will use the term “Pentaho” for the Pentaho BI-Server.

The most primitive way of connection two web applications together is to make the application include Pentaho in an frame or by opening up the specific report or analysis view in a new window.

My Web Application

This file could have been generated by any server side program. Use your phantasy here.

Reports can be parametrized by adding the parameters for the report to the URL. The parameters must be properly URL encoded. Use UTF-8 as character encoding.

This shows how to open up a report in a new Window: A report

Inside the Pentaho Platform, each report has its own page that can be accessed independently of the Pentaho User Console. The sample above links to a report on the system, using the predefined username and password. Note that the username and password is given in cleartext. To make this more secure you will need to set up a Single-Sign-On solution.

Pentaho Enterprise Edition comes with support for setting up Single-Sign-On for CAS rather easily via some setup scripts that take care of the ugly work. If you are using community edition and are willing to spend the time, you can always set it up yourself.

Alternatively you can set up Pentaho to work with Active Directory or LDAP instead.

Here are some sites with documentation to get you started.

And don’t forget: The Pentaho Wiki also contains a load of information.

Internally, Pentaho uses Spring Security, so with enough time and patience you should be able to connect Pentaho to anything on this planet for authentication purposes.

Access session and environment variables in a PRPT

Reports need to interact with the outside world in many ways. To formalize this, we allow to pass values from the outside to the inside. Pentaho Reporting knows several classes of inputs:

Environment variables
Parameter: User supplied values usually coming from an interaction of the user with a user interface
DataRow-Fields: part of the result-sets from the data-sources
Expressions and Functions: Calculated values

Today, we will talk about environment variables. Environment variables are system level parameter. You can use them to configure reports based on where you deploy the report.

We provide several properties for use within the Pentaho BI-Server.

serverBaseURL: the server URL, for example: “http://localhost:8080”
pentahoBaseURL: the location of the Pentaho Web-application: “http://localhost:8080/pentaho”
solutionRoot: The file-system location where the local copy of the solution repository can be found.
requestContextPath: the servlet’s request context path pointing to the local web-application: “/pentaho/”
username: The current user’s name
roles: The roles as CSV-encoded list. The CSV string is quoted according to the Excel CSV quoting rules.

Inside the BI-Server, environment variables can also be used to interact with the server’s environment to query session and global parameter information. The report environment allows you to access values stored on the current user’s session or as global parameter by using a special prefix.

Requesting a environment property “session:name” will query the session context for an entry called “name” and returns the string representation of this object to the report.

Likewise, prefixing a environment property with “global:” will query the global parameter collection of the BI-Server. Therefore querying a environment variable “global:name” will look for a entry “name” in the Bi-Server’s global parameters.

Accessing Environment variables with Formulas

The Pentaho Reporting Engine offers the formula function “ENV” to access the report environment properties.

Definition:

=ENV(String)

To access the variable called “example” as plain text, use

=ENV("example")

To access a value bound to the server-side session variable “example” use

=ENV("session:example")

If the variable is a CSV encoded string, you can use the CSVARRAY function to parse the string into a array of strings. This array is then suitable to use for multi-selection parameters or SQL-IN clauses.

=CSVARRAY(ENV("roles"))

You can now use these variables as part of other formulas, for instance to compute URLs pointing to documents on the server server.

=ENV("serverBaseURL") & "/myotherwebapp/downloads/pentaho-report-designer.zip"

Accessing Environment variables as predefined fields

When you open up the Pentaho Report Designer, you will see several of the well-known environment variables listed as fields on the report itself. Using the ENV function is flexible, but not very convenient. So we added a automatic mapping to the reporting engine.

The various env:* fields you see in a report are what I call “well-known” fields. As Pentaho Engineers we know knows that they exist on the server and are heavily used by our users. So we added a mapping to the system so that they automatically become fields without having to use a formula function to read their values.

The list of “well-known” fields is defined in the global report configuration. The global configuration can be edited by creating or editing a file called “classic-engine.properties” inside your Pentaho Report Designer and Pentaho BI-Server directories.

Inside the Pentaho Report Designer, this file resides in the “resources” directory in your Pentaho Report Designer installation directory. For the BI-Server you can find this file in the “pentaho/WEB-INF/classes” directory. To make your reports work consistently inside the Report Designer and the Server, make sure that both files have matching definitions.

For example: If you want to map your environment-property “value” into the field “env::value” then you would add the following line to your “classic-engine.properties” file:

org.pentaho.reporting.engine.classic.core.env-mapping.value=env\:\:value

In case value contains a comma separated list (using the Excel-CSV quoting rules for strings that need to be quoted), then you can make that list available as array by appending -array to its name.

org.pentaho.reporting.engine.classic.core.env-mapping.value-array=env\:\:value-array

The engine will recognize the “-array” suffix and will look for a “value” property instead, parses that into a string-array and returns you the array. The array then can be used in a SQL-IN clause.

The prefix “env::” is a convention inside the reporting engine to separate environment fields from other fields. If you omit this, you have to make sure that none of the fields returned by your data sources and none of your expressions or functions provide a value under the same name. This would overshadow your environment property and renders it inaccessible.

Using Queries in Formulas in Pentaho Report Designer

One of the most powerful features of Pentaho Reporting is our ability to combine data from a multitude of datasources into a single report.

A report can be driven by a a query that reads an XML file. The results then parametrize a Web-Service. The web-service response then gets fed into a standard SQL query. And that finally drives the queries to a OLAP cube.

Sure, that example is a bit extreme, but possible out-of-the-box with Pentaho Reporting downloaded from SourceForge.

But we can do more: In Pentaho Reporting 3.8, you can also use two formula functions to query report datasources to incorporate the query-result into your report.

The SINGLEVALUEQUERY function retrieves a single values from the resultset. It always reads it from the first row of the result-set, either from the given column or simply taking the first column it finds.

SINGLEVALUEQUERY()
SINGLEVALUEQUERY([query:string]; [column:string])
SINGLEVALUEQUERY([query:string]; [column:string]; [querytimeout:integer])

As you can see, both column and query-timeout are optional parameter. The “query” given must be a valid report query – but more on that later.

The MULTIVALUEQUERY function works similar to the SINGLEVALUEQUERY formula function. But instead of retrieving just one value, it returns all values as an list.

In addition to the parameters of the SINGLEVALUEQUERY formula function, it also accepts a “query-limit” Integer-parameter, that restricts the number of values returned.

MULTIVALUEQUERY([query:string])
MULTIVALUEQUERY([query:string]; [column:string])
MULTIVALUEQUERY([query:string]; [column:string]; [querytimeout:integer])
MULTIVALUEQUERY([query:string]; [column:string]; [querytimeout:integer]; [limit:integer] )

So what is that ominous “query” parameter about?

In Pentaho Reporting, most data sources use logical names (or handles) to reference queries. Queries can be quite complex. SQL and MDX queries can be huge, and seeing queries printed out that cover several pages is rather normal. They are complex beasts that were difficult to write and even more difficult to keep free of errors. You don’t want to seem them being spread over throughout the reports.

All except for the “Advanced/Custom” data-sources hide these monsters behind easy to remember names. This allows us to encapsulate the complexities of the actual queries while at the same time documenting the purpose of the query itself. It is just easier to use “Sales-Persons-By-Region” in a report than to repeat the SQL statement everywhere.

Each report holds the logical name in its query attribute. When the query needs to be executed the reporting engine checks all declared data-sources to find one that recognizes the handle and that then returns a result-set for the assigned query.

For the formula system, I followed the same approach. It is easier to maintain formulas that use

SINGLEVALUEQUERY("sales-top-performer")

than to use

SINGLEVALUEQUERY("SELECT name, Sum(Sales) AS 'Sales' FROM SalesData ORDER BY 'Sales' LIMIT 1")

especially if that formula is going to be used several times in the report.

As a side note: As with calculated queries in the report itself, you can add a Advanced/Custom data source to compute a query at runtime. At that point, the second query would be valid.

These are the steps needed to use a SINGLEVALUEQUERY or MULTIVALUEQUERY formula function in a report for a given query.

(1) Define the data-source
(2) Add a query, test that it gets your expected result back.
(3) Reference the query in the formula

How do I parametrize the SINGLEVALUEQUERY or MULTIVALUEQUERY formula function?

For moth formula functions, parametrization happens implicitly. At runtime the reporting engine passes the current data-row, including all report parameter, the values from the report’s current result-set and the calculated function/expression values to the data-factory that executes the query.

Queries need to get parametrized as usual. For instance for SQL use the ${param} syntax. The underlying data-source itself specifies the names of the data-row columns it wants to use as parameters.

Example:

Lets assume that your report contains a field called “CUSTOMERNUMBER” and you want to retrieve the accumulated sales for the user via a SINGLEVALUEQUERY function.

(1) Define the SQL datasource on your report (or reuse one you already have defined).
(2) Add the query, name it “Sales-For-Customer” and provide the following SQL:

SELECT
     SUM("ORDERDETAILS"."QUANTITYORDERED" * 
     "ORDERDETAILS"."PRICEEACH") AS "Sales"
FROM
     "ORDERS" INNER JOIN "ORDERDETAILS" ON "ORDERS"."ORDERNUMBER" = "ORDERDETAILS"."ORDERNUMBER"
     INNER JOIN "CUSTOMERS" ON "ORDERS"."CUSTOMERNUMBER" = "CUSTOMERS"."CUSTOMERNUMBER"
WHERE 
     "CUSTOMERS"."CUSTOMERNUMBER" = ${CUSTOMERNUMBER}
GROUP BY 
     "CUSTOMERS"."CUSTOMERNUMBER"

(3) Add a label, and edit the “value” attribute-expression. Set the following formula

=SINGLEVALUEQUERY("Sales-For-Customer")

Download the sample report

As you can see, there is not much magic in using the SINGLEVALUEQUERY and MULTIVALUEQUERY formula functions, but the effect they can have is powerful.

PS: Multiple queries with the same parameter values will be cached by the Report Data Cache

Using Reporting Parameter for fun and profit

The new Pentaho Reporting 3.8 is the fourth release in a row where parameter played a important role. After ironing out all the easy problems, we are now in a state where we can think about creatively abusing the system.

There is one questions in the support centres that get repeated over and over again.

How do I limit the visible output types for a particular report?

Up until now, there was no sane answer. You either take it all, or you lock the report down to a single output type. But selecting just HTML and PDF, that was impossible.

The desired output for a report is controlled by the “output-target” parameter. The values for this parameter are defined somewhere deep inside the reporting engine. Every output target has its unique identifier, that tells both the type of the export (pageable or table) as well as the actual content type that is generated (text, html, pdf, and so on).

Output-target	Description
table/html;page-mode=stream	HTML as a single page, all report pagebreaks are ignored
table/html;page-mode=page	HTML as a sequence of physical pages, manual and automatic pagebreaks are active
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;page-mode=flow	Excel 2007 XLSX Workbook
table/excel;page-mode=flow	Excel 97 Workbook
table/csv;page-mode=stream	CSV output
table/rtf;page-mode=flow	Rich-Text-Format
pageable/pdf	PDF output
pageable/text	Plain text
pageable/xml	Pageable layouted XML
table/xml	Table-XML output
pageable/X-AWT-Graphics;image-type=png	A single report page as PNG
mime-message/text/html	Mime-Email with HTML as body text and all style and images as inline attachments

Since Pentaho Reporting 3.8, we check whether the current report defines one of the known system parameter. If you define your own representation of such a parameter, we use your definition instead of the built-in ones.

Using that knowledge, it is easy to create a list parameter that defines a subset of the available output types. Drop your selection into a table-datasource, and define the parameter as one of the single-selection list-parameter types.

Server Side Printing

You can also invoke server side printing directly from the parameter UI without having to go through an XAction. Define a boolean parameter by creating a table with the following entries:

ID	Value
false	Do Not Print
true	Print

Make sure you set the default value of your parameter to “false” or make false the first selection in your list, or you are likely to trigger printing involuntarily. Save the trees and so on.

If you want to print to a specific printer, you can do so by defining the “printer-name” parameter as well. And with the magic of the scriptable datasource, you can populate it with all known printers:

import javax.print.DocFlavor;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
import javax.print.attribute.standard.PrinterName;

import org.pentaho.reporting.engine.classic.core.modules.misc.tablemodel.TableModelInfo;
import org.pentaho.reporting.engine.classic.core.util.TypedTableModel; 


    PrintService[] services = PrintServiceLookup.lookupPrintServices(
        DocFlavor.SERVICE_FORMATTED.PAGEABLE, null);
    TypedTableModel tt = new TypedTableModel();
    tt.addColumn("ID", String.class);
    tt.addColumn("Value", String.class);
    for (int i = 0; i < services.length; i++)
    {
      PrintService service = services[i];
      PrinterName displayName = service.getAttribute(PrinterName.class);
      if (displayName != null)
      {
        tt.addRow(new Object[]{service.getName(), displayName.getValue()});
      }
      else
      {
        tt.addRow(new Object[]{service.getName(), service.getName()});
      }
    }
    return tt;

Supporting the old population – BI-Server 3.5.2 and 3.6.0

Now that the end of the traditional calendar year is near and Samhain is fast approaching, it is time to take care of the dead corpses of the old and existing production installations out there.

Personally, I am a big fan of instant upgrades. As soon as a new release is out, upgrade. Agreed, I am biased as every old installation means I have extra trouble with support and bug reports. Every old installation gone means less work for me. Isn’t that cool?

But there are those boring people with arguments like “cant upgrade a tested installation that is in production” and “never change a system that already works”. As those people also fall into the category of being “honourable Pentaho customers”, I quell the urge to scream “Get over it, what’s live without a bit of risk and fun”.

Pentaho Reporting 3.7.0 release ante portas

So here is my personal take on that problem. Note that my preference does not affect your support plans or dictates what Pentaho as company can and will do or can and will support. It just affects what I do in regard to feature and support requests that come to me via the open channels like the forum, e-mail or personally addressed blackmail.

This release will be out somewhere this year. Note how carefully I avoid any specific dates. It will be released when it is finished or when our Sales and Marketing team grow tired of waiting and invade the developer dungeon to get hold of the treasure.

Pentaho Reporting 3.7.0 will resurrect the dead BI-Server corpses

After that release I will branch the 3.7 codeline to retrofit it into the BI-Server 3.5.2 and 3.6.0 codebase. At the end of that process we will be able to upgrade the existing BI-Server 3.5.2 and 3.6 installations with the new reporting engine.

The server will still be mainly 3.5.2 code, as I do not plan to touch any of the BI-Server’s core projects. The “reporting-plugin” and the “pentaho-reporting-classic-engine” will be the only things that change.

The automatic parameter discovering in PRD for Drill-Linking will only be supported for links to reports. XActions and Analyzer do not support the correct parameter format and I dont feel tempted to mess with their sources (and potentially break things on the way). However, when you know the parameter names, you can always add the parameter to the “Manual Parameter” section of your Drill-down definition.

All other reporting related features will work as if you had a BI-Server 3.7.0 release.

Let the dead corpses rest in peace: No support from me for 0.8.9 and 0.8.10

I will make no effort whatsoever to support BI-Server 3.5.0 or anything that uses the old 0.8.something reporting engines. That code is so old that even my local necromancer would have trouble getting that back to life. I ended all support for that some time ago and although I happily answer questions from time to time, I will not open up that code in my shiny and clean IDE. If you are still working with that code, consider to upgrade, or be prepared to work on the branch alone.

BI-Server parameter input: An authoritative voice

Over the last few release cycles, we have spent some tremendous
time in making the parameter input easy to use. Along with the
community and user feedback the GWT client received new capabilities
on when and how to pass parameters around.

We strive for a release it quick and release it often approach,
as we rather solve pain in a suboptimal way today than to wait for
a perfect solution that comes next year, if ever. And thus it is a
sad (and rather natural) truth that not all attempts are hitting
the spot.

Especially the auto-submit feature is a creepy little bugger and
has seen numerous hacks, tweaks and finally a full make-over in
the upcoming 3.7 release.

Architecture: the inner workings of the reporting plugin in 3.7

The reporting plugin is a GWT application, and thus when first
invoked, it will initialize itself, inspect all parameters it
received from the URL and fires a non-paginating parameter-request
back to the server.

On the server, the Parameter-content-handler handles the request.
It read all values from the URL. URLs only transmit strings, so
in a second step, those raw values get parsed into Java-Objects.
The parsed values now get passed down into the reporting engine
for further validation. The engine validates the parameters and
either accepts the values as fully valid or spits out a bunch of
error messages.

If all values are valid and the request asked for pagination, we
also paginate the report to return the total number of pages for
the report.

The validation and pagination result, along with a description of
all available parameters and their accepted values are sent back
to the GWT UI.

The first parameter-response now initializes the GWT application
and sets the initial state of the auto-submit feature. If the
GWT UI detects that all parameters have been accepted as
valid, and if auto-submit is enabled, it now fires a full parameter
request that paginates the report.

Subsequent updates will then follow a slightly different protocol:

If auto-submit is disabled, any user-input triggers a lightweight
parameter validation request. This ensures that list-parameters
always get selections based on the actual user input. The response
only updates the parameter definitions and values and at no point
the report gets executed.

When auto-submit is enabled, we always fire heavyweight parameter
requests, which trigger a pagination run on the server. When such
a request returns without error, we do show the report content.
This finally invokes the report-content-generator and creates
the actual report content that is shown below the parameter input
pane.

The BI-Server 3.6.0 Situation

In this version all parameter validation requests were
heavy weight requests. So every parameter validation triggered
a pagination.

A pagination run is only triggered if all parameter were validated
successfully. To turn the heavy weight parameter requests into a
3.7-style lightweight request, you need to add a parameter that
always validates to false, until the user explicitly selects it.
One simple way of doing that is to add a mandatory check-box
parameter with the values “Wait” and “Run” and the post-processing
formula

=IF([p] = "WAIT"; NA(); [p])

that translates all wait into values and thus making the
parameter unvalidatable until the user selected “run”.

Fixed rules: More on Parameter processing

In the famous “lets do it right this time” release of Pentaho Reporting 3.5,
we introduced the ability to have parameter on a report. Well, it wasn’t quite
right, with parameters you need to pre and post process the data to make it
sound. That was release 3.6. At that point, so our theory, you, dear user,
should be happy. But apparently, the beast we created wasn’t all pretty.

Well, its PRD-3.7 now and guess what we are improving: Parameters.

So far, the date parameter processing was not quite right. I still wonder
why after 5 years of XAction no one complained about that. Sure, XActions
only have strings, and any processing or parsing is up to you – and so is the
blame if it does not work.

The various system level options on the parameter UIs was .. sub-optimal.
(I’m getting better at phrasing it more positively, don’t you think?)
The list of supported parameter (aka “the list of parameters you shall not
use in your report”) grew with every release. The Swing UI and the server UI
never quite agreed on what setting to accept and how to behave in border
cases. Thus creating formulas that worked in both settings was a chore.

And last but not least: Validating parameters and getting them run in a
consistent way was difficult. Give a Integer where a Long was expected
and you are screwed. Without a error message. Thus even for me working with
the parameters was more a easter-egg search than sane designing.

And last but not least: Even the parameter processing order was a bit
funny. It works for simple cases, but behaves rather funny for the
not so simple ones.

How Pentaho Reporting Processes Parameters

Each parameter in Pentaho Reporting carries at least two formulas that
eventually need to be evaluated.

The default-value-formula is used to produce a valid value if the user
provided no value.

The post-processing-formula is used to transform the user’s input into
something more usable or simply to validate arbitrary business rules
(a deposit cannot be negative, for instance).

And last but not least, if you reference an other parameter, you expect
it to contain the proper post-processed value.

In PRD-3.6, the order of the validation was largely out of sync with
those expectations. In fact, post processing was done in blocks, so that
parameter were not able to use other post-processed parameter values in
their queries. Now that’s bad, and I guess Gretchen will be able to share
a few unhappy tales in Lisbon about that.

In PRD-3.7, each parameter is now fully processed on its own before
the next defined parameter in the chain is processed.

Lets be more formal for a while:

For each parameter defined:

If the value for the parameter is <null>, we compute the parameter’s
default value and use that as untrusted parameter value. The default-value
formula only sees the previously validated parameters.

In a second step, we post process the parameter to get a trusted value.
The post processing formula sees the previously validated parameters
and the untrusted value. So be careful how you use the untrusted value
here, as you cannot trust users and SQL-injections or cross-site-scripting
troubles are never to far away.

If the post-processing formula fails with an error the
trusted value of the parameter will be <null>, a warning message will be
issued and last but not least we refuse to execute the report. The
parameter processing continues with this value set to <null>.

And finally we check the type of the parameter and compare the parameter
value against the list of valid key-values. If the value passes this test
it becomes a trusted value and will be used in the further parameter
processing and ultimately it will be used in the report.

If the parameter fails the test, we report an error, prevent any report
processing and continue to validate the remaining parameters using the
parameter’s default value.

Beginning with this version, the parameter validation also creates the
set of validated values after the validation is complete. For a report without
any parameter values set, this will yield the default values for all parameters.

So what does this mean for you?

The new schema brings a couple of changes to the way the system behaves.
Default values are now context sensitive and can change when the
selection for the previously declared parameters changes. Our parameter
UIs do not directly use that feature for usability reasons.Automatically changing
the user’s input is not very nice and confuses and/or upsets people. A lot.

The post processing formulas are now executed in a timely manor and before
the default-value or selection for a parameter is computed. This way, you
are now able to compute the mondrian-role array in a hidden parameter’s
post-processing formula and be sure that your datasource sees it.

And last but not least, your formulas wont be able to use values that have
not been validated, nor would the report ever include them. Especially with
the SINGLEVALUEQUERY and MULTIVALUEQUERY formula functions, this is mandatory.
Your database is yours and we all want to keep it that way.