Pentaho Reporting’s Metadata DataSources now with more scripting power

Pentaho Reporting is rather flexible. No, astonishingly flexible. Calculations for styles, attributes and even queries make sure that there is always a way to tweak a report exactly the way you need it. But until now, there were a few things that were not exactly easy. Custom Data-sources and their method of calculated queries were one of them.

With today’s code drop, this story has fundamentally changed. The Pentaho Metadata data-source is the first data-source that combines static queries with optional scripting features. There is no need to have a “Custom-Metadata” datasource anymore. No need to cram your query calculation logic into a tiny field on the report’s query-attribute.

Here it is: The new and greatly improved Pentaho Metadata Data-Source:

The scripting extensions hide beneath the two new tabs on the top and on the query-text area. Those who don’t need scripting or who are scared of having to program can safely continue with their life – just ignore the scripting-tabs and you will be safe in your static world. But if you feel you are born to greatness and therefore you cannot be limited to predefined queries, then follow me on a exciting journey into the land of power-scripting.

The scripting extensions are split into a global script and a per-query script. The global script can be used to define shared functions or global variables that can be seen by all query-scripts. The init()function can also be used to change the configuration of the data-source itself. The per-query-scripts allow you to (a) customize the query string, (b) calculate the “additional fields”  information for the query-caching and (c) allows you to post-process the returned table-model.

The original “Custom-*-Datasources” always had the problem that you can only add one of these datasources per report or subreport. Now that we combine scripts with named queries, we can have a independent script for each of the queries and of course as many datasources as necessary.

The scripting backend uses the JSR-223 (javax.script) scripting system. By default we ship with both JavaScript and Groovy support. There are quite a few JSR-223 enabled languages out there, so if you have a favourite language, just drop it into the classpath on both the Pentaho Platform server and the Pentaho Report Designer and you are ready to code.

The picture above shows you the global script pane. For the two languages (JavaScript and Groovy) we ship with, we have a prepared template script that contains some documentation as well as the empty function declarations of the functions we expect to be able to call.

It is absolutely OK to delete any function you dont need – in that case Pentaho Reporting simply ignores that part. This will help you to keep your scripts small and simple. You can load scripts from external sources as well, but be aware that it is your responsibility to make these external scripts available to the report at runtime. So if you load a script from a local directory, you have to ensure that the same works on the BI-Server (or your production environment) too.

For the upcoming Pentaho Reporting 3.9.0 release, the Pentaho Metadata datasource is the only datasource with this sort of scripting. Once that design has proven itself and I have more feedback on how well it works, we will add this scripting system to all major data-sources in Pentaho Reporting 4.0.

PRD-3639: On Adding Version checks into Pentaho Reporting

From time to time I get support requests from users who try to create reports in the latest Pentaho Report Designer and then try to run it in a ancient installation of the Pentaho BI-Server. I already wrote about the limitations Einstein placed on us software developers, so lets all agree – it’s not possible and there will be no fix.

Along with a Voodoo-security change that obscures all passwords stored in a PRPT file, I now prevent the horse from ever drinking petrol again. I added a strict check while a report is parsed so that newer versions of a PRPT file cannot be opened in older servers. Whenever you try such a thing, you will now get a very clear error message telling you:

The report file you are trying to load was created with Pentaho Reporting 12.0 but you are trying to run it with Pentaho Reporting 3.9.0. Please update your reporting installation to match the report designer that was used to create this file.

and if you just missed a patch release, you will be able to continue, but get a warning urging you to upgrade.

The report file you are trying to load was created with Pentaho Reporting 3.9.5 but you are trying to run it with Pentaho Reporting 3.9.0. Your reporting engine version may not have all features or bug-fixes required to display this report properly.

Older releases that did not store version information in a usable format inside the PRPT files will not be checked. This check only applies to released versions. All development versions do not store any version information and do not validate version information they may find in the files. If you are using development versions I assume you know what you are doing.

For the upcoming 3.9.0 release, you won’t see any impact. The first time this check will be hit will be Pentaho Reporting 4.0 later next year and from that moment on I will never have to talk about poisoning my poor old horse with petrol again (it prefers bio-diesel anyway).

A Message from the Trenches ..

Over the last 6 weeks I finally found the time to dive into the crosstab related development. Crosstabbing as a data manipulation exercise is a rather easy and straight forward as an algorithm. Printing simple crosstabs without regard for user defined calculations is not hard either – if you are willing to stick to the simple model for eternity. But integrating the crosstabbing code so that the layouting uses our existing capabilities of style- and attribute-expressions, flexible layouts and decent scalability even when processing massive amounts of data – that takes more than a two-weeks prototype hacking.

Step zero on my quest happened earlier this year when I wrote a testing framework to validate the layouter output on a very low level. The idea of this testing framework is based on “golden samples” – known good data-dumps of the layout results. For every change I make, I now can validate if and how the final layout is affected. Sometimes I want to see changes (especially when fixing bugs), but for most parts I want to add new functions without breaking existing reports. The testing framework helps me to detect changes in the layout that I did not foresee.

The framework shields me from the system components, like the local font renderer (a real bugger that changes with every update of the operating system) and from changes in the local font files (Arial on Mac is not the same as Arial on Windows, for instance).

Step one of implementing crosstabs then involved getting a true table layout into the reporting engine. Our reporting engine does is a banded engine in the tradition of the old COBOL and RPG/400 reporting tools. Each band is fairly separate from other bands. On paper you get the illusion of dealing with tables, as normally row after row of data is printed at the same horizontal position. But our primitive model cannot adjust for changes in the column sizes, as once a band is printed it cannot be altered any more.

This is similar to trying to create a table in a word processor by only using tabs. Once you try to put a overly long item into your ‘cell’, the layout goes all wrong, as previous and following rows do not alter their column sizes to match the large item.

But if we use a proper table, with real columns and rows, the problem goes away almost immediately. A table allows us to place elements relative to other elements in other rows and to maintain that relationship as long as we want. For small tables, this may be for the whole data-set, but similar to the “table-layout: fixed” CSS style attribute, we can also define a cut-off point after a certain number of rows and thus balancing the need for keeping the table flexible and the need of not buffering too much layout data.

During a long interlude during the summer time I was busy working on two rather large bug-fix releases (3.8.1 and 3.8.3) with no time whatsoever to do any new development. (I just managed to sneak in a week or two of table-layout related work.). And now, finally, since Mid-September I am back into the layout system. By eliminating all distractions – including writing articles here – I managed to get some private cuddle-up time with the rendering system.

The reporting engine comes from a banded background, and thus our existing layout system was built around the assumption that the world is easily consumable in banded chunks. With the Citrus-rewrite two years ago I opened the layouter to a more flexible world view by recreating the layout system as a CSS/DOM oriented layout system. But without a need for cross-band layouts, I still ended up introducing many assumptions that only work in a banded world.

In the old (3.x) layout system, global structures like groups, subreports and root-level bands are produced by the “Renderer” class. The renderer is the central point of the layout calculations and manages both the creation of the layout nodes and the calculation of the final layout. The contents of the bands themselves are then computed by a class named “DefaultLayoutProducer”. This model fails horribly if the banded layout is just a subset of a larger structure, like a table, for instance. The “DefaultLayoutProducer” is not aware of the outside model, and the “Renderer” does not care what the “DefaultLayoutProducer” does within his own band. I created the model of a completely dysfunctional family here.

With the new system in place, there is only one point that produces layout nodes. The model is no longer a collection of local sub-models but one big model with a global state. That not only simplifies the code, it also opens up a new set of capabilities.

So far, the layout system rewrite is nearly complete. The “golden sample” tests of the engine-core project are running fine on my box now, but some of the integration tests in the “testcases” project still fail. Once they work, I can rewire the layouter to accept global table definitions across groups. I can also finally open up the group layouts to support more layout options, and thus allow to print groups horizontally instead of vertically, or even print the header, group-body and footer side-by-side.

For the next few weeks, I will be back to bug-fixing for the upcoming 3.9.0 release. This release will contain mostly bug-fixes and will be shipped with the BI-Server 4.5.0.

Master your page-breaks in the Pentaho Report Designer

It is funny how at times the same question or problem flares up from many disconnected developers leading them to ask the same question in just a few days. The most recent incident of this sort of questions could be paraphrased as:

How the $%&§ do these pagebreaks work

Pentaho Reporting allows you to control Pagebreaks via two style settings: “pagebreak-before” and “pagebreak-after” indicating whether you want to start a new page before or after the band or section prints. Pagebreaks are usually set on any of the root level bands. Root level bands are all main sections you see in the report, like the “group-header”, “group-footer”, “report-header” or “report-footer” or the “details” section. Pagebreaks on these bands are always honoured, but you can add breaks to other locations as well.
Whether pagebreaks are allowed on a certain element depends on both the element type and the position of that element in the report layout.

Pagebreak rulez

Pagebreaks are allowed on all root level bands except the page header and -footer. The pagebreak request will be ignored for repeated instances of the group- or details-header and footer that are printed in the header and footer area of a page. A pagebreak inside a page-header or footer would not make any sense and could send the poor Pentaho reporting engine into a infinite loop of destruction when each new page triggers the generation of a new page.

Pagebreaks can be defined on all subreports. The pagebreak will only be honoured for inline-subreports, but will be ignored for banded-subreports. You can define the pagebreak on the banded-subreport’s report-header to achieve the same effect.

Pagebreaks can be placed on normal bands within the same root-level band. The pagebreak will only be honoured if the layout for this band follows some simple rules. All parents of the band must have a block-layout – more on that later.

Defined Pagebreaks will be ignored on a master-report or on groups, even though we will support breaks on the group-object in later versions. For now, if you want to start a new page for a group, put your pagebreak on the group’s group-header instead.

So in short:

  • Pagebreaks can be added to all root bands
  • Inline subreports can have pagebreaks
  • Normal bands can have pagebreaks if they are in a block-layout.
  • MasterReport, Group and Banded-Subreports cannot have pagebreaks.

Get pagebreaks on bands and inline-subreports to work

Pagebreaks work on all bands as long as the band sits in a valid pure block-level context.

A block layout is a layout strategy where each element inside the layout context spans the full width of the parent section. Elements are stacked below each other like paragraphs in a text book. To learn more about the layout strategies, I recommend you to read about “Element Layout Strategies in Pentaho Reporting“.

In Pentaho Reporting 3.5 and newer the report layouter places all root level bands into a block layout. Therefore for the outer-most master report, all root bands span the full width of the page and new bands are added directly below the previously printed bands.

A element’s layout context is a pure block-level context when all the parent bands of the element have a block-layout. In that case, the element or band itself is guaranteed to span the whole width of the page. This gives us a natural cut-off point for starting new pages.

So what does that mean for your layouts?

If you want to add a pagebreak to your bands or inline-subreports, the parent band (the band that contains your band or subreport) must have the style property layout set to block. And then this band’s parent must have a block-layout until you come to the root-level band itself, which must be set to layout: block as well.

Create multi-page details bands

You can use the block-layout technique to divide any root level band into multiple bands that have page-breaks between the sub-sections. And here is how you do it:

  1. Select your root-level band, for instance the details band.
  2. Switch to the style-tab on the right-hand side table, and locate the “layout” style.
  3. Set the layout to “block”
  4. Add two or more bands to the root band by dragging them into the root band editor.
  5. Add pagebreaks as needed to these sub-bands.

With those simple rules in mind, pagebreaks should no longer be the mysterious feature anymore.

A CDF based parameter viewer

At our community conference in Frascati yesterday I gave a talk on how to replace the old GWT report viewer with a slim CDF based report viewer.

Giving Granny a Face-Lifting

This is the slightly edited full-text version of this talk.

Are you tired of our trusted GWT report viewer

When we introduced Pentaho Reporting 3.5, one of the major new features we added was the ability to run Pentaho Reports directly in the BI-Server without the need for writing or generating XActions. This feature instantly removed the number one headache our users had with reports on the server – the need for an additional runtime file, the XAction. The file contained the same information they already specified in the report designer. But to edit the file later, they would need to go into a totally different editor to do some sort of magical programming. Ordinary business users could and would not do that.

We created the report viewer with Google’s Webtoolkit, which promised an easy way of creating rich JavaScript UIs without having to resort to homegrown libraries.

Where has all the love gone?

There are no simple solutions. GWT turned out to create monolithic code that lived on its own island. GWT applications were nearly impossible to extend for normal (non-GWT-using) web-developers. The code was hard to debug (as normal debugger like Firebug cannot help much to make sense out of the autogenerated code). And GWT was slow. Slow to compile and slow to run (compared to other JavaScript alternatives).

Our partners had no particular love for GWT, and bit by bit we grew tired of it as well. Over time we realized that GWT would not be the silver bullet.

Our report viewer implementation also suffered from a few deficiencies. There is no easy way to create alternative layouts for the parameter UI, the date parameter input is simple and limited and long parameter texts can cause problems.

In the mean time, our consultants and partners worked around these limitations by using  CDF to build custom parameter pages for reports.

And then replace the GWT Viewer
… with a normal CDF-Dashboard?

CDF instantly solved several of their problems. In CDF you are free to design your parameter page the way you like it. CDF is by far more flexible than the GWT viewer (as you have simple code with loads of extension points available). CDF is made to create interactive dashboards with a rich user experience.

And CDF comes with a PRPT component, so it already knows how to drive a report.

… but you know you pay the price

But once again: There is no magic silver bullet. Writing an extra CDF file suffers from the same problem that XActions have. Suddenly you have to duplicate the parameter information from the report designer into the dashboard. You have to replicate all parameter dependencies.

CDF requires some technical skill to create a dashboard. Similar to XActions, the report designer cannot read a CDF file for editing (and given that CDF is JavaScript that you can program in any way you want, there is no way we could ever hope to build such an editor). CDFs duplicate the information that is already in the report, and any change to the report parameters must also be applied to the dashboard.

Creating CDF files is not free – your IT department or an external consultant has to do it for the ordinary business user. So from a business point of view, that sort of “premium parameter viewer” would only be feasible for critical reports where you can justify the high development and maintenance costs.

To make CDF work for all reports, we need to solve the “duplication of parameter information” problem.

A simple solution:
Let PRPT’s information drive CDF

Mike D’Amour architected the GWT reporting plugin as a RESTful service. We do intentionally avoided all of the GWT server side libraries to communicate with the server. The report viewer uses standard HTTP-GET or PUT calls to query the server, which responds either with content or XML files.

The report viewer only uses the server’s public URLs to get information about the report’s current parameters. In fact, anyone can call these URLs to get the same information. We do not have any limits on what kind of client you use to interact with the reporting plugin.

Flow, Report Viewer, Flow

The GWT report viewer uses a very simple algorithm to communicate with the server.

  1. First we query the parameter XML by passing all known parameter to the server.
  2. We parse the XML and render the UI
  3. We check whether the server found any problems with the parameter we given. If everything is OK, we ask the server for the report content.
  4. We wait for input from the user.
  5. On any new input or if the user hits submit, we go back to the start and query the parameter-XML again.

You can find more information about this cycle in one of my previous postings.

Action time

Click here to see a simple form-based parameter page. This demonstrates how to communicate with a Pentaho BI-server and shows the basic steps to parametrize an existing report. If you see a login window in the lower frame, then login and restart the demo.

The form itself is simple:

<html>
<head>
  <title>Report Viewer

<body>
<div style="border: 1px solid black; margin-bottom: 20px">
  <h1>Parameter Input
  <hr />
  <form action="http://demo.pentaho.com/pentaho/content/reporting" method="GET" target="viewer">
    <div>
      Report to load:
      <input name="solution" value="steel-wheels"/>
      <input name="path" value="reports"/>
      <input name="name" value="Invoice Statements.prpt"/>
    </div> 
    <h2>System parameter</h2>
  <label for="renderMode">Render Mode</label> 
  <select id="renderMode" name="renderMode" size="1"> 
       <option value="REPORT">REPORT</option> 
       <option value="XML">XML</option> 
  </select>
  <label for="output-target">Output Target</label> 
  <select id="output-target" name="output-target" size="1"> 
     <option value="table/html;page-mode=stream">Single page HTML (table/html;page-mode=stream)</option> 
     <option value="table/html;page-mode=page">Paginated page HTML (table/html;page-mode=stream)</option>
   </select>
  <h2>User parameter</h2>
  <label for="Customer">Customer</label>
  <input type="text" id="Customer" name="CustomerNo" value="242"/> 
  <label for="ReportStamp">Report Stamp</label> 
  <input type="text" id="ReportStamp" name="Report Stamp" value="Review"/>
  <div/> 
  <input type="submit"value="Go!"/>
</form>
  <h1>Report</h1>
  <iframe name="viewer" width="100%" height="50%"/>
</div>
</body>
</html>

In the first section we setup a few system level parameter. The form contains the path to the report we want to render (expressed as Pentaho Standard Triple – solution, path, name), the renderMode (that defines whether we query parameter information (XML) or whether we render the report (REPORT)) and finally the output target that defines what output the server should generate.

This form is already a valid method to supply parameter to the reporting plugin and shows that there is no magic involved.

Now, lets do the same again … in CDF

Jordan Ganoff wrote a prototype of a CDF dashboard reads the parameter information from the Pentaho reporting plugin to construct a dashboard.

Due to some security restriction in the JavaScript execution in browsers (same origin rule) I cannot provide a one-click example. Download the zip file and copy the contents into your BI-Server’s solution directory.

You can then switch to the new parameter viewer by replacing the “reportviewer/report.html” part with “web/reportviewer.html”.

So the original URL for your GWT report viewer

http://localhost:8080/pentaho/content/reporting/reportviewer/report.html?solution=steel-wheels&path=%2Freports&name=Invoice+Statements.prpt&locale=en_US

becomes this URL

http://localhost:8080/pentaho/content/reporting/resources/web/reportviewer.html?solution=steel-wheels&path=%2Freports&name=Invoice+Statements.prpt&locale=en_US

(Download the CDF based report viewer)

Lets hand over the microphone to Jordan to explain the architecture of this implementation.

Here’s a quick introduction:

The new report viewer is a collection of CDF components. You can follow
the logic starting in reportviewer.html’s load() function. We set up a
div to inject the prompt panel into and then call:

pentaho.common.prompting.createPromptPanel({
          destinationId: "promptPanel", 
          paramDefn: reportViewerParameterLookup(),
          refreshParamDefnCallback: reportViewerParameterLookup,
          extraComponents: [{type: "SubmitReportComponent", htmlObject: 'report-div'}]
});
    • destinationId: the element Id where the prompt panel will be injected into.

 

  • paramDefn: this is the parameter definition (parsed Parameter XML into an object)

 

 

  • refreshParamDefnCallback: function called whenever a parameter has changed its value. For now we will hit the ParameterXmlContentGenerator for a new parameter xml and parse it to a parameter definition every time a parameter value has changed.

 

 

  • extraComponents: Any additional cdf components you’d like initialized. I have a quick prototype component defined in reportviewer.html called “SubmitReportComponent” that will listen for the parameter “submit” to change (which is fired by the submit button on the prompt panel). When this parameter changes the update() method of the SubmitReportComponent is called. We build a valid reporting url and set the iframe’s src to that url to load the report. Pretty straight forward and is exactly how the existing report viewer works today.

 

Core architecture: the parameter panel itself is a CDF component which defines a layout for all widgets provided. The submit button widget is configured to listen to all CDF components created from any non-hidden parameter. Any time the submit component receives a parameter change event its update() function is called and we check if all parameters are valid. If they are all valid we fire a change event for the parameter “submit” which someone else can listen to and do what they need to do.

Once all components are created we pass them to CDF to initialize them. This will register them CDF and eventually calls update() on all components. It’s this update() that will inject the CDF components into the page.

That being said, the thinking here is that in the future anyone can create a prompting panel from a parameter xml and provide their own callback when the submit button is clicked (or any parameter is changed for that matter).

Extension points:

    • The mapping for widget types to parameter types is done in: parameter-prompting-builders.js in the object: pentaho.common.prompting.builders.ParameterWidgetBuilder.paramTypeToWidgetMapping. That’s the internal widget mapping that we use when looking up a widget type in pentaho.common.prompting.builders.ParameterWidgetBuilder.lookupCDFWidgetBuilder.

 

  • Most of the javascript is structured so it can be extended at any point. Hopefully I provided enough functions that can be overridden to make it easy for a hacker to tear it apart!

 

One of the items I’m expecting to change is the delegation of widget creation to layout panels. I’d like to pass the parameter definition to the layout panel and let it create any widgets it needs instead of creating them ahead of time for each non-hidden parameter. This should be a bit more extensible.

Thank you Jordan for creating this amazing marriage of dashboards and reporting!

Even though this is a prototype build, it proves the point that we can tweak CDF to become the new report viewer.

At this point, Gretchen asked me whether this will have any impact on any existing integration with the reporting plugin. Changing server side URLs is always a bad thing and Gretchen voiced the concerns of all our OEMs and partners who built an existing solution for the reporting plugin.

The changes we propose are purely client side changes. There is no need nor any plan to change server side APIs or change URLs or returned formats or any server side behaviour. You are safe.

As far as I know, this new report viewer will be part of the upcoming Pentaho BI-Server 4.5.

A world of new possibilities

This new report viewer adds a bunch of new possibilities to our reporting system. We can easily extend it, we can add new components and new ways of parametrizing reports. Parameters can look sexy and can be visually rich.

With CDF’s flexibility and ability to style the dashboard in any way you want, we can produce more flexible layouts that match existing corporate style guidelines of our customers. With the ability to quickly integrate new components we solve more business cases.

On top of my head, I can imagine a Google-Maps widget to select locations, clickable charts to select customers or product lines. Our Drill-linking can be used in new ways. Why not use a report or analyzer view to present your selection?

With CDF and the power of JavaScript in our hands we can also easily show or hide parameter as needed, or even produce selective parameter input paths based on the user’s selection.

With better parametrization, our reporting system will look more sexy, which means more people will be willing to use it. More customers is always a good thing.

But to make this bird fly, we need all the help that we can get.

How can you help to get it right from the start?

First and foremost: Give us your requirements. For us from engineering it is hard to know what obstacles you hit in the fields or what your clients ask you to solve. So instead of producing a system that is based on our limited “Steel-wheels” world, I would see an open discussion that comes up with a set of true requirements that match what you see from your customers every day.

If you already use CDF to drive parametrized reports, tell us all about it? What is the problem you are solving here? What extra work did you have to do to make it work for your use case? If you wrote some parameter input that might be useful for other: Would you share it with us?

Help us to expand the parameter definition dialog in the report designer so that we can easily add additional attributes to the report parameters. This way we can prototype faster and you can use this to pass additional configuration settings to the CDF parameter viewer.

And if you cannot give anything, then at least test what we write with your data. The earlier you test, the better we will be able to react to the results. And please, please: Test the early builds as well. If the product is already in RC (Release-Candidate) state then it is very hard to make major architectural changes. Your tests of the early builds help us to know whether we move in the right direction and allow us to correct our course when we are not.

So lets start the discussion here and now

What would you need from a parameter viewer? What requirements did you meet that forced you to implement your own dashboard-parameter-viewer?

Rome: Pentaho Community ante portas

Oh yes! It’s that time of the year again. It’s community time – this year in Frascati, Italy.

As usual its late September, so time to gather again to see what exciting new developments are brewing in the community. From reading the talk announcements, it seems we will have another full scheduled packed with new projects, field reports from the trenches and the usual sharing of ideas, techniques and general chatter.

Surely, that is something that Twitter, IRC and Blogs or a Forum can never do. Random talks to random people that spark random thoughts that ignite random brain cells.

Back in June I wrote a posting on how to create your own report viewer for the Pentaho BI-Server. That article sparked quite some interest. Seems that many people feel the pain of working with a monolithic GWT application. My personal favourite bugger is the 4 minutes compile time each time I make a change. What is yours?

On Saturday I will give an 30 minutes talk about ways to retire the GWT viewer in favour of a better, more lightweight solution. Thanks to Jordan, the talk will contain amazing live code! It will contain XML. It may contain pixies and dragons. It will not contain XUL nor GWT.

Well, as a full-time code wrestler for Pentaho reporting and the Pentaho Report Designer I do not experience the adrenaline rush caused by real world projects. So I’ll be like Mr. Vampire – let me suck you dry for feedback, for things that did work well over the last year, and things that may not work well. And hey, hearing any feedback is good. It shows that all the code written does make a positive difference – not just for our sales team but for real world people with real world problems to solve.

In that way, I do look forward to all the talks. And – yes, I admit it – the Italian wine and Galliano and Grappa!

PRD-3553 – or the low performance of large reports

The funny thing about most bugs is, that for most parts they go completely unnoticed. No one ever hits them or if they do they think its just normal weird behaviour. (Thanks, Crystal Reports & Co for training your former (and our current) users!)

One of these bugs I had in my bucket for the last week was a rather nasty error condition on reports with a large number of rows. Reports like that are usually used either for data exports or for performance tests, as a human user barely reads the first sentences of an e-mail, not to speak of hundreds of pages of dull numbers.

What was the bug?

Reports with large numbers of rows were incredible slow when run as Excel/HTML/RTF/Table-CSV or Table-XML export. The slow down was worse the more rows were in the report.

The original e-mail contained a test report that demonstrated the issue. (Note: Providing me with a replication path tremendously increases your chances of getting a bug-fix fast. You help me, and I’ll help you.) The tests they ran on their machine showed a clearly exponential curve:

# of Rows Time in seconds
5000 16
10000 28
20000 90
50000 360

What caused the bug?


Short version: The iterative processing in the reporting engine was broken and resulted in a increasingly large layout tree. Iterating this tree to calculate the layout gets more and more expensive the larger the tree gets.

Long version:
The Pentaho reporting engine uses an iterative layouting engine to process the report. During the layouting stage we build up a layouting-DOM tree containing the various boxes and content that makes up a report. Our model is heavily inspired by the CSS specification and is optimized towards using as little memory as possible. Boxes get added to the layouter by the reporting (data processing) engine and once the layout reaches a stable intermediate state, the content is printed and the processed boxes get removed from the layout-DOM. This results in a sliding zone of activity on the layout model and allows us to process huge reports with minimal memory footprint. And we don’t even have to swap to disk for this – all processing happens in memory.

To make this model work, we have to track which layout nodes that have been processed already and which nodes could still move around in the final document.

The whole process of tagging finished nodes worked nicely for paginated reports, but failed for flow- and stream-layouted reports. The system never tagged nodes as ready for print, and so the layout tree grew bigger and bigger. At the end, this resulted in larger processing times (as the iteration of the tree took longer) and a huge memory foot-print (as more nodes have to be held in memory).

To fix this problem, I had to fix the “ready-for-process” tagging [SVN].

Once the iterative process sprung to life, the memory foot-print went down, but the processing performance was not as good as I would have expected. In fact, the iterative processing worked so well, that it caused more overhead than it actually saved time. A quick and dirty throttling of the layouter’s processing queue made performance jump up. We now only process every 50th iterative event, and thus trading a bit more memory costs for a huge increase in processing speed.

How do we perform now

The bug fix was a absolute success beyond my wildest dreams. I can say I have witnessed an increase of 800% in report processing speed. (Ok, it is not difficult for this class of bugs: all you need is patience and a large enough report.)

# of Rows Time in seconds
Previous to the fix After the fix Change in %
5000 11 11 0%
10000 24 22 9%
20000 414 47 880%
50000 (crash) 146 (a rather large number, trust me)

When can I get this fix?

The fix will be included in the 3.8.2-GA release and should be built within the next weeks. At the moment, everyone at Pentaho is still busy finalizing the BI-Server 4.0/PRD-3.8.1 release, so it will take a moment before we can fire up another release.

In the meantime, you can either grab the sources from the Subversion repository or grab the CI-build. The 3.8.2 engine is a direct replacement of the 3.8.1 build, so you can patch your system by just copying the final jar-files over. I have not tested whether you can do some similar patching with 3.8 or 3.7 installations.

How to create a fancy tooltip in HTML reports with PRD

When you create reports that are mainly used on the web, you probably want to enrich your reports with some basic interactivity. Charts need links and tooltips on their data, drill downs need to be defined and hopefully the information overload of ordinary reports gets reduced via fancy images, hidden sections that only show up on demand and other techniques.

The most basic way of creating a annotation on a report is to provide tooltips. Sadly the HTML creators were weird scientists who were used to long and boring lists of footnotes instead of in-lined annotations.

Today I am showing you how to create fancy, JavaScript based tooltips onto a report. You can adapt the same technique to create other interactive elements, including Google Maps integrated reports or other Web-2.0 mash-ups.

Rich-Text tooltips: The idea

The tool-tip system I am integrating is based on the blog posting written by Michael Leigeber, a web designer and .NET developer who runs the Leigeber Web Development Blog.

The tool-tips behaviour is defined in a central script that gets included in the report’s generated HTML output. To include the script, we simply copy the whole raw HTML/javascript into the report attribute “html::append-body”. When executing the report, the contents of this attribute are copied into the HTML file before the first table is rendered.

If you want to know how the tooltip script works, please refer to Michael Leigeber’s blog posting.

On each element that we define, we can now define the tooltip as HTML text on the “onmouseover” attribute.

tooltip.show('Testing  123 ', 200);

The tooltip gets hidden when the mouse leaves the element via the “onmouseout” attribute.

tooltip.hide()

The first parameter in the “show” method call is used as ‘innerHTML’ on the generated tooltip element. So you instead of just plain text, you can include any HTML content you like, including images, tables or animations.

Dynamic Tooltips: Show me my data in the tooltip

We all agree that static tooltips are rather boring. A tooltip, well-placed can mean all the difference between a information-overloaded report and a report that shows the exactly the right information you need at exactly the right time. Show the main sales numbers, and move the detail content into the tooltip.

In Pentaho Reporting, most attributes and styles can either contain a static value or can be computed at runtime via an expression or formula. When you see a green plus on the last column of either the style or attribute table, then you will be able to add a calculation for that property.

To make the tooltip show your own data, you will need to make the first parameter of the tooltip.show(..) function call dynamic. The “onmouseover” property expects a string that is valid JavaScript. The reporting engine does not interpret this string at all, it just passes it into the resulting HTML document and lets the browser decide what to do with it.

So all we need to do, is to compute a JavaScript text that contains our data:

="tooltip.show('" & [myfield] & "');"

But careful. If myfield contains any character with special meaning in JavaScript, like quotes, your script is not going to work. To make it work we will need some proper quoting to turn the text into proper JavaScript code.

The Pentaho Reporting Engine offers the “QUOTETEXT” function for this purpose. This function takes two parameter. The first parameter is the text you want to quote, and the second parameter is a constant telling how you want to quote the text: “javascript”, “xml”, “html”, “formula-string” or “formula-reference”.

For now we will need two sets of qouting, “javascript” and “html” as our quoting, and alter the formula above to read:

="tooltip.show('" & QUOTETEXT(QUOTETEXT([myfield];"html");"javascript") & "');"

The inner QUOTETEXT ensures that the text given in [myfield] is proper HTML and that all special HTML characters are encoded properly. So ‘>’ gets converted into >, ‘<' into < and so on. The outer QUOTEXT function then ensures that the resulting text is also proper JavaScript code. It encodes all single and double quote characters and all newlines, tabs and so on into their properly encoded JavaScript counter parts. This is already a working tooltip, even though it is a bit primitive. Earlier on I said, you can use HTML text as tooltip. So lets do a bit of HTML magic here. Again, text in HTML needs to be encoded properly as well.

="tooltip.show('" & QUOTETEXT("

" & QUOTETEXT([PRODUCTNAME];"html") & "


" & QUOTETEXT([PRODUCTDESCRIPTION];"html") & "

";"javascript") & "');"

This produces a tooltip that prints the product code from the steel-wheels example and the product description divided by a horizontal line.

You can grab a sample report for Pentaho Reporting 3.8.0-GA:
fancy-html-tooltip.prpt

What’s New in Pentaho Report Designer 3.8.1 – Bug-Fixes!

With weeks and weeks of doing nothing bug bug-fixes passing by, we finally reached the point where there is a new release of the reporting tools and the BI-Server comes along. So let’s have a look on what changed and why you want to upgrade as fast as possible.

The most exciting new addition for everyone probably is the Pentaho Interactive Reporting service. Pentaho Interactive Reporting replaces the old Web-Based Ad-Hoc Query and Reporting (WAQR) tool. WAQR was one of those things that get the job done, but do so neither gracefully nor with style. But in the 21st century you need something better. There are times where you can’t go for the full featured hardcore-nerdy Report-Designer option.

Pentaho Interactive Reporting is aimed to supplement our reporting solutions. Technologically it stands between Analyzer being a complete Ad-Hoc tool for exploring the OLAP models of the data-warehouses and the Pentaho report designer, a tool for creating standardized reports for printing or non-interactive access to the data.

Apart from opening up some APIs for Pentaho Interactive Reporting, this release only contains bug-fixes. This release contains fixes in the layouter, the parameter handling and the cascading prompt capabilities inside the platform and the report-designer and fixes some missing functionality inside the charting system.

Overall, 63 bugs got squashed in the process.

Pentaho Reporting 3.8.1 is now in the final build process and from my part the work is done now (assuming that the final tests do not come up with any show-stopping bugs). Judging from the last releases, the final build should be uploaded to our public Sourceforge page in a next weeks.


Release Notes – Pentaho Reporting and Pentaho Report Designer – Version 3.8.1 GA (4.0.0 GA Suite Release)

Bug

  • [PRD-114] – Round settings on rectangle element are not being saved.(In HTML)
  • [PRD-2434] – PRD will show only the one column if the same column is being queried in multiples of different aggregates.
  • [PRD-2690] – Selecting a formula-function in the side-bar-list does not show the function’s description in the bottom area of the formula editor.
  • [PRD-2834] – Radar chart not displaying Japanese characters for the legend and graph points
  • [PRD-2924] – Reporting plugin within PUC, date field doesn’t populate with the date chosen with the date picker.
  • [PRD-3243] – On functions attributes, when you use the … button to select a group, for the property “Reset on Group name , and select a group, the value won’t update to the property
  • [PRD-3330] – Removed “ProductLine” custom parameter in Hyperlink definition on the “Product Line Sales.prpt”.
  • [PRD-3354] – Experimental features disabled with CDA datasource on classpath dies with NPE
  • [PRD-3420] – Date format on X-axis in chart is ignored
  • [PRD-3425] – Saving a report with landscape orientation mixes up the left and right margin on load
  • [PRD-3431] – Regression: Sheetname style expression used on group header does not set the first sheet’s sheetname
  • [PRD-3433] – PRD Windows *.bat startup file does not work with latest JRE
  • [PRD-3438] – Horizontal scroll bars do not appear in IE when using custom landscape layout and parameter-layout equal to horizontal
  • [PRD-3453] – Tables need a style definition “table-layout: fixed” to render properly in IE
  • [PRD-3466] – UI Refresh – Reports in the PUC report viewer do not have new scroll bars.
  • [PRD-3467] – UI Refresh – Report filters controls do not have updated style
  • [PRD-3470] – Indexed columns had no proper metadata processing and therefore wizard-label printing broke if the tablemodel did not provide proper metadata
  • [PRD-3474] – Overlapping element in group header on HTML Action report.
  • [PRD-3475] – Report Name for Chart Post Processor is Line it should be “Chart Post Processor”.
  • [PRD-3476] – Radar Chart is display incorrect data for the series. The chart is setup incorrectly.
  • [PRD-3478] – Firefox: Links are not aligning with items in a PRPT’s pie chart (Product Line Sales)
  • [PRD-3479] – Adding element in details band breaks out the Pentaho Report Designer GUI.
  • [PRD-3480] – On functions attributes, the Value’s dropdown list contains empty descriptions
  • [PRD-3487] – I get the following stack trace when using the report wizard in PRD, no end user impact
  • [PRD-3489] – Report Header displays a blue border and the font it not anti-aliased (IE only)
  • [PRD-3490] – Reports are not displayed correctly in IE 7.0 and 8.0 when the Report Parameter pane is minimised
  • [PRD-3494] – XYBar-Charts do not have the stackedBarRenderPercentages option the categorical charts have.
  • [PRD-3495] – Morphing a label into a date field kills the attribute-table when it tries to format the “value” attribute
  • [PRD-3496] – Edit-Groups dialog looses user-input if a cell-editor is still active when the user hits OK
  • [PRD-3497] – LegacyChartEditor looses user data if closed while editing cells.
  • [PRD-3498] – Report Pre-Processors are displayed with their raw classnames in the master-report attribute table
  • [PRD-3499] – As a PIR developer, I want the wizard-core to provide an option to override metadata formatting when applying a user defined format.
  • [PRD-3503] – Chart Editor has additional chart properties that don’t have user-friendly names.
  • [PRD-3507] – Set Italian number format for chart labels
  • [PRD-3512] – Problem launching Report Designer version 3.7 and 3.8 on Windows 7
  • [PRD-3514] – Adding TOC to report causes exception from JDBC driver
  • [PRD-3515] – Url-formula option in the Scatter Plot Chart doesn’t work.
  • [PRD-3516] – Saving a report in PUC via IE gives the filename as “reporting” instead of giving the prpt’s name as filename.
  • [PRD-3519] – Default date format for a text input parameter field works in 3.6 but not in 3.8
  • [PRD-3521] – XY/XYZ charts have no option to display the textual series key in a tooltip or URL
  • [PRD-3522] – Layout Issues: Inventory Report doesn’t look the same in 3.8.1.
  • [PRD-3528] – Sample Report – What’s New – is not correctly grouping the detail items.
  • [PRD-3529] – Block layout in Report Header behaves differently between html stream and html paginated.
  • [PRD-3536] – BarcodeElement with invalid barcode input bombs the layout and the report processing
  • [PRD-3537] – After changing the orientation from portrait to landscape, the right and left margins get reversed.
  • [PRD-3540] – Cascading prompts with server side calculated default values are not working
  • [PRD-3541] – A plain text-field parameter with a data-source attached to it resets unknown values to even if the parameter is not strictly validating.
  • [PRD-3542] – Editing a metadata query containing the same columns with differ aggregation types, the query aggregation types default to MIN.
  • [PRD-3549] – Character spacing on report elements not working on PDF outputs.
  • [PRD-2422] – When editing query with a parameter that is a date in the MQL Editor, the editor updates the a parameter, {date_parameter} to DATEVALUE([date_parameter]) and the default value can’t be set.
  • [PRD-3349] – Request for patch to Report Designer 3.7 to fix number truncation problem
  • [PRD-3365] – Column-by-index fields are not available if the data comes from the query-cache
  • [PRD-3373] – Random incorrect value displayed in PDF output format
  • [PRD-3375] – Round up/down error in Report Designer
  • [PRD-3396] – Problem displaying different format-string in the same column on a number field
  • [PRD-3422] – Moving the focus widget out of the parameter panel
  • [PRD-3448] – Excel chart output is blurred
  • [PRD-3451] – As a reporting user I want to use a default-value formula to get default values for cascading parameters when the parent parameter changes
  • [PRD-3454] – On Open dialog, update Report Definition to include the .prpti file type.
  • [PRD-3455] – On Open from Repository dialog, update Report Definition to include the .prpti file type.
  • [PRD-3465] – Kettle datasource does not work correctly with datasource-cache enabled as it does not take the query name into account
  • [PRD-3473] – Label Text “Group n Footer” is displaying in a report generate by the Report Wizard using the new templates.
  • [PRD-3411] – Issue with Cascading Prompts

Improvement

  • [PRD-3426] – Need the ability to set the padding on the detail elements on templates
  • [PRD-3432] – Width of Inherit column is too wide leaving less space for property name
  • [PRD-3525] – Update default Chart Options to be consistent with Analyzer and Dashboards
  • [PRD-3547] – Opent Dashboards (.xdash) via hyperlinks into new tab
  • [PRD-3452] – As a reporting user I want to open PRPTI files from within PRD
  • [PRD-3469] – Add new templates and remove the old ones.

Don’t hardcode host names, use JNDI

I got a e-mail earlier this week, asking for some help in migrating reports.

Hey Thomas,

I have a poop load of reports with hard coded 
ip addresses (yeah I know – but they didn't 
have DNS entries until now) that I'd like to 
globally change.  I'm thinking of something 
along the lines of "zcat | sed | gzip" but 
something like this would only work on 
compressed files, not an archive of files.  

So – I was wondering if you had any ideas, 
maybe even a utility based on code we already 
have to create and open these bad boys.  
We could even have some known locations to 
modify 

Like>  prpt-replace –dbConnectionHost "192.168.1.100" "mydb.mycorporation.com"

Or maybe use file and xpath to denote the 
attribute to change.

(The names involved were changed to protect the guilty.)

Well, hard-coding database connection information in every report is never a good idea in no environment. If your database host or credentials changes you are truly and well .. in trouble.

So how can you avoid this trouble in the first place?

Use JDNI to hold database connections in a external place

JDNI is a technique to hold database connections outside of the report definition. By holding this information in a central spot, you only have to change one place and all your reports get updated. The JNDI stored connection information is stored on the computer where your report runs. This also allows you to have separate definitions based on where the report runs. Your report designer can access a local copy of your production database, while your production server uses the real database server.

As a side note: On the BI-Server, this system of abstracting database information is called “Pentaho Datasources”, not JNDI. In this posting, I will forgo the branding exercise and simply continue to call it JNDI.

You can use JNDI in the following datasources:

  • SQL
  • Pentaho Analytics (Mondrian)
  • OLAP4J
  • Pentaho Metadata

For all of your production reports, you should always choose JNDI defined datasources. Manually defined datasources are great for testing out ideas, but really a nightmare to maintain and by storing the database credentials they potentially pose a security risk if you choose to share report definitions with others.

JDBC Connection definition basics

A database connection in Java requires four mandatory pieces of information:

  • driver: The name of the driver JDBC class.
  • url: The JDBC URL. This URL contains all information needed to connect to the database like the host name and port. The format of this URL is specific to a particular database. Consult your database manual for details on your particular database.
  • user: The username for the database.
  • password: The password for the accessing the database.

In addition to this, each driver allows you to specify additional properties. Again, these properties are specific to the database and (hopefully) documented in the manual. Most databases alternatively allow you to encode the properties onto the URL.

Defining data-sources in the Pentaho Report Designer

Within the report designer, there is no user interface to define the JNDI connections. The connections are defined in a file called "$HOME/.pentaho/simple-jndi/default.properties".

The file looks somewhat like this:

# Copyright 2008 Pentaho Corporation.  All rights reserved. 
# This software was developed by Pentaho Corporation and is provided under the terms 
# of the Mozilla Public License, Version 1.1, or any later version. You may not use 
# this file except in compliance with the license. If you need a copy of the license, 
# please go to http://www.mozilla.org/MPL/MPL-1.1.txt. The Original Code is the Pentaho 
# BI Platform.  The Initial Developer is Pentaho Corporation.
#
# Software distributed under the Mozilla Public License is distributed on an "AS IS" 
# basis, WITHOUT WARRANTY OF ANY KIND, either express or  implied. Please refer to 
# the license for the specific language governing your rights and limitations.
SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/user=pentaho_user
SampleData/password=password
SampleData/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata

SampleDataAdmin/type=javax.sql.DataSource
SampleDataAdmin/driver=org.hsqldb.jdbcDriver
SampleDataAdmin/user=pentaho_admin
SampleDataAdmin/password=password
SampleDataAdmin/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleDataAdmin/url=jdbc:hsqldb:hsql://localhost/sampledata

The JDNI definition file holds all connections known to the local JNDI system. This system has no connection to the BI-Server and does not access any connections defined in the Pentaho Administration Console.

A JNDI database definition in the report designer is defined by a set of properties. Each property name has two parts: The name of the data-source (‘SampleData’ and ‘SampleDataAdmin’ in the example above) and the JDNI property name itself. Each datasource declared in the “simple-jndi” definition file requires the following properties:

  • Mandatory “type” entry set to the value “javax.sql.DataSource”
  • Driver class name
  • Connection URL
  • A username and password
  • .. and any additional property you want to pass in

The simplest way of creating a new definition entry is to copy an existing connection entry and to alter the name of the entry (the first part of the properties) and then to replace the connection information with your own details.

All JNDI connection names are case sensitive, so inside the report you have to use the same spelling for the name as in this properties-file.

And last but not least: Defining your connection information in the JNDI system does not relieve you from providing suitable database drivers. You can install new drivers by copying the jar files (and if needed any native libraries) into the “lib/jndi” directory of the Pentaho Report Designer.

Defining a JDNI connection in the Pentaho Administration Console

For the Pentaho BI-Server, the Pentaho architects chose not to use the JDNI subsystem that ships with all J2EE application servers. Sadly on the application servers JNDI has a few implementation flaws:

  • Redefining a JNDI connection requires you to redeploy or restart the application. This is a nightmare for a mission critical server.
  • Each server system comes with its own way to define JNDI definitions. Each of these ways is inherently incompatible with any other vendor.
  • You are bound to the implementations provided by the server vendor. There is little hope to customize the database access for additional capabilities like security checks or multi-tenancy.

Before you can setup a connection, you have to install the correct JDBC driver into both the BI-Server and the Pentaho Administration Console. For that copy the jar files into the following directories:

  • Pentaho BI Server: pentaho/tomcat/webapps/pentaho/WEB-INF/lib
  • Pentaho Administration Console: administration-console/jdbc

To define the connection, start the Pentaho Administration Console. The console starts up on port 8099, and can be reached via the URL “http://localhost:8099/”. The user and password are “admin” and “password” (sadly not documented anywhere in the download). The console can only be run successfully if the BI-Server is up and running as well.

To define a datasource, activate the “Administration” mode on the left hand side of the window and switch to the “DataSources” tab. Hit the small white “+” to add a connection.

The Pentaho Administration Console is smart enough to detect all installed drivers. If your driver is not listed here, you probably have forgotten to add it to the “lib” directories outlined above.

The data-source names you define here must match the datasource names defined for the Pentaho Report Designer. I also heavily recommend that you use the same type of database system for designing reports and for running them in production. SQL is a widely interpretable standard and therefore SQL written for one database is not necessarily compatible with the SQL dialect understood by the others.

Let me close with: Death to all hard-coded settings, long lives the host provided setting!