Monthly Archives: August 2011

PRD-3553 – or the low performance of large reports

The funny thing about most bugs is, that for most parts they go completely unnoticed. No one ever hits them or if they do they think its just normal weird behaviour. (Thanks, Crystal Reports & Co for training your former (and our current) users!)

One of these bugs I had in my bucket for the last week was a rather nasty error condition on reports with a large number of rows. Reports like that are usually used either for data exports or for performance tests, as a human user barely reads the first sentences of an e-mail, not to speak of hundreds of pages of dull numbers.

What was the bug?

Reports with large numbers of rows were incredible slow when run as Excel/HTML/RTF/Table-CSV or Table-XML export. The slow down was worse the more rows were in the report.

The original e-mail contained a test report that demonstrated the issue. (Note: Providing me with a replication path tremendously increases your chances of getting a bug-fix fast. You help me, and I’ll help you.) The tests they ran on their machine showed a clearly exponential curve:

# of Rows Time in seconds
5000 16
10000 28
20000 90
50000 360

What caused the bug?


Short version: The iterative processing in the reporting engine was broken and resulted in a increasingly large layout tree. Iterating this tree to calculate the layout gets more and more expensive the larger the tree gets.

Long version:
The Pentaho reporting engine uses an iterative layouting engine to process the report. During the layouting stage we build up a layouting-DOM tree containing the various boxes and content that makes up a report. Our model is heavily inspired by the CSS specification and is optimized towards using as little memory as possible. Boxes get added to the layouter by the reporting (data processing) engine and once the layout reaches a stable intermediate state, the content is printed and the processed boxes get removed from the layout-DOM. This results in a sliding zone of activity on the layout model and allows us to process huge reports with minimal memory footprint. And we don’t even have to swap to disk for this – all processing happens in memory.

To make this model work, we have to track which layout nodes that have been processed already and which nodes could still move around in the final document.

The whole process of tagging finished nodes worked nicely for paginated reports, but failed for flow- and stream-layouted reports. The system never tagged nodes as ready for print, and so the layout tree grew bigger and bigger. At the end, this resulted in larger processing times (as the iteration of the tree took longer) and a huge memory foot-print (as more nodes have to be held in memory).

To fix this problem, I had to fix the “ready-for-process” tagging [SVN].

Once the iterative process sprung to life, the memory foot-print went down, but the processing performance was not as good as I would have expected. In fact, the iterative processing worked so well, that it caused more overhead than it actually saved time. A quick and dirty throttling of the layouter’s processing queue made performance jump up. We now only process every 50th iterative event, and thus trading a bit more memory costs for a huge increase in processing speed.

How do we perform now

The bug fix was a absolute success beyond my wildest dreams. I can say I have witnessed an increase of 800% in report processing speed. (Ok, it is not difficult for this class of bugs: all you need is patience and a large enough report.)

# of Rows Time in seconds
Previous to the fix After the fix Change in %
5000 11 11 0%
10000 24 22 9%
20000 414 47 880%
50000 (crash) 146 (a rather large number, trust me)

When can I get this fix?

The fix will be included in the 3.8.2-GA release and should be built within the next weeks. At the moment, everyone at Pentaho is still busy finalizing the BI-Server 4.0/PRD-3.8.1 release, so it will take a moment before we can fire up another release.

In the meantime, you can either grab the sources from the Subversion repository or grab the CI-build. The 3.8.2 engine is a direct replacement of the 3.8.1 build, so you can patch your system by just copying the final jar-files over. I have not tested whether you can do some similar patching with 3.8 or 3.7 installations.

How to create a fancy tooltip in HTML reports with PRD

When you create reports that are mainly used on the web, you probably want to enrich your reports with some basic interactivity. Charts need links and tooltips on their data, drill downs need to be defined and hopefully the information overload of ordinary reports gets reduced via fancy images, hidden sections that only show up on demand and other techniques.

The most basic way of creating a annotation on a report is to provide tooltips. Sadly the HTML creators were weird scientists who were used to long and boring lists of footnotes instead of in-lined annotations.

Today I am showing you how to create fancy, JavaScript based tooltips onto a report. You can adapt the same technique to create other interactive elements, including Google Maps integrated reports or other Web-2.0 mash-ups.

Rich-Text tooltips: The idea

The tool-tip system I am integrating is based on the blog posting written by Michael Leigeber, a web designer and .NET developer who runs the Leigeber Web Development Blog.

The tool-tips behaviour is defined in a central script that gets included in the report’s generated HTML output. To include the script, we simply copy the whole raw HTML/javascript into the report attribute “html::append-body”. When executing the report, the contents of this attribute are copied into the HTML file before the first table is rendered.

If you want to know how the tooltip script works, please refer to Michael Leigeber’s blog posting.

On each element that we define, we can now define the tooltip as HTML text on the “onmouseover” attribute.

tooltip.show('Testing  123 ', 200);

The tooltip gets hidden when the mouse leaves the element via the “onmouseout” attribute.

tooltip.hide()

The first parameter in the “show” method call is used as ‘innerHTML’ on the generated tooltip element. So you instead of just plain text, you can include any HTML content you like, including images, tables or animations.

Dynamic Tooltips: Show me my data in the tooltip

We all agree that static tooltips are rather boring. A tooltip, well-placed can mean all the difference between a information-overloaded report and a report that shows the exactly the right information you need at exactly the right time. Show the main sales numbers, and move the detail content into the tooltip.

In Pentaho Reporting, most attributes and styles can either contain a static value or can be computed at runtime via an expression or formula. When you see a green plus on the last column of either the style or attribute table, then you will be able to add a calculation for that property.

To make the tooltip show your own data, you will need to make the first parameter of the tooltip.show(..) function call dynamic. The “onmouseover” property expects a string that is valid JavaScript. The reporting engine does not interpret this string at all, it just passes it into the resulting HTML document and lets the browser decide what to do with it.

So all we need to do, is to compute a JavaScript text that contains our data:

="tooltip.show('" & [myfield] & "');"

But careful. If myfield contains any character with special meaning in JavaScript, like quotes, your script is not going to work. To make it work we will need some proper quoting to turn the text into proper JavaScript code.

The Pentaho Reporting Engine offers the “QUOTETEXT” function for this purpose. This function takes two parameter. The first parameter is the text you want to quote, and the second parameter is a constant telling how you want to quote the text: “javascript”, “xml”, “html”, “formula-string” or “formula-reference”.

For now we will need two sets of qouting, “javascript” and “html” as our quoting, and alter the formula above to read:

="tooltip.show('" & QUOTETEXT(QUOTETEXT([myfield];"html");"javascript") & "');"

The inner QUOTETEXT ensures that the text given in [myfield] is proper HTML and that all special HTML characters are encoded properly. So ‘>’ gets converted into >, ‘<' into < and so on. The outer QUOTEXT function then ensures that the resulting text is also proper JavaScript code. It encodes all single and double quote characters and all newlines, tabs and so on into their properly encoded JavaScript counter parts.

This is already a working tooltip, even though it is a bit primitive. Earlier on I said, you can use HTML text as tooltip. So lets do a bit of HTML magic here. Again, text in HTML needs to be encoded properly as well.

="tooltip.show('" & QUOTETEXT("

" & QUOTETEXT([PRODUCTNAME];"html") & "


" & QUOTETEXT([PRODUCTDESCRIPTION];"html") & "

";"javascript") & "');"

This produces a tooltip that prints the product code from the steel-wheels example and the product description divided by a horizontal line.

You can grab a sample report for Pentaho Reporting 3.8.0-GA:
fancy-html-tooltip.prpt

What’s New in Pentaho Report Designer 3.8.1 – Bug-Fixes!

With weeks and weeks of doing nothing bug bug-fixes passing by, we finally reached the point where there is a new release of the reporting tools and the BI-Server comes along. So let’s have a look on what changed and why you want to upgrade as fast as possible.

The most exciting new addition for everyone probably is the Pentaho Interactive Reporting service. Pentaho Interactive Reporting replaces the old Web-Based Ad-Hoc Query and Reporting (WAQR) tool. WAQR was one of those things that get the job done, but do so neither gracefully nor with style. But in the 21st century you need something better. There are times where you can’t go for the full featured hardcore-nerdy Report-Designer option.

Pentaho Interactive Reporting is aimed to supplement our reporting solutions. Technologically it stands between Analyzer being a complete Ad-Hoc tool for exploring the OLAP models of the data-warehouses and the Pentaho report designer, a tool for creating standardized reports for printing or non-interactive access to the data.

Apart from opening up some APIs for Pentaho Interactive Reporting, this release only contains bug-fixes. This release contains fixes in the layouter, the parameter handling and the cascading prompt capabilities inside the platform and the report-designer and fixes some missing functionality inside the charting system.

Overall, 63 bugs got squashed in the process.

Pentaho Reporting 3.8.1 is now in the final build process and from my part the work is done now (assuming that the final tests do not come up with any show-stopping bugs). Judging from the last releases, the final build should be uploaded to our public Sourceforge page in a next weeks.


Release Notes – Pentaho Reporting and Pentaho Report Designer – Version 3.8.1 GA (4.0.0 GA Suite Release)

Bug

  • [PRD-114] – Round settings on rectangle element are not being saved.(In HTML)
  • [PRD-2434] – PRD will show only the one column if the same column is being queried in multiples of different aggregates.
  • [PRD-2690] – Selecting a formula-function in the side-bar-list does not show the function’s description in the bottom area of the formula editor.
  • [PRD-2834] – Radar chart not displaying Japanese characters for the legend and graph points
  • [PRD-2924] – Reporting plugin within PUC, date field doesn’t populate with the date chosen with the date picker.
  • [PRD-3243] – On functions attributes, when you use the … button to select a group, for the property “Reset on Group name , and select a group, the value won’t update to the property
  • [PRD-3330] – Removed “ProductLine” custom parameter in Hyperlink definition on the “Product Line Sales.prpt”.
  • [PRD-3354] – Experimental features disabled with CDA datasource on classpath dies with NPE
  • [PRD-3420] – Date format on X-axis in chart is ignored
  • [PRD-3425] – Saving a report with landscape orientation mixes up the left and right margin on load
  • [PRD-3431] – Regression: Sheetname style expression used on group header does not set the first sheet’s sheetname
  • [PRD-3433] – PRD Windows *.bat startup file does not work with latest JRE
  • [PRD-3438] – Horizontal scroll bars do not appear in IE when using custom landscape layout and parameter-layout equal to horizontal
  • [PRD-3453] – Tables need a style definition “table-layout: fixed” to render properly in IE
  • [PRD-3466] – UI Refresh – Reports in the PUC report viewer do not have new scroll bars.
  • [PRD-3467] – UI Refresh – Report filters controls do not have updated style
  • [PRD-3470] – Indexed columns had no proper metadata processing and therefore wizard-label printing broke if the tablemodel did not provide proper metadata
  • [PRD-3474] – Overlapping element in group header on HTML Action report.
  • [PRD-3475] – Report Name for Chart Post Processor is Line it should be “Chart Post Processor”.
  • [PRD-3476] – Radar Chart is display incorrect data for the series. The chart is setup incorrectly.
  • [PRD-3478] – Firefox: Links are not aligning with items in a PRPT’s pie chart (Product Line Sales)
  • [PRD-3479] – Adding element in details band breaks out the Pentaho Report Designer GUI.
  • [PRD-3480] – On functions attributes, the Value’s dropdown list contains empty descriptions
  • [PRD-3487] – I get the following stack trace when using the report wizard in PRD, no end user impact
  • [PRD-3489] – Report Header displays a blue border and the font it not anti-aliased (IE only)
  • [PRD-3490] – Reports are not displayed correctly in IE 7.0 and 8.0 when the Report Parameter pane is minimised
  • [PRD-3494] – XYBar-Charts do not have the stackedBarRenderPercentages option the categorical charts have.
  • [PRD-3495] – Morphing a label into a date field kills the attribute-table when it tries to format the “value” attribute
  • [PRD-3496] – Edit-Groups dialog looses user-input if a cell-editor is still active when the user hits OK
  • [PRD-3497] – LegacyChartEditor looses user data if closed while editing cells.
  • [PRD-3498] – Report Pre-Processors are displayed with their raw classnames in the master-report attribute table
  • [PRD-3499] – As a PIR developer, I want the wizard-core to provide an option to override metadata formatting when applying a user defined format.
  • [PRD-3503] – Chart Editor has additional chart properties that don’t have user-friendly names.
  • [PRD-3507] – Set Italian number format for chart labels
  • [PRD-3512] – Problem launching Report Designer version 3.7 and 3.8 on Windows 7
  • [PRD-3514] – Adding TOC to report causes exception from JDBC driver
  • [PRD-3515] – Url-formula option in the Scatter Plot Chart doesn’t work.
  • [PRD-3516] – Saving a report in PUC via IE gives the filename as “reporting” instead of giving the prpt’s name as filename.
  • [PRD-3519] – Default date format for a text input parameter field works in 3.6 but not in 3.8
  • [PRD-3521] – XY/XYZ charts have no option to display the textual series key in a tooltip or URL
  • [PRD-3522] – Layout Issues: Inventory Report doesn’t look the same in 3.8.1.
  • [PRD-3528] – Sample Report – What’s New – is not correctly grouping the detail items.
  • [PRD-3529] – Block layout in Report Header behaves differently between html stream and html paginated.
  • [PRD-3536] – BarcodeElement with invalid barcode input bombs the layout and the report processing
  • [PRD-3537] – After changing the orientation from portrait to landscape, the right and left margins get reversed.
  • [PRD-3540] – Cascading prompts with server side calculated default values are not working
  • [PRD-3541] – A plain text-field parameter with a data-source attached to it resets unknown values to even if the parameter is not strictly validating.
  • [PRD-3542] – Editing a metadata query containing the same columns with differ aggregation types, the query aggregation types default to MIN.
  • [PRD-3549] – Character spacing on report elements not working on PDF outputs.
  • [PRD-2422] – When editing query with a parameter that is a date in the MQL Editor, the editor updates the a parameter, {date_parameter} to DATEVALUE([date_parameter]) and the default value can’t be set.
  • [PRD-3349] – Request for patch to Report Designer 3.7 to fix number truncation problem
  • [PRD-3365] – Column-by-index fields are not available if the data comes from the query-cache
  • [PRD-3373] – Random incorrect value displayed in PDF output format
  • [PRD-3375] – Round up/down error in Report Designer
  • [PRD-3396] – Problem displaying different format-string in the same column on a number field
  • [PRD-3422] – Moving the focus widget out of the parameter panel
  • [PRD-3448] – Excel chart output is blurred
  • [PRD-3451] – As a reporting user I want to use a default-value formula to get default values for cascading parameters when the parent parameter changes
  • [PRD-3454] – On Open dialog, update Report Definition to include the .prpti file type.
  • [PRD-3455] – On Open from Repository dialog, update Report Definition to include the .prpti file type.
  • [PRD-3465] – Kettle datasource does not work correctly with datasource-cache enabled as it does not take the query name into account
  • [PRD-3473] – Label Text “Group n Footer” is displaying in a report generate by the Report Wizard using the new templates.
  • [PRD-3411] – Issue with Cascading Prompts

Improvement

  • [PRD-3426] – Need the ability to set the padding on the detail elements on templates
  • [PRD-3432] – Width of Inherit column is too wide leaving less space for property name
  • [PRD-3525] – Update default Chart Options to be consistent with Analyzer and Dashboards
  • [PRD-3547] – Opent Dashboards (.xdash) via hyperlinks into new tab
  • [PRD-3452] – As a reporting user I want to open PRPTI files from within PRD
  • [PRD-3469] – Add new templates and remove the old ones.

Don’t hardcode host names, use JNDI

I got a e-mail earlier this week, asking for some help in migrating reports.

Hey Thomas,

I have a poop load of reports with hard coded 
ip addresses (yeah I know – but they didn't 
have DNS entries until now) that I'd like to 
globally change.  I'm thinking of something 
along the lines of "zcat | sed | gzip" but 
something like this would only work on 
compressed files, not an archive of files.  

So – I was wondering if you had any ideas, 
maybe even a utility based on code we already 
have to create and open these bad boys.  
We could even have some known locations to 
modify 

Like>  prpt-replace –dbConnectionHost "192.168.1.100" "mydb.mycorporation.com"

Or maybe use file and xpath to denote the 
attribute to change.

(The names involved were changed to protect the guilty.)

Well, hard-coding database connection information in every report is never a good idea in no environment. If your database host or credentials changes you are truly and well .. in trouble.

So how can you avoid this trouble in the first place?

Use JDNI to hold database connections in a external place

JDNI is a technique to hold database connections outside of the report definition. By holding this information in a central spot, you only have to change one place and all your reports get updated. The JNDI stored connection information is stored on the computer where your report runs. This also allows you to have separate definitions based on where the report runs. Your report designer can access a local copy of your production database, while your production server uses the real database server.

As a side note: On the BI-Server, this system of abstracting database information is called “Pentaho Datasources”, not JNDI. In this posting, I will forgo the branding exercise and simply continue to call it JNDI.

You can use JNDI in the following datasources:

  • SQL
  • Pentaho Analytics (Mondrian)
  • OLAP4J
  • Pentaho Metadata

For all of your production reports, you should always choose JNDI defined datasources. Manually defined datasources are great for testing out ideas, but really a nightmare to maintain and by storing the database credentials they potentially pose a security risk if you choose to share report definitions with others.

JDBC Connection definition basics

A database connection in Java requires four mandatory pieces of information:

  • driver: The name of the driver JDBC class.
  • url: The JDBC URL. This URL contains all information needed to connect to the database like the host name and port. The format of this URL is specific to a particular database. Consult your database manual for details on your particular database.
  • user: The username for the database.
  • password: The password for the accessing the database.

In addition to this, each driver allows you to specify additional properties. Again, these properties are specific to the database and (hopefully) documented in the manual. Most databases alternatively allow you to encode the properties onto the URL.

Defining data-sources in the Pentaho Report Designer

Within the report designer, there is no user interface to define the JNDI connections. The connections are defined in a file called "$HOME/.pentaho/simple-jndi/default.properties".

The file looks somewhat like this:

# Copyright 2008 Pentaho Corporation.  All rights reserved. 
# This software was developed by Pentaho Corporation and is provided under the terms 
# of the Mozilla Public License, Version 1.1, or any later version. You may not use 
# this file except in compliance with the license. If you need a copy of the license, 
# please go to http://www.mozilla.org/MPL/MPL-1.1.txt. The Original Code is the Pentaho 
# BI Platform.  The Initial Developer is Pentaho Corporation.
#
# Software distributed under the Mozilla Public License is distributed on an "AS IS" 
# basis, WITHOUT WARRANTY OF ANY KIND, either express or  implied. Please refer to 
# the license for the specific language governing your rights and limitations.
SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/user=pentaho_user
SampleData/password=password
SampleData/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata

SampleDataAdmin/type=javax.sql.DataSource
SampleDataAdmin/driver=org.hsqldb.jdbcDriver
SampleDataAdmin/user=pentaho_admin
SampleDataAdmin/password=password
SampleDataAdmin/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleDataAdmin/url=jdbc:hsqldb:hsql://localhost/sampledata

The JDNI definition file holds all connections known to the local JNDI system. This system has no connection to the BI-Server and does not access any connections defined in the Pentaho Administration Console.

A JNDI database definition in the report designer is defined by a set of properties. Each property name has two parts: The name of the data-source (‘SampleData’ and ‘SampleDataAdmin’ in the example above) and the JDNI property name itself. Each datasource declared in the “simple-jndi” definition file requires the following properties:

  • Mandatory “type” entry set to the value “javax.sql.DataSource”
  • Driver class name
  • Connection URL
  • A username and password
  • .. and any additional property you want to pass in

The simplest way of creating a new definition entry is to copy an existing connection entry and to alter the name of the entry (the first part of the properties) and then to replace the connection information with your own details.

All JNDI connection names are case sensitive, so inside the report you have to use the same spelling for the name as in this properties-file.

And last but not least: Defining your connection information in the JNDI system does not relieve you from providing suitable database drivers. You can install new drivers by copying the jar files (and if needed any native libraries) into the “lib/jndi” directory of the Pentaho Report Designer.

Defining a JDNI connection in the Pentaho Administration Console

For the Pentaho BI-Server, the Pentaho architects chose not to use the JDNI subsystem that ships with all J2EE application servers. Sadly on the application servers JNDI has a few implementation flaws:

  • Redefining a JNDI connection requires you to redeploy or restart the application. This is a nightmare for a mission critical server.
  • Each server system comes with its own way to define JNDI definitions. Each of these ways is inherently incompatible with any other vendor.
  • You are bound to the implementations provided by the server vendor. There is little hope to customize the database access for additional capabilities like security checks or multi-tenancy.

Before you can setup a connection, you have to install the correct JDBC driver into both the BI-Server and the Pentaho Administration Console. For that copy the jar files into the following directories:

  • Pentaho BI Server: pentaho/tomcat/webapps/pentaho/WEB-INF/lib
  • Pentaho Administration Console: administration-console/jdbc

To define the connection, start the Pentaho Administration Console. The console starts up on port 8099, and can be reached via the URL “http://localhost:8099/”. The user and password are “admin” and “password” (sadly not documented anywhere in the download). The console can only be run successfully if the BI-Server is up and running as well.

To define a datasource, activate the “Administration” mode on the left hand side of the window and switch to the “DataSources” tab. Hit the small white “+” to add a connection.

The Pentaho Administration Console is smart enough to detect all installed drivers. If your driver is not listed here, you probably have forgotten to add it to the “lib” directories outlined above.

The data-source names you define here must match the datasource names defined for the Pentaho Report Designer. I also heavily recommend that you use the same type of database system for designing reports and for running them in production. SQL is a widely interpretable standard and therefore SQL written for one database is not necessarily compatible with the SQL dialect understood by the others.

Let me close with: Death to all hard-coded settings, long lives the host provided setting!