Category Archives: Tech-Tips

Unix: Tell your system how to treat PRPT files as ZIP files

As a BI-Solution developer, every now and then there comes a time to bring out the serious tools. When dealing with report files, you may find that you need to peek inside a PRPT's ZIP structure to track down an error, or simply to take a quick look at its metadata.

But unzipping manually is tedious. Luckily, on every Linux system (and on Cygwin), the Midnight Commander makes file operations on the console a lot easier. MC treats ZIP files as a virtual file system, letting you browse them like subdirectories and add, edit or remove files from the archive with ease.

The only problem is: It only works if the Midnight Commander recognizes the file as a ZIP file. And sadly, for PRPTs it does not.

Internally the Midnight Commander relies on the ‘file’ command, which in turn uses the ‘magic’ database to check files against known signatures. So let’s teach MC how to play nice with PRPTs.

Create a $HOME/.magic file with your favourite text editor, and add the following content:

# ZIP archives (Greg Roelofs, c/o zip-bugs@wkuvx1.wku.edu)
0    string        PK\003\004
>30    ubelong        !0x6d696d65
>>4    byte        0x00        Zip archive data
!:mime    application/zip
>>4    byte        0x09        Zip archive data, at least v0.9 to extract
!:mime    application/zip
>>4    byte        0x0a        Zip archive data, at least v1.0 to extract
!:mime    application/zip
>>4    byte        0x0b        Zip archive data, at least v1.1 to extract
!:mime    application/zip
>>0x161    string        WINZIP          Zip archive data, WinZIP self-extracting
!:mime    application/zip
>>4    byte        0x14        Zip archive data, at least v2.0 to extract
!:mime    application/zip

>30    string        mimetype
>>50    string    vnd.pentaho.        Zip archive data, Pentaho Reporting

Then all you need to do is compile the file, and you are ready to go:

file -C -m $HOME/.magic

This creates a $HOME/.magic.mgc file, which contains the compiled magic code.
Test it via

file a_test_report.prpt

and if it says “Zip archive data, Pentaho Reporting”, your system has learned how to treat PRPTs as ZIPs.

Anonymized reports – Report bugs without exposing your business

When reporting a tricky bug, it is mandatory for us to get a sample report that shows the bad behaviour. The worst bugs are the ones that consistently show up on the customer’s system but refuse to be seen on my own computer. The best bug-reports contain a small sample report along with the data necessary to show the bug’s effects.

However, not everyone is willing or allowed to share sensitive data. If the bug occurs in your HR reports, it’s probably not the best idea to attach the report and data to a public JIRA case. And if you live in the EU, disclosing personal data of non-consenting persons is a rather serious matter.

With Pentaho Reporting 4.0, creating good bug-reports finally becomes easier.

Select “Extras->Anonymize Report” and your report’s text goes through a randomization process. All characters in the labels are replaced by randomly chosen characters, while preserving word length, punctuation and capitalization.
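
The scrambling rule is easy to picture in code. The following is a minimal, purely illustrative sketch of the behaviour described above, not the engine's actual implementation:

import java.util.Random;

// Illustrative sketch only, not Pentaho Reporting's actual anonymizer.
// Every letter is replaced by a random letter of the same case and every digit
// by a random digit; word length, punctuation and capitalization are preserved.
public class AnonymizeSketch {
  private static final Random RANDOM = new Random();

  public static String anonymize(final String text) {
    final StringBuilder result = new StringBuilder(text.length());
    for (final char c : text.toCharArray()) {
      if (Character.isUpperCase(c)) {
        result.append((char) ('A' + RANDOM.nextInt(26)));
      } else if (Character.isLowerCase(c)) {
        result.append((char) ('a' + RANDOM.nextInt(26)));
      } else if (Character.isDigit(c)) {
        result.append((char) ('0' + RANDOM.nextInt(10)));
      } else {
        result.append(c); // punctuation and whitespace stay untouched
      }
    }
    return result.toString();
  }
}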

Then select the report’s query and choose “Extras->Anonymize Query”, and your data undergoes the same process. Numbers stay numbers, but are replaced by randomly chosen numbers of the same magnitude. Text and dates are scrambled too. Once this is finished, remove the old query from your data-source so that the report uses the new, anonymized query.

Note that the query-anonymization works on the preview data. If your query has parameters the preview usually does not return data without proper parameter values. In that case you need to rewrite your query to remove the parameters before you can anonymize it.

With this selective process we preserve most of the characteristics of the report that are important for the layout calculation, but remove most of the sensitive data that was contained in the report.

PRD-4090: Data-Factories calling other Data-Factories

DataFactories now have a changed “initialize(..)” method, which takes a single “DataFactoryContext” object.

The context contains the known init-properties (configuration, resource manager, resource bundle factory) – nothing new here. In addition to that, there is now also a property called ‘contextDataFactory’, which is the data-factory of your current report.

Runtime use:

Calling initialize is now mandatory for all datasources. If you get strange exceptions from the class “AbstractDataFactory”, then you forgot to call initialize(..).

The data-factory you get from the context is already initialized, so you can happily start using it. Please do not make any assumptions about the contents of the data-factory. Calling queries is fine, but trying to dismantle it to get to the underlying implementation is not.
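
In code, the runtime side looks roughly like the sketch below. Treat it as a sketch only: the accessor name getContextDataFactory() is my assumption based on the description above, and the package names should be checked against your engine version.

import javax.swing.table.TableModel;
import org.pentaho.reporting.engine.classic.core.DataFactory;
import org.pentaho.reporting.engine.classic.core.DataFactoryContext;
import org.pentaho.reporting.engine.classic.core.DataRow;
import org.pentaho.reporting.engine.classic.core.ReportDataFactoryException;

// Sketch: a data-source that delegates work to the report's own data-factory.
public class DelegatingDataFactorySketch {
  private DataFactory contextDataFactory;

  public void initialize(final DataFactoryContext context) throws ReportDataFactoryException {
    // the context carries the configuration, resource-manager and resource-bundle-factory,
    // plus the already initialized data-factory of the current report
    this.contextDataFactory = context.getContextDataFactory(); // assumed accessor name
  }

  public TableModel runSharedQuery(final String query, final DataRow parameters)
      throws ReportDataFactoryException {
    // calling queries on the context data-factory is fine ...
    return contextDataFactory.queryData(query, parameters);
    // ... dismantling it to reach the underlying implementation is not
  }
}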

Design-time:

I streamlined the design-time code a bit. When you need to preview a data-factory, you can now get a properly initialized data-factory via:

org.pentaho.reporting.engine.classic.core.designtime.datafactory.DataFactoryEditorSupport#configureDataFactoryForPreview

The DataSourcePlugin interface now has a new parameter on its performEdit method. The “DataFactoryChangeRecorder” accepts “DataFactoryChange” objects from your editor.

As a data-factory editor, you should not modify the report-definition directly. Return your changed data-factory via the method’s return value. If you edit existing data-factories, record both the original, unmodified data-factory and the changed data-factory in the ChangeRecorder.

The report-designer or report-design-wizard will take that change information and incorporate it into the report. Letting the caller have control over that allows us to record Undo information in PRD.
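
From the editor's point of view, the contract looks roughly like the sketch below. The recordChange method name, the DataFactoryChange constructor and the exact performEdit signature are assumptions on my side; only the pattern matters: edit a copy, record the old/new pair, return the edited factory.

import org.pentaho.reporting.engine.classic.core.DataFactory;
// (imports for DataFactoryChangeRecorder and DataFactoryChange omitted, as the
// exact designtime package is not spelled out here)

// Sketch of the editor contract described above, not the literal DataSourcePlugin API.
public class MyDataFactoryEditorSketch {
  public DataFactory performEdit(final DataFactory original,
                                 final DataFactoryChangeRecorder changeRecorder) {
    // never modify the report-definition directly from inside the editor
    final DataFactory edited = showEditorDialog(original.derive()); // edit a copy
    // record the original, unmodified factory together with the edited one,
    // so the caller can later turn this pair into Undo information
    changeRecorder.recordChange(new DataFactoryChange(original, edited)); // assumed API
    // hand the changed data-factory back via the return value
    return edited;
  }

  private DataFactory showEditorDialog(final DataFactory copy) {
    // open a dialog, let the user edit the copy, return the result (details omitted)
    return copy;
  }
}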

(Note: Undo recording is not implemented yet, as I ran out of time here. For now, we just add/replace/delete data-factories without creating undo-objects. We will have that fixed before we go GA; simply work as if it were there for now.)

 

Linking to a report with the same export type

When you create reports that link to each other, you want to stay in the same output mode as the source report. When viewing a PDF report, for instance, you want the linked report to be PDF too.

So how would you do that in Pentaho Reporting?

(1) You need to know your current export type.

When you export a report, each output type has a unique identifier similar to this one: “pageable/pdf”. The identifier consists of

(a) the basic export type:
  * pageable for paginated reports or
  * table for reports exported as layout-tables
and
(b) the content type:
  * pdf for PDF export
  * html for HTML
  * .. and so on.

The BI-Server uses the same identifiers in the reporting plugin to select the correct output target for your reports. The parameter for this is called “output-target” and is documented in the Pentaho Wiki.

You probably know about the “ISEXPORTTYPE” function. This formula function allows you to test for a specific output target when the report runs. To get the export type you could now write an ugly, long formula with many nested IF functions.

Or you can use this small BSH-Function instead:

Object getValue()
{
  return runtime.getExportDescriptor();
}

to get the export descriptor string directly.
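
If you prefer compiled code over an inline BSH function, the same one-liner also fits into a tiny custom expression. A sketch, assuming the classic engine's AbstractExpression base class; worthwhile mainly if you want to reuse it across many reports:

import org.pentaho.reporting.engine.classic.core.function.AbstractExpression;

// Sketch: a reusable expression returning the current export descriptor,
// e.g. "pageable/pdf" or "table/html". Same call the BSH function makes.
public class ExportTypeExpression extends AbstractExpression {
  public Object getValue() {
    return getRuntime().getExportDescriptor();
  }
}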

(2) You need to feed this export identifier into your links.

Use the Drill-Linking functionality to add a manual parameter to your link. Name this parameter “output-target” and link this to your BSH-Function. If your function is named “ExportType”, then you would write “=[ExportType]” into the value field.

 
With that connection made, your reports will now be linked with the same output-target as the source report. Set the “Hide Parameter UI” option if you want to link against the target file without having to manually submit parameters.

Killing XActions for Reporting – slowly and with pleasure

The new Pentaho-Metadata data-source scripting-extension I produced recently seems to be well-received. We now have a great opportunity to fully cut back on XActions for plain reporting uses.

Many users still have to stick with their old XAction driven reports. I assume that they do not do that because they enjoy the pain, or love programming in XML. No, the majority needs to drive parameter queries where the query itself is computed. Reasons for that are many:

  • If everything is placed in one query, the query gets insanely complex and unmaintainable.
  • The query accesses legacy systems with no sane data models or weird partitioned tables.
  • The data-source needs to be configured based on some other parameter before it is used.

Users who have been locked in by these cases can now free themselves from the slavery of weird XML-programming. I happily announce that

Pentaho Reporting now ships with sane scripting support for JDBC, Pentaho Metadata, all Mondrian and all OLAP4J data-sources.

The data-source scripting system of the old days is now a dark memory of the past. The new scripting is integrated into the data-source itself, so scripts become more reusable. Editing large scripts on the report’s “query::name” attribute was never really fun anyway.

The scripting extension allows you to configure the data-source before it is used for the first time. You can now configure all aspects of the data-source via your script, including but not limited to any supported connection-parameter like the JNDI name or additional JDBC-connection properties:

var jndiConnectionProvider = dataFactory.getConnectionProvider();
jndiConnectionProvider.setConnectionPath("java://myjndiconnection");

You cannot get away with not configuring the data-sources at all, as the report designer and all other tools love to have something static to work with. But you can always reconfigure them to your liking before the report is run. The scripts are always called before a query is executed on the data-source, even inside the Pentaho Report Designer or the data-source editors.

I don’t believe that we will need similar scripting support for the Kettle, Java-Method-Invocation, Table or the Scriptable data-sources. These data-sources either do their own scripting anyway or would not profit from any scripting abilities.

Doing the performance dance (again)

I just changed another bit of the table-export while integrating a patch for PRD-3631. Although the patch itself did take a few illegal shortcuts, it showed me an easier way of calculating the cell-backgrounds for the HTML, Excel and RTF exports of Pentaho Reporting.

After a bit more digging, I also fixed some redundant calls in the HTML and Excel exports for merged cells and row-styles. Both resulted in repeated calls to the cell-background calculator and were responsible for slowing down the reporting engine more than necessary.

The performance of my test reports improved a bit with those changes. But if anything, this case has shown me that clean report design is the major driver of a fast export.

The performance for the reports went up by 15 to 30 percent, with the larger gains on the reports with larger row-counts. However, the reports I test are surely not representative, as all their elements are perfectly aligned and the reports are designed to avoid merged cells.

The patch specifically claims to address performance problems in the cell-style calculation. Agreed, there were problems, and the patch addressed them. But there was no way I could see a 100% improvement on normal reports. At least not on reports that are well-designed and use the powerful little helpers that the Pentaho Report Designer offers to keep reports well-aligned.

When I receive production reports for debugging, the picture is usually bleaker. Fields are placed rather randomly, and are usually misaligned by a few points. They start and end at rather random positions, and elements are usually not aligned to element boundaries across different sections.

Let’s take this visually fairly clean report as an example:

The many fine grey lines you can see mark element boundaries. The more lines you see, the more cells your report will have. Each cell not only means larger HTML files, it also means more processing time spent on computing cell-styles and cell-contents. Thick grey lines spanning across the whole section usually indicate elements that are off by less than one pixel.

These lines are produced by the Report Designer’s “View->Element Alignment Hints” feature. When this menu item is selected, you will get a better idea on how your report will look when exported into a table-export. If you cannot see the details clearly, zoom in. The Report designer happily magnifies the working area for you.

When exported to HTML, this report created a whopping 35 columns and another 35 rows. That is potentially 1225 cells. The resulting HTML file has a size of 21,438 bytes. For a report with just a few items of text, this is a lot.

In general you’ll want to avoid having too many of these boundaries. In basic design courses, teachers tell you fairly early on that layouts where element edges are aligned look cleaner and more pleasing to the eye. When you look at adverts or magazines, you can see this in how articles and images sit along visual boundaries or dividing lines. For a well-designed report this is no different.

To help you design well-aligned reports, the Report Designer comes with the “View->Snap to Elements” feature.

To clean up a report, I usually start by aligning elements that are sitting close together. Visually, it makes no difference whether an element starts at x=24 or x=24.548. For the reporting engine, this makes a difference, as a dumb little engine cannot decide whether the user was just lazy or had a very good reason to have a cell at exactly these positions (or whether some visual design would break by attempting to blindly fix it).

With “Snap to Elements” enabled, just select one element and drag the mis-aligned edge until it snaps off its current position. Then move it back into position; this time it will snap to one of the other elements. If the edges are very close together, I drag the current edge towards the top (for the y-axis) or the left (x-axis) until it leaves the crowded area. When I return with it, it will snap to the first (top-most or left-most) edge in the group of elements by default. Repeat that with all elements that sit near the edge, and very soon you should only see one thin line, indicating a perfect alignment.

You can also change the element’s position and size to an integer number in the “style” table on the right-hand side of the Report Designer. When you do that for all elements, your alignment task becomes a lot easier: elements are then either aligned or at least one point apart (and the mis-alignment is easier to spot).

The quickly cleaned-up version of my sample report now has only 24 columns and 16 rows, but visually you cannot tell the difference between the two. Theoretically, the resulting table can have 384 cells, a reduction to less than a third of the original 1225 cells. And finally, the generated HTML file shrank to a mere 8,853 bytes, roughly 40 percent of the original size. In my experience and with those numbers in mind, the computing time for this optimized report should be roughly 10% to 15% better than for the original version. In addition to that boost, your report will download faster and rendering it in the browser will be a lot quicker as well.

So remember, performance optimization starts in your report designer: When you optimize your report designs it instantly pays off with quicker rendering and smaller downloads.

Further optimization

That report uses line-elements to draw borders around the statistic sections. By using sub-bands with border definitions, that report could be simplified further, but I’ll leave that as an exercise for the class.

Pentaho Reporting’s Metadata DataSources now with more scripting power

Pentaho Reporting is rather flexible. No, astonishingly flexible. Calculations for styles, attributes and even queries make sure that there is always a way to tweak a report exactly the way you need it. But until now, there were a few things that were not exactly easy. Custom data-sources and their way of calculating queries were one of them.

With today’s code drop, this story has fundamentally changed. The Pentaho Metadata data-source is the first data-source that combines static queries with optional scripting features. There is no need to have a “Custom-Metadata” datasource anymore. No need to cram your query calculation logic into a tiny field on the report’s query-attribute.

Here it is: The new and greatly improved Pentaho Metadata Data-Source:

The scripting extensions hide beneath the two new tabs at the top and on the query-text area. Those who don’t need scripting, or who are scared of having to program, can safely continue with their life – just ignore the scripting-tabs and you will be safe in your static world. But if you feel you are born to greatness and therefore cannot be limited to predefined queries, then follow me on an exciting journey into the land of power-scripting.

The scripting extensions are split into a global script and a per-query script. The global script can be used to define shared functions or global variables that can be seen by all query-scripts. Its init() function can also be used to change the configuration of the data-source itself. The per-query scripts allow you to (a) customize the query string, (b) calculate the “additional fields” information for the query-caching and (c) post-process the returned table-model.

The original “Custom-*-Datasources” always had the problem that you could only add one of these datasources per report or subreport. Now that we combine scripts with named queries, we can have an independent script for each of the queries and, of course, as many datasources as necessary.

The scripting backend uses the JSR-223 (javax.script) scripting system. By default we ship with both JavaScript and Groovy support. There are quite a few JSR-223 enabled languages out there, so if you have a favourite language, just drop it into the classpath on both the Pentaho Platform server and the Pentaho Report Designer and you are ready to code.

The picture above shows you the global script pane. For the two languages (JavaScript and Groovy) we ship with, we have a prepared template script that contains some documentation as well as the empty function declarations of the functions we expect to be able to call.

It is absolutely OK to delete any function you don’t need – in that case Pentaho Reporting simply ignores that part. This will help you keep your scripts small and simple. You can load scripts from external sources as well, but be aware that it is your responsibility to make these external scripts available to the report at runtime. So if you load a script from a local directory, you have to ensure that the same works on the BI-Server (or your production environment) too.

For the upcoming Pentaho Reporting 3.9.0 release, the Pentaho Metadata datasource is the only datasource with this sort of scripting. Once that design has proven itself and I have more feedback on how well it works, we will add this scripting system to all major data-sources in Pentaho Reporting 4.0.

PRD-3553 – or the low performance of large reports

The funny thing about most bugs is that for the most part they go completely unnoticed. No one ever hits them, or if they do, they think it’s just normal weird behaviour. (Thanks, Crystal Reports & Co, for training your former (and our current) users!)

One of these bugs I had in my bucket for the last week was a rather nasty error condition on reports with a large number of rows. Reports like that are usually used either for data exports or for performance tests, as a human user barely reads the first sentences of an e-mail, not to speak of hundreds of pages of dull numbers.

What was the bug?

Reports with large numbers of rows were incredibly slow when run as Excel, HTML, RTF, Table-CSV or Table-XML exports. The slowdown got worse the more rows were in the report.

The original e-mail contained a test report that demonstrated the issue. (Note: Providing me with a replication path tremendously increases your chances of getting a bug-fix fast. You help me, and I’ll help you.) The tests they ran on their machine showed a clearly exponential curve:

# of Rows    Time in seconds
5000         16
10000        28
20000        90
50000        360

What caused the bug?


Short version: The iterative processing in the reporting engine was broken and resulted in an increasingly large layout tree. Iterating this tree to calculate the layout gets more and more expensive the larger the tree gets.

Long version:
The Pentaho reporting engine uses an iterative layouting engine to process the report. During the layouting stage we build up a layouting-DOM tree containing the various boxes and content that makes up a report. Our model is heavily inspired by the CSS specification and is optimized towards using as little memory as possible. Boxes get added to the layouter by the reporting (data processing) engine and once the layout reaches a stable intermediate state, the content is printed and the processed boxes get removed from the layout-DOM. This results in a sliding zone of activity on the layout model and allows us to process huge reports with minimal memory footprint. And we don’t even have to swap to disk for this – all processing happens in memory.

To make this model work, we have to track which layout nodes have been processed already and which nodes could still move around in the final document.

The whole process of tagging finished nodes worked nicely for paginated reports, but failed for flow- and stream-layouted reports. The system never tagged nodes as ready for print, and so the layout tree grew bigger and bigger. In the end, this resulted in longer processing times (as the iteration of the tree took longer) and a huge memory footprint (as more nodes had to be held in memory).

To fix this problem, I had to fix the “ready-for-process” tagging [SVN].

Once the iterative process sprang to life, the memory footprint went down, but the processing performance was not as good as I would have expected. In fact, the iterative processing ran so often that it caused more overhead than it saved. A quick and dirty throttling of the layouter’s processing queue made performance jump up: we now only process every 50th iterative event, trading a bit more memory for a huge increase in processing speed.
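
The throttling itself is nothing magical. Here is a generic sketch of the idea (not the engine's actual code): count the iterative events and only run the expensive pass on every 50th one, or on the final pass.

// Illustrative sketch only, not the reporting engine's implementation.
public class ThrottledLayoutProcessor {
  private static final int PROCESS_EVERY_NTH_EVENT = 50;
  private int eventCounter;

  public void iterativeLayoutEvent(final boolean finalPass) {
    eventCounter += 1;
    // intermediate events are cheap to count; only every 50th event
    // (or the final one) triggers the expensive pass over the layout tree
    if (finalPass || (eventCounter % PROCESS_EVERY_NTH_EVENT) == 0) {
      processPendingBoxes();
    }
  }

  private void processPendingBoxes() {
    // iterate the layout-DOM, print finished content and prune processed boxes
  }
}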

How do we perform now?

The bug fix was an absolute success beyond my wildest dreams. I can say I have witnessed an increase of 800% in report processing speed. (OK, it is not difficult for this class of bugs: all you need is patience and a large enough report.)

# of Rows    Before the fix (s)    After the fix (s)    Change in %
5000         11                    11                   0%
10000        24                    22                   9%
20000        414                   47                   880%
50000        (crash)               146                  (a rather large number, trust me)

When can I get this fix?

The fix will be included in the 3.8.2-GA release and should be built within the next few weeks. At the moment, everyone at Pentaho is still busy finalizing the BI-Server 4.0/PRD-3.8.1 release, so it will take a moment before we can fire up another release.

In the meantime, you can either grab the sources from the Subversion repository or grab the CI-build. The 3.8.2 engine is a direct replacement of the 3.8.1 build, so you can patch your system by just copying the final jar-files over. I have not tested whether you can do some similar patching with 3.8 or 3.7 installations.

How to create a fancy tooltip in HTML reports with PRD

When you create reports that are mainly used on the web, you probably want to enrich your reports with some basic interactivity. Charts need links and tooltips on their data, drill downs need to be defined and hopefully the information overload of ordinary reports gets reduced via fancy images, hidden sections that only show up on demand and other techniques.

The most basic way of creating an annotation on a report is to provide tooltips. Sadly, the HTML creators were weird scientists who were used to long and boring lists of footnotes instead of in-lined annotations.

Today I am showing you how to add fancy, JavaScript-based tooltips to a report. You can adapt the same technique to create other interactive elements, including Google Maps integrated reports or other Web-2.0 mash-ups.

Rich-Text tooltips: The idea

The tool-tip system I am integrating is based on the blog posting written by Michael Leigeber, a web designer and .NET developer who runs the Leigeber Web Development Blog.

The tool-tip’s behaviour is defined in a central script that gets included in the report’s generated HTML output. To include the script, we simply copy the whole raw HTML/JavaScript into the report attribute “html::append-body”. When executing the report, the contents of this attribute are copied into the HTML file before the first table is rendered.

If you want to know how the tooltip script works, please refer to Michael Leigeber’s blog posting.

On each report element, we can now define the tooltip as HTML text via the “onmouseover” attribute.

tooltip.show('Testing  123 ', 200);

The tooltip gets hidden when the mouse leaves the element via the “onmouseout” attribute.

tooltip.hide()

The first parameter in the “show” method call is used as ‘innerHTML’ on the generated tooltip element. So instead of just plain text, you can include any HTML content you like, including images, tables or animations.

Dynamic Tooltips: Show me my data in the tooltip

We all agree that static tooltips are rather boring. A well-placed tooltip can make all the difference between an information-overloaded report and a report that shows exactly the right information at exactly the right time. Show the main sales numbers, and move the detail content into the tooltip.

In Pentaho Reporting, most attributes and styles can either contain a static value or can be computed at runtime via an expression or formula. When you see a green plus on the last column of either the style or attribute table, then you will be able to add a calculation for that property.

To make the tooltip show your own data, you need to make the first parameter of the tooltip.show(..) function call dynamic. The “onmouseover” property expects a string that is valid JavaScript. The reporting engine does not interpret this string at all; it just passes it into the resulting HTML document and lets the browser decide what to do with it.

So all we need to do is compute a JavaScript text that contains our data:

="tooltip.show('" & [myfield] & "');"

But be careful: if myfield contains any character with a special meaning in JavaScript, like quotes, your script is not going to work. To make it work we need some proper quoting to turn the text into valid JavaScript code.

The Pentaho Reporting Engine offers the “QUOTETEXT” function for this purpose. This function takes two parameters. The first parameter is the text you want to quote, and the second parameter is a constant that tells the function how to quote the text: “javascript”, “xml”, “html”, “formula-string” or “formula-reference”.

For now we need two levels of quoting, “html” and “javascript”, so we alter the formula above to read:

="tooltip.show('" & QUOTETEXT(QUOTETEXT([myfield];"html");"javascript") & "');"

The inner QUOTETEXT ensures that the text given in [myfield] is proper HTML and that all special HTML characters are encoded properly. So '>' gets converted into &gt;, '<' into &lt; and so on. The outer QUOTETEXT function then ensures that the resulting text is also proper JavaScript code: it encodes all single and double quote characters, as well as newlines, tabs and so on, into their properly escaped JavaScript counterparts.

This is already a working tooltip, even though it is a bit primitive. Earlier on I said that you can use HTML text as the tooltip, so let's do a bit of HTML magic here. Again, any text embedded in HTML needs to be encoded properly as well:

="tooltip.show('" & QUOTETEXT(QUOTETEXT([PRODUCTNAME];"html") & "<hr/>" & QUOTETEXT([PRODUCTDESCRIPTION];"html");"javascript") & "');"

This produces a tooltip that prints the product name from the steel-wheels sample data and the product description, divided by a horizontal line.

You can grab a sample report for Pentaho Reporting 3.8.0-GA:
fancy-html-tooltip.prpt

Don’t hardcode host names, use JNDI

I got an e-mail earlier this week, asking for some help in migrating reports.

Hey Thomas,

I have a poop load of reports with hard coded 
ip addresses (yeah I know – but they didn't 
have DNS entries until now) that I'd like to 
globally change.  I'm thinking of something 
along the lines of "zcat | sed | gzip" but 
something like this would only work on 
compressed files, not an archive of files.  

So – I was wondering if you had any ideas, 
maybe even a utility based on code we already 
have to create and open these bad boys.  
We could even have some known locations to 
modify 

Like>  prpt-replace –dbConnectionHost "192.168.1.100" "mydb.mycorporation.com"

Or maybe use file and xpath to denote the 
attribute to change.

(The names involved were changed to protect the guilty.)

Well, hard-coding database connection information into every report is never a good idea in any environment. If your database host or credentials change, you are well and truly .. in trouble.

So how can you avoid this trouble in the first place?

Use JNDI to hold database connections in an external place

JNDI is a technique to hold database connections outside of the report definition. By holding this information in a central spot, you only have to change one place and all your reports get updated. The JNDI connection information is stored on the computer where your report runs. This also allows you to have separate definitions based on where the report runs: your report designer can access a local copy of your production database, while your production server uses the real database server.

As a side note: On the BI-Server, this system of abstracting database information is called “Pentaho Datasources”, not JNDI. In this posting, I will forgo the branding exercise and simply continue to call it JNDI.

You can use JNDI in the following datasources:

  • SQL
  • Pentaho Analytics (Mondrian)
  • OLAP4J
  • Pentaho Metadata

For all of your production reports, you should always choose JNDI-defined datasources. Manually defined datasources are great for testing out ideas, but really a nightmare to maintain, and because they store the database credentials they potentially pose a security risk if you choose to share report definitions with others.

JDBC Connection definition basics

A database connection in Java requires four mandatory pieces of information:

  • driver: The name of the JDBC driver class.
  • url: The JDBC URL. This URL contains all information needed to connect to the database like the host name and port. The format of this URL is specific to a particular database. Consult your database manual for details on your particular database.
  • user: The username for the database.
  • password: The password for accessing the database.

In addition to this, each driver allows you to specify additional properties. Again, these properties are specific to the database and (hopefully) documented in the manual. Most databases alternatively allow you to encode the properties onto the URL.
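
As a plain-Java illustration of where those four pieces go (plus extra driver properties), here is a small sketch that tests a connection outside of any tool. It uses the HSQLDB SampleData values that also appear in the simple-jndi example further down; substitute your own driver, URL and credentials.

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Sketch: connecting with plain JDBC, showing where each mandatory piece goes.
public class JdbcConnectionSketch {
  public static void main(final String[] args) throws Exception {
    Class.forName("org.hsqldb.jdbcDriver");           // driver class name

    final Properties properties = new Properties();
    properties.setProperty("user", "pentaho_user");   // user
    properties.setProperty("password", "password");   // password
    // any additional, driver-specific properties go in here as well

    try (Connection connection = DriverManager.getConnection(
        "jdbc:hsqldb:mem:SampleData", properties)) {  // JDBC URL
      System.out.println("Connected to: "
          + connection.getMetaData().getDatabaseProductName());
    }
  }
}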

Defining data-sources in the Pentaho Report Designer

Within the report designer, there is no user interface to define the JNDI connections. The connections are defined in a file called "$HOME/.pentaho/simple-jndi/default.properties".

The file looks somewhat like this:

# Copyright 2008 Pentaho Corporation.  All rights reserved. 
# This software was developed by Pentaho Corporation and is provided under the terms 
# of the Mozilla Public License, Version 1.1, or any later version. You may not use 
# this file except in compliance with the license. If you need a copy of the license, 
# please go to http://www.mozilla.org/MPL/MPL-1.1.txt. The Original Code is the Pentaho 
# BI Platform.  The Initial Developer is Pentaho Corporation.
#
# Software distributed under the Mozilla Public License is distributed on an "AS IS" 
# basis, WITHOUT WARRANTY OF ANY KIND, either express or  implied. Please refer to 
# the license for the specific language governing your rights and limitations.
SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/user=pentaho_user
SampleData/password=password
SampleData/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata

SampleDataAdmin/type=javax.sql.DataSource
SampleDataAdmin/driver=org.hsqldb.jdbcDriver
SampleDataAdmin/user=pentaho_admin
SampleDataAdmin/password=password
SampleDataAdmin/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleDataAdmin/url=jdbc:hsqldb:hsql://localhost/sampledata

The JNDI definition file holds all connections known to the local JNDI system. This system has no connection to the BI-Server and does not access any connections defined in the Pentaho Administration Console.

A JNDI database definition in the report designer is defined by a set of properties. Each property name has two parts: the name of the data-source (‘SampleData’ and ‘SampleDataAdmin’ in the example above) and the JNDI property name itself. Each datasource declared in the “simple-jndi” definition file requires the following properties:

  • Mandatory “type” entry set to the value “javax.sql.DataSource”
  • Driver class name
  • Connection URL
  • A username and password
  • .. and any additional property you want to pass in

The simplest way of creating a new definition entry is to copy an existing connection entry, alter the name of the entry (the first part of the properties) and then replace the connection information with your own details.

All JNDI connection names are case sensitive, so inside the report you have to use the same spelling for the name as in this properties-file.
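
Under the hood, a JNDI name is just a lookup key that resolves to a DataSource at runtime. Conceptually, resolving one of the entries above looks like the sketch below; this uses the standard javax.naming and javax.sql APIs and is not PRD's actual lookup code.

import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

// Conceptual sketch: the report stores only the name, the environment supplies the details.
public class JndiLookupSketch {
  public static Connection connect() throws Exception {
    final InitialContext context = new InitialContext();
    // the lookup name must match the definition exactly; JNDI names are case sensitive
    final DataSource dataSource = (DataSource) context.lookup("SampleData");
    return dataSource.getConnection();
  }
}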

And last but not least: Defining your connection information in the JNDI system does not relieve you from providing suitable database drivers. You can install new drivers by copying the jar files (and if needed any native libraries) into the “lib/jndi” directory of the Pentaho Report Designer.

Defining a JNDI connection in the Pentaho Administration Console

For the Pentaho BI-Server, the Pentaho architects chose not to use the JNDI subsystem that ships with all J2EE application servers. Sadly, JNDI as implemented by those application servers has a few flaws:

  • Redefining a JNDI connection requires you to redeploy or restart the application. This is a nightmare for a mission critical server.
  • Each server system comes with its own way to define JNDI definitions. Each of these ways is inherently incompatible with any other vendor.
  • You are bound to the implementations provided by the server vendor. There is little hope to customize the database access for additional capabilities like security checks or multi-tenancy.

Before you can set up a connection, you have to install the correct JDBC driver into both the BI-Server and the Pentaho Administration Console. For that, copy the jar files into the following directories:

  • Pentaho BI Server: pentaho/tomcat/webapps/pentaho/WEB-INF/lib
  • Pentaho Administration Console: administration-console/jdbc

To define the connection, start the Pentaho Administration Console. The console starts up on port 8099, and can be reached via the URL “http://localhost:8099/”. The user and password are “admin” and “password” (sadly not documented anywhere in the download). The console can only be run successfully if the BI-Server is up and running as well.

To define a datasource, activate the “Administration” mode on the left hand side of the window and switch to the “DataSources” tab. Hit the small white “+” to add a connection.

The Pentaho Administration Console is smart enough to detect all installed drivers. If your driver is not listed here, you probably have forgotten to add it to the “lib” directories outlined above.

The data-source names you define here must match the datasource names defined for the Pentaho Report Designer. I also strongly recommend that you use the same type of database system for designing reports and for running them in production. SQL is a widely interpretable standard, and SQL written for one database is not necessarily compatible with the dialect understood by others.

Let me close with: Death to all hard-coded settings, long live the host-provided setting!