Category Archives: Basic Topic

Pentaho Report Designer – Essential Readings to get started

When you download the Pentaho software for the very first time, the bunch of cryptic applications can be daunting. Reading about all the new and complex concepts of OLAP-cubes, data-warehouses, ETL processes to fill the data-warehouses or metadata layers is daunting at best.

If all you want to do is start with a report to see your own data printed, then at first, you can ignore the cryptic stuff and stick to the bread-and-butter parts: Using the Pentaho Report Designer to create your first report. Lets find out how.

First, you need to make one critical decision: Is what you want a classic report that is primarily meant to be printed (or used as if being printed)? Or do you want to explore your data to discover new knowledge.

Reading: Learn about the two kinds of reports.

If you are seeking an analytic tool for exploring your data, then you have no other option than to setup a OLAP-Cube to use the Pentaho Analyzer. These days, for simple cases you can get quick results with Pentaho’s Agile BI Initiative, so check them out.

From now on, I assume you are more interested in conventional reports with the sturdiness and durability of real paper. For over 500 years, since the invention of the double-entry bookkeeping, business reporting played a crucial rule in the business world. Reports are made to last.

To get started with the Pentaho Report Designer, you do not need any special setup or a data-warehouse as Pentaho Reporting can work with most relational databases after installing the right JDBC driver. You should get these drivers from your database vendor.

The Pentaho Report Designer already comes with drivers for MySQL, Hypersonic and H2 database systems. You can install your own drivers by copying the driver jar into the “lib/jdbc” directory inside the report designer.

Reading: Follow the walk-through to learn how to create your first report

This tutorial uses the sample database that we provide inside the report designer. This step-by-step guide will show you how to setup a database connection, how to get data into your report and how to layout elements to make a visually appealing result. When you use the SQL datasources you need a basic understanding of SQL. If you are not familiar with SQL, you can learn the basics of SQL in this rather good tutorial. Your database manual should also contain a introduction to SQL.

Work through the tutorial and you are ready to make your first steps in the world of Pentaho’s Business Intelligence tools.

There are some articles in this blog that help you further:

MySQL with Pentaho Report Designer for those who dont read manuals

Several people in the forums seem to have severe problems setting up MySQL or understanding how MySQL works. They also seem to have severe problems reading the database manuals pointed out in the forum, insisting the problem lays within the Report Designer. So I am going to write up some simple explanations on how to set up MySQL, how to configure the database users for access over the network and how to work around all the little and large weird quirks of that database. And then hopefully I just can point these users to here and never have to deal with it again.*


I do assume that you have MySQL installed and up and running. If not, then go and get PostgreSQL, a database I personally trust.

Connecting to a MySQL database – the basics

All the Pentaho tools and thus the Pentaho Report Designer are Java Applications and therefore use JDBC to access the databases. For other databases, the general rule is: Use the driver that ships with your database and keep the version numbers in sync.

To work with a MySQL database, you will need a suitable MySQL driver. In my experience MySQL drivers have not been the most stable nor most reliable drivers around. At Pentaho, we recommend to use the MySQL JDBC Driver version 5.1.10 as default driver, as this seems to be the most stable driver. Other versions may produce invalid results, and if you want to use any other version, make sure you test it carefully.

If you use the Pentaho Report Designer to create a connection, you will be asked for several properties. To connect to the database directly, select the connection mode “native”. Then enter your hostname or IP-address, port (default port number for MySQL is 3306) and username and password.

For generic JDBC database connections, you need the JDBC-URL, your username and password and optionally the connection properties:

Setting Value

Driver

The driver class to load. The name of the class that implements java.sql.Driver in MySQL Connector/J is com.mysql.jdbc.Driver.
JDBC-URL

The JDBC URL for the MySQL JDBC Driver uses the following format. Items in square brackets ([, ]) are optional.

jdbc:mysql://[host:port]/[database][?propertyName1][=propertyValue1][&propertyName2][=propertyValue2]...

If the host name is not specified, it defaults to 127.0.0.1. If the port is not specified, it defaults to 3306, the default port number for MySQL servers.


Username

Your user name as given to you by your database administrator. See below how MySQL translates that information and matches this against the internal user database.

Password

Your password for the connection.

To get the same connection as shown in the dialog screenshot above, you would have to specify the JDBC-URL:

jdbc:mysql://localhost:3306/sampledata

with username “pentaho_user” and “password” as password.

In most cases your database will be running on a different host than the computer you are editing reports or where the BI-Server is installed. When the database is running on a different computer, you will be able to distribute the work-load of answering requests and generating reports a lot better. If both the reporting server and the database share resources like the CPU, memory or disks, there is always the danger that both systems compete for resources and slow each other down unnecessarily.

User management and access control in MySQL

When you connect to a MySQL server, your identity is determined by the host from which you connect and the user name you specify. A MySQL server side user name is the combination of the user name you specified on the connection (“pentaho_user” in the example above) and the host from where you try to connect to the database server.

MySQL users are defined by their username and hostname and generally written in the format ‘username’@’hostname’.

In the example above we connect to the special host address ‘localhost’ (or by its IP address: 127.0.0.1). Only applications running on the same computer can make connections to this interface. By default a server will listening to these connections will see them as originating from “localhost” itself.

So in the above example, the full identity would be ‘pentaho_user’@’localhost’.

MySQL uses a two-stage authentication system. In the first stage it checks, whether you are allowed to connect to the database and validates your username and password. In the second stage MySQL checks whether you have the access rights to see the contents of the tables stored in the database or whether you are allowed to update or delete data.

When we try to establish a connection to the MySQL server, the server will check its local user database for an entry matching the username and password. If there is no user defined for the given hostname, you will see the error message

Host [...] is not allowed to connect to this MySQL server

The error message

Access denied for user 'pentaho_user'@'localhost' (using password: YES)

indicates that either the password is invalid or that this user has not been defined for connections from this host.

If you see the error message

ERROR 1044 (42000): Access denied for user 'pentaho_user'@'localhost' to database 'sampledata'

then you need to grant additional permissions to the user to allow access to the database.

Listing the defined users

MySQL stores all administrative information in a special system database called ‘mysql’. You can connect to this database like any other defined database. To see which users are defined, invoke the following SQL statement via the MySQL console:

SELECT User, Host FROM mysql.user;

MySQL will check your identity (username and host) against the user table. Both the user and the host must exist. If the table contains a host name called “%” this particular entry matches all hosts. If the table contains a empty user name, it matches any user name.

To allow the user ‘pentaho_user’ to connect from any localhost, you would need to have a matching entry for it in the table.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | localhost | 
+------------------+-----------+

To allow connections for ‘pentaho_user’ from any host, a entry with ‘%’ as hostname must exist.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | %         | 
+------------------+-----------+

Big fat warning: Even though both entries have the same user name (pentaho_user), they define two entirely different and unrelated identities for the database. It is totally possible and legal for ‘pentaho_user’ from localhost to have access to different databases than the same user for the other hosts. For MySQL these are two different entities.

Connections from Pentaho Report Designer via Connections from PhpMyAdmin or other Server Side Software

The MySQL identity system of treating users from different hosts as completely different entities can cause a lot of headaches, especially if you are not familiar with the underlying workings of the internet.

If your MySQL server is set up to only allow connections from specific hosts, you can run into problems where you can access the database from your web-server but not from within the report designer or vice versa.

Lets look at the most common usage scenarios:

Pentaho Reporting, the Web-Server and MySQL run on the same machine

This is a typical development scenario, where the developer installed all software on his own box for convenience.

+---------------+
| PRD-Host      |
+---------------+
| Web-Server    |
+---------------+
| My-SQL Server |
+---------------+

MySQL must be configured to allow access from ‘localhost’ and all connections in the report must use ‘localhost’ as the hostname for the MySQL connections.

A identity for ‘pentaho_user’@’localhost’ must exist.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | localhost | 
+------------------+-----------+

Pentaho Reporting, the Web-Server and MySQL all use different machines

This is the typical production scenario. Both the database server and the web-server run on different machines for maximum performance. Both machines are usually installed in a server room or data centre.

+----------+          +------------+         +---------------+
| PRD-Host |  --+-->  | Web-Server |  ---->  | My-SQL Server |
+----------+    |     +------------+         +---------------+
                |                                 /|\ 
                +----------------------------------+

Both the developer sitting on the Pentaho Report Designer and the Web-Server need access to the database. While the developer creates the reports he needs access to the database (or a test database with a similar structure than the production database). When the report runs, the web-server must connect to the database as well to execute the reports.

Both the developer PC and the web-server must have entries for the user ‘pentaho_user’ in the user table.

+------------------+-----------------------------+
| User             | Host                        |
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of webserver | 
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of PRD-host  | 
+------------------+-----------------------------+

Alternatively, you could allow access from all hosts. Note that if your database server is not shielded by a firewall, then you may easily open your database to the world. Any self-respecting database administrator may try to strangulate you for suggesting such an configuration.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | %         | 
+------------------+-----------+

The Web-Server and MySQL share the same machine

You find this set up occasionally in smaller hosting companies or from people who own just a single server. The database and web-server share the same machine, which is hosted in the data centre or server room. The developer runs the report designer on his own desktop machine.

+----------+          +---------------+ 
| PRD-Host |  --+-->  | Web-Server    |
+----------+    |     +---------------+
                +-->  | My-SQL server |
                      +---------------+

The web-server can access the MySQL server via ‘localhost’, while the developer needs to access the MySQL server from his own host outside of the server room.

+------------------+-----------------------------+
| User             | Host                        |
+------------------+-----------------------------+
| pentaho_user     | localhost                   | 
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of PRD-host  | 
+------------------+-----------------------------+

So if you ever wondered why you can access your MySQL server from PhpMyAdmin or similar tools, but not from the Pentaho Report Designer, you should now be able to see the answer and to fix the problem.

You can find more information on this topic in the chapter about MySQL user management in the MySQL manual.

Fine grained access permissions

Even if you are able to connect to the database, you may not be able to query the database. The second stage of MySQL’s security system checks your identity against a set of defined permissions. These permissions control whether you are able to query or modify the data in the database or whether you are allowed to change the table structures or perform other administrative tasks.

Pentaho Reporting only queries the database. The Pentaho reporting tool does not modify data and does not perform any administrative tasks. All MySQL users need to have the “SELECT” permission on all objects you want to access from the reports.

To grant a specific user the permission to select data, use the following statement from the MySQL console.

GRANT SELECT ON database.* TO 'pentaho_user'@'localhost';

MySQL connection properties you should know

To make the JDBC driver work properly in MySQL, you will have to specify some additional connection properties. The driver’s default settings can cause severe problems for the reporting system. These defaults have been defined for historic releases of MySQL and are harmful these days.

Property Definition Comment
useCompression Use zlib compression when communicating with the server (true/false)? Defaults to ‘false’. Useful if you access a remote server through a low bandwidth connection. Disable for production systems.
passwordCharacterEncoding What character encoding is used for passwords? Leaving this set to the default value (null), uses the platform character set, which works for ISO8859_1 (i.e. “latin1”) passwords. For passwords in other character encodings, the encoding will have to be specified with this property, as it’s not possible for the driver to auto-detect this. If you are not using an all ASCII password and work outside the area of Western European languages, knowing about this is a life-saver.
characterEncoding If ‘useUnicode’ is set to true (which is the default), what Java character encoding should the driver use when dealing with strings? (defaults is to ‘autodetect’). If the encoding cannot be determined, then an exception will be raised. Sometimes MySQL server do detect the character set correctly and therefore return garbage when querying tables with non-ascii data. In this case this property can be used to override the autodetection of character sets.
functionsNeverReturnBlobs Should the driver always treat data from functions returning BLOBs as Strings – specifically to work around dubious metadata returned by the server for GROUP BY clauses? The MySQL JDBC-Driver returns string results of functions in SQL statements as byte-arrays. If you use any functions in SELECT statements, set this property to “true” or you will see garbage in the results.
emptyStringsConvertToZero Should the driver allow conversions from empty string fields to numeric values of ‘0’? MySQL treats empty strings or values as zero for numeric columns by default. If you need to see the values as they are stored in the database without helpful corrections from the JDBC driver, then set this flag to false.
zeroDateTimeBehavior What should happen when the driver encounters DATETIME values that are composed entirely of zeros (used by MySQL to represent invalid dates)? Valid values are “exception”, “round” and “convertToNull”. Use this if your select statement fails with an exception and you have a zero-date in it. Set it to ‘convertToNull’ to come to a more reasonable result.
serverTimezone Override detection/mapping of timezone. Used when timezone from server doesn’t map to Java timezone Use this if your dates returned by the database seem to be offset by some hours. Also see ‘useGmtMillisForDatetimes’, ‘useJDBCCompliantTimezoneShift’, ‘useLegacyDatetimeCode’ and ‘useTimezone’ for more time zone related functions.
holdResultsOpenOverStatementClose Should the driver leave the result sets open on Statement.close() (enabling violates JDBC specification) This can be a very helpful property for reports with subreports. It allows the Pentaho Reporting engine to process several server-side resultsets in parallel. Without this setting, the reporting engine will fully buffer result-sets in memory instead, which increases the memory consumption of the reporting system.

Further readings

If you have made it this far, then let me tell you one big hint: MySQL comes with a nice administration tool called My SQL Workbench which allows you to administrate the server without having to go through the command line.

* Yeah. At least I can fake hope, even when I don’t feel it.

An easy way of Printing Aggregations

Setting up a large number of aggregated values like sums, counts or averages can be a dreadful experience in the Pentaho Report Designer.

Calculations are performed by functions and expressions (let me call it just functions). These functions are added in the data tab of the report designer. Each function gets a name under which its result can be referenced. And then it needs to be configured properly. Usually, you have to set at least the field it should work on and the group on which it should reset.

Doing it for one field is not exciting, but ok. Doing it for 10 fields is no longer fun.

If you just want to print the result of that calculation, without ever using it as part of another computation, then here’s an easier way of aggregating data:

(1) Drag a number field onto your report and assign a field name to it (attribute common::fieldname).
(2) Set the aggregation you want to use via the attribute “wizard::aggregation-type”.
(3) Optional: Define the group on which you want to reset. If not defined, the current group is used.

This way of defining aggregations is used by the Report Design Wizard. To see a working example – just generate a report with it and add a summary on the fields.

Element layouting strategies in Pentaho Reporting

Back in February on the London Pentaho User-Group meeting I promised to make the contents of that presentation available as blog entries. This is the first of these three articles.

The Pentaho Reporting Engine supports several lay-outing strategies to create sophisticated reports easier and to simply the report creation process.

You can change the layout strategy of a band using the “layout” style-key on that band.

Canvas-Layout

A canvas layout positions elements freely on the area of the parent band. Elements have no relationship to each other during the lay-outing. Therefore if an element expands its size, it does not push elements out of the way. Expanding elements always increase the size of the parent band.

Elements inside a Canvas level band are positioned using the position::x and position::y style-keys.

The Canvas layout is the default layout for all new reports and bands.

If you are familiar with HTML and CSS, then think of this layout as a collection of absolutely positioned elements.

Block-Layout

Elements in a block layout band are laid out one after each other vertically. Block-level elements span the full width of the parent band. If a element expands it pushes all other elements down so that no element overlaps the other elements.

Master- and SubReport elements as well as Groups are always laid as block elements.

If you are familiar with HTML and CSS, think of this layout as a set of

or

elements.

Inline-Layout

In an inline formatting context, boxes are laid out horizontally, one after the other, beginning at the top of a containing block. Horizontal margins, borders, and padding are respected between these boxes. The boxes may be aligned vertically in different ways: their bottoms or tops may be aligned, or the baselines of text within them may be aligned. The rectangular area that contains the boxes that form a line is called a line box.

A inline element that is placed in a non-inline layout band creates an artificial paragraph to wrap around this element during the lay-outing process. The most common use of this layout strategy is to produce rich-text content from several independent report elements.

If you are familiar with HTML and CSS, think of this layout as a set of elements.

Row-Layout

The row layout positions elements one after each other on the horizontal axis. All elements are printed in a single row, expanding their height as needed. If all elements should expand to the height of tallest element, set the min-height to “100%”.

Row-layout is a natural match for list reports, where multiple columns of data should be printed per row of data. When a element expands its width all other elements get pushed to the right.

If you are familiar with HTML and CSS, think of this layout as a HTML table with a single row of data.

When you use a row-layout for your list-reports, you will no longer need to layout elements manually. To create spacing between elements use either padding on your elements or place a empty band as padding element into the row layout band. The report design wizard makes use of the row-layout to position elements in the details band and details-header bands.

Combining layout strategies for better effects

You can combine several layout strategies in one root element by adding extra bands to your report. All elements placed into these bands will be governed by the band’s layout setting.

How to avoid that dynamic-height elements overlap other elements

You can use this to avoid overlapping elements in your report whenever you use “dynamic-height” elements and to create proper table-rows so that elements of the second row get pushed down by the expanding content of the first row.

Use the following steps to create a two-row details band.

  1. make your details band a block-layout by setting the layout-style of this band to “block”.
  2. Add two bands.
  3. Add the elements for the first row into the first band, and all elements for the second row into the second band.

That’s it. When your first row elements expand, your second row elements will be pushed down.

Download a sample report to see the row- and block-layouter in action.

Staying in touch with the Pentaho Reporting project

Whether you are a Pentaho Customer or a Pentaho Community Member, it is always nice to stay up to date with our progress. This especially matters if your favourite new feature or least-favourite bug get crunched right now.

So here’s how to do it:

(1) If you’re just interested in a single issue: Subscribe to the issue. There is no case yet? Create one, JIRA is open.

(2) If you want all issues reported in JIRA subscribe to a RSS feed.
Pentaho Reporting Issues RSS Feed

(3) Alternatively, subscribe to the Pentaho Reporting Developers List, and you get all JIRA updates into your Inbox.

(4) You want to know it all, every single commit? Subscribe to the Pentaho Reporting Commits Mailing List.

Personally, I use both the RSS feed and the mailing lists. The commits-list is most useful to see the source-code level changes coming in for reviewing those changes.

What is a report?

In its most basic form, a report is a document that contains information for the reader. When speaking of computer generated reports, these documents refine data from various sources into a human readable form. Report documents make it easy to distribute specific fact-based information throughout the company.

All reporting tools require well-defined data-sources to work with, so that the tools can formulate clear rules for querying, aggregation and data display. Computer generated Reports of all kinds consist mainly of tables and charts to convey their information to the recipients.

The term “report” or “reporting” is fairly imprecise these days and can mean several things. We differentiate two types of reports: “Analytical reports” and “Operational Reports”.

Analytical Reports

Analytical reports usually used for strategical (long term) decision making. They are marked by a high variation of questions asked, where the exact same queries may be asked relatively rarely. To answer the question summary information is more important than a detailed view on the transactions that caused them. In many cases, a time based comparison or aggregation is involved as well. And last but not least, these tools are used for explorative querying where one questions leads to even more questions.

To be useful in a explorative analysis, these tools need to be highly interactive. The resulting reports are usually not stored for long term use – and therefore Analytical Reporting Tools provide a limited set of tools to design the visual appearance of the resulting report. It is more important to get the answers fast and then to allow the user to move to the next question than to spend time on making the results look beautiful.

A chain of questions answered by Analytical Reporting Tools could look like this:

“What are the five top growing products over the last year?
How was the growth distributed over our sales regions?
Why is the southern region not performing as well as the other regions?”

or

“Show me the weekly shipping costs for our carriers?
Why is that one so high?
What product caused most of the shipping costs on this particular carrier?
How does the same product perform with the other carriers?”

Once the question is answered, it is highly unlikely that the same chain of questions is frequently asked again by other users.

Operational Reports

In contrast, a operational report answers tactical questions regarding the day to day operation of the company. The reports generated are business transaction orientated and support day-to-day activities of the company. The decisions based on these reports are immediate decisions made at the line-manager level. They contain a high level of details required to answer the question or requirements.

Operational reports usually come in form of lists or tables.

Typical questions answered by Operational Reports are:

“Give me the new orders for today”
“Show me the status of our stores and how many articles are left in store.”
“Print me a invoice for all customers served today”
“Print the daily accounts audit logs for our archives”

In Operational reports details are usually more important than the summary information. These reports answer questions that are asked frequently, so the same report can run a large number of times. For this reason, operational reports are “pre-canned” reports usually created by speciailzed report designers (power users or specialists). The design process for a good Operational report can easily take several hours to tweak the design of the resulting document to perfection.

The generated documents frequently get distributed to a larger audience in a process called “bursting”. Bursting is a old term used to describe the process of cutting down the large printouts on endless paper to send off the sheets of paper to their respective recipients.

Operational reports regularly get sent out to customers and business associates. As with all printed material, companies put a high emphasis on the graphical appearance of the resulting document to convery the right public image. They can run off the operational systems and operational reporting engines were traditionally part of ERP systems and other business applications that require printing capabilities.

Pentaho Reporting was originally developed as a operational reporting tool. Its detailed layouting and formatting capabilities stem from this requirement. As part of the reporting process, Pentaho Reporting can access and incorporate data from analytical datasources. The main focus for the Pentaho Report Designer lies in the document generation, and thus Pentaho Reporting is not a suitable replacement for analytical query tools.

The flexibility of Pentaho Reporting allows you to create Guided AdHoc Reports to widen the scope of your pre-designed reports.

ENV-Fields: Why, When and How to use them

The reporting-environment came to life as a simple way to inject runtime information from the platform into a report. Ideally, a report should not know about details from the server it runs on. Certain parts of the server’s configuration, like the server’s base-url, easily change when the report travels from development to the test and then to the production environment.

Earlier versions relied on magical string replacements during the parse process to inject that kind of information. Parse-time modifications are generally a bad thing. They are insecure,
as every String manipulation of XML files is. If you manage to get a valid XML-Fragment into your replacement string, you can do everything. There is no way a GUI tool could be written for it. The injection magically happens everywhere without giving the user a say on that matter. At design time, or outside of the legacy code of the Pentaho Platform’s JFreeReportComponent, this replacement does not happen. I don’t want to explain that mad system to anyone, and surely neither do you!

And finally: It really makes it hard to do caching inside the reporting engine. For the reporting engine, these replaced strings look exactly like static strings and all of the sudden the cache is poisoned with a invalid copy.

Moving that information from parse-time to runtime solves that problem for me.

The report-environment is a special interface that allows third party systems like the Pentaho BI-Server to inject configuration settings in a safe and pre-defined way. That configuration information is then available to all expressions and functions. It completely eliminates all magic from the report processing.

A report-environment entry is not a parameter. A parameter is a external value that has been provided by the user or that has been calculated from end-user or report-designer input. A
report-environment property comes solely from the local runtime environment.

Inside the reporting engine, a report-environment setting is a single string. The setting is either one of the well-known environment-keys provided by the runtime implementation, a session-entry or a user-defined environment setting.

Well-Known Keys

  • serverBaseURL:
    The server’s base URL without any servlet context. Use this one to link to other resources on the same server.

    http://127.0.0.1:8080/
    
  • pentahoBaseURL:
    The server’s base url including the local web-application context. Use this one to link to other Pentaho Services.

    http://127.0.0.1:8080/pentaho
    
  • solutionRoot:
    The solution-repository root directory.
  • hostColorPort:
    The host name and port, without the protocol prefix or any path information.

    localhost:8080
    
  • requestContextPath:
    The context path of the web-application

    /pentaho
    
  • username:
    The username of the currently logged in user.
  • roles:
    The set of roles the user is assigned to.

Session keys

A environment setting starting with the prefix “session:” will be treated as a local session lookup. A session lookup will only work inside the Pentaho BI-Server. The BI-Server will compute the session attribute name by removing the “session:” prefix and will lookup the value in the users HTTP-Session instance.

If the value found there is a string it will be returned as valid entry. If the value is a string array, it will be converted into a CSV text using the excel quoting syntax (double-quote-character
are quote characters, and text containing either the comma, a quote character or a line-break will be quoted.

The value Joe, the horse, says "Ho" will become "Joe, the horse, says ""Ho""".

You can use the CSVARRAY formula function to parse that text back into a String-array inside the reporting engine.

User-Defined Environment setting

If a key does not match either one of the well-known keys or if the engine cannot identify it as a session-entry, it will be treated as user-defined setting. The reporting engine will consult the global configuration and will try to find the value by appending the environment key to the configuration prefix “org.pentaho.reporting.engine.classic.core.environment.“.

Therefore a environment lookup for the key “Foo” will query the global configuration for a configuration entry called “org.pentaho.reporting.engine.classic.core.environment.Foo“.

How to use them

The report-environment can be accessed via the OpenFormula function
ENV“. Example:

=ENV("pentahoBaseURL")

The result of the lookup can be used in the same formula:

=ENV("pentahoBaseURL") & "/content/reporting/report.html?solution=steel-wheels&path=reports&name=BuyerReport.prpt"

You can also make the value accessible as a separate field. Place the ENV lookup into the post-processing formula of a hidden parameter or into the “formula” property of a OpenFormula-Expression and the value appears as field.

But there is a easier way to convert your environment lookups into fields!

Auto-mapping

The auto-mapping definitions allow the Pentaho Reporting Engine to automatically lookup a set of report-environment keys and to publish those values as “env::” fields. Which fields are auto-mapped is controlled by a set of configuration entries in the reporting engine’s global configuration.

A mapping rule follows the following syntax:

org.pentaho.reporting.engine.classic.core.env-mapping.=

If the environment string is a CSV string, you can tell the reporting engine about it and it will produce a field that contains a String-array.

org.pentaho.reporting.engine.classic.core.env-mapping.-array=

It is generally a good idea to keep your environment fields in a separate namespace by using the “env::” prefix for your fields. This way you will not run into conflicts with fields from your database or expressions later on. I also heavily recommend to try to keep the reference to the environment key in the name, this provides a sensible self-documenting entry.

Where to use Report-Environment fields

The predefined keys can be split into two groups. The first group, all path and URL related entries are used to calculate relative resource URLs, like logo images or HTML and JavaScript includes.

The second group ,”username” and “roles” is used to enforce security rules in datasources or on calculations.

The third use-case would be to provide system dependent parametrization, like printing static text. It is a easy way to include the department name or the name of the staging system (test, dev, production) in every report.

And last but not least: The environment is accessible to scripts and formulas, so it can be used as a system wide parametrization avenue.

Setting up session values

You can define the values for the session-keys of the report-environment inside the BI-Server with the help of a on-logon-XAction. Within that XAction, use a JavaScript-Rule to access the HttpSession object and to store the string-value on it.

Setting up global user-defined report-environment values

The global user-defined report-environment values are defined in the reporting engine’s global “classic-engine.properties” file. You will find that file in the Pentaho-Report-Designer’s “resources” directory. The BI-Server’s global reporting configuration file can be found in the WEB-INF/classses directory.

The report-environment is a simple and easy to setup way to make your reports more reusable. Whenever you find yourself changing reports manually  while transitioning them from development to testing or production, you now have the tools at hand to elimenate these manual steps.

Demos and other Material on Pentaho Reporting

There are a couple of recording and documents out there to give you a good start in using our reporting tools. The only drawback about them is: You have to know that they exist. So let me list a few of these resources here:

What is that ‘drill-linking’?

On Pentaho’s roadmap shown earlier this year, the first item for Pentaho Reporting was a mysterious bullet point called “Drill-Linking” with no further explanation on the slides on what this actually means.

Drill-linking is, very generally speaking, the ability to connect reports with other reports or more general any web-based systems via hyperlinks.

At the moment, drill-linking can be done by adding a style-expression on the href-style-key on any element. There is no big magic to it and it is (in most cases) reasonably simple as long as you know where you link to.

As this is the only roadmap feature for the upcoming 3.7 release, there must be something more to it. (After all, you dont make a release out of nothing, don’t you? And it would be Dilbert-style weasel work if we do. But then again, its product management and marketing – and thus
everything that is not outright illegal is probably allowed (and has been done somewhere in the past).)

So what is our idea of drill-linking?

Foremost it is a new method of computing URLs for using it in the href style-key. For that I created a new formula function called “DRILLDOWN”, but more on that later.

It also includes work on the platform components, so that we can ask published reports, XActions, Dashboards and so on for their expected parameter set, so that we can provide a sensible UI for the editor.

And last but not least, I’m keeping an age-old promise and extend the JFreeChart-expressions to generate image-maps (and thus allow to put link on elements in the chart-plot, so that you can click on the pie slice to jump to a more detailed report that tells you all about that specific country/product/whatever.

DRILLDOWN function? What’s that?

The DRILLDOWN function is a declarative way to define URLs. Whenever we want to build an user-interface for a feature, it is most helpful if the values we want to edit exist in a well-defined data structure instead of being hidden like Easter-eggs in a programming language.

(This is the same argumentation we had about XActions vs. PRPT, so if you know the reasons for having a cleanly editable model, you can skip the next parts. But as I still – from time to time – get those questions on why we had to use our own datasources when the XAction system already
worked in the platform, I will go through the same arguments for the DRILLDOWN as well. At least I can point everyone to this post and wont have to waste my breath in repeating myself again and again.)

Up until now, URLs are computed via a formula function that most of the time looks something like this:

="http://localhost/pentaho/content/reporting?solution=steel-wheels&path=&name=report.prpt¶meter="  [value]

or it looks like one of these

= [::env::baseURL] &  "/content/reporting?solution=steel-wheels&path=&name=report.prpt¶meter="  [value]

= [::env::baseURL] & "/content/reporting?" &  MESSAGE("solution={0}&path={1}&name={2}¶meter="{3}";  "steel-wheels"; ""; "report.prpt"; [value])

This stuff is easy and obviously works. But the problem is – as so often – in the details.

I can some up with a million ways of generating the same URL with different formulas. These formulas can be simple, can be complex, but they wont be easily parseable so that I can extract the values for the UI in a automated way. And whenever we fail to extract the values correctly, we will be beaten up by the users for providing a buggy system, or a bad user experience.

We _could_ workaround that by saying: Dear user, we only parse a very specific format. If you are not following that rule, you are on your own. (In the XAction world, this “workaround” was suggested for our PRD-3.5 as well – only accept XActions that look like the ones generated by PRD. And anyone with non-standard problems would be left in the rain.)

Nah, such a solution is not a solution at all. First, no one reads manuals, where that limitation and the valid format would be documented. I don’t, and I guess you do neither. So the first thing I would receive is bug reports on why this formula is not working in the Drill-Down UI. Bah! I want to spend the rest of my life lazily sitting at the beach, and not answering these reports with “RTFM”.

Parsing a program, and a formula or a XAction is a programm, and predicting what it does and how to change it to do something else is a very interesting research area – but the only research I want to do has to do with beaches at the Irish sea, not with geeky problems.

(Welcome back to everyone who skipped the previous paragraphs.)

So we need something that splits the various parts of a URL into separate fields. So lets look at what fields we have:

  1. Protocol, Host, port, web-application-base-url:

    This tells us where the web-application sits.

  2. Content-path:

    This points to the reporting plugin, the XAction-servlet and so on. It can also point to a PHP page, if we want to link there.

  3. Parameter:

    In the pentaho-world, we have three standard parameter, that are always there: solution, path and name, which (for historical reasons) contain parts of the solution path to which we link to. When linking to a non-Pentaho service, that could be omitted.

    And then of course, we have loads of user-defined parameter, that the report, XAction or PHP file could accept.

    All parameter are name-value pairs, so a 2D-array seems a natural solution.

So our DRILLDOWN function would look somewhat like this:

DRILLDOWN("http://localhost/pentaho"; "/content/reporting";
{{"solution" | "steel-wheels"} |
{"path" | ""} |
{"name" | "report.prpt" } |
{"parameter" | [value] }})

Not bad for a first throw. Now even a brain-dead monkey could write a UI without having to worry about how to parse complex formulas into a usable editor model.

But there is more to it. First, we know that for reports linking to the same server, we can replace the hardcoded server address with the “env::baseURL” variable to make that report more maintainable.

Next, the “/content/reporting” is only valid for links to PRPT files. For XActions that part looks different (heck, I never can remember the real value!) as does it for dashboard-links.

Boring stuff like that screams to be abstracted away from the unexpecting user.

Every link can be classified into one or more groups. We can ask questions like:

  • Do you want to link to a local report or a report on an other server?
  • Do you want to link to a PRPT file, a XAction or something else?

Let me call the combined set of answers to these questions “profiles”. And hey, lets give our profiles some speaking names and lets hide the technical gibberish that so confuses the poor user. Then our drill-linking function can look somewhat like that now:

DRILLDOWN ("remote-prpt"; "http://localhost/pentaho"; {...  parameter-array here ..})
DRILLDOWN ("local-prpt"; " used"; {...  parameter-array here ..})

The various profiles along with the ugly technical stuff are then safely tucked away into a configuration file. Now if we make these profiles (admin-)user configurable, we actually opened up a totally new play ground. Now we can do stuff like

DRILLDOWN ("accounting-prpt"; " used"; {...  parameter-array here ..})

and link to the almost forgotten server in the accounting department and best of all, we dont have to remember the IP or DNS-name. And if they ever move the server, one change in the profile-configuration fixes all the reports.

The profiles would be defined in a configuration file in the reporting engine and (as usual) can be overridden by a local configuration. This way, “local” can take up different meanings depending on where the report runs. Now, if the profile configuration also defines how to produce the URL, it is easy to let knowledgeable admins create their own link-profiles to 3rd party systems.

DRILLDOWN ("legacy-erp"; " used"; {...  parameter-array here ..})

could easily produce a link to your ERP system with the customer number or product id passed down. And again your users would not have to worry about the ten-foot-long URL that is needed to make that happen.

How would the UI know what parameters and fields to offer?

Well, this is a problem I prefer to solve by not solving it. OK, that was to much Zen. So lets talk about the UI for the Drill-Down system.

In its most simple form, we can present it as a set of input fields, each one corresponding to one slot in the DRILLDOWN function, and a plain table representing the parameter name/value array.

Its generic, its simple, but its just not sufficient.

For Pentaho-links, we have to maintain the three special parameter (solution, path and name) internally. I prefer to give users a file-system like view with a single path to enter. After all, we are long past since the DOS-driveletter system (ok, most of us are). So let’s make that thing one input and let the UI code figure out how to split that into the three parameters.

For third party links, we have to keep this a single path.

For remote links, we need to show the host-URL inputs, while for local ones this would be dead-confusing.

For PRPTs we probably want to show the system level properties as well. Not everyone want to paginate, and not everyone wants HTML as output.

So in short – thinking of a single unified UI makes my brain hurt. And whenever that happens, I delegate (in the worst sense of the word) my problems to someone else. The someone else happens to be you.

In my book, a pluggable profile system needs a pluggable UI as well. We provide sensible UIs for the profiles we define, and a rather generic one for everything else (some sort of fall-back, in case you or your admin is as lazy as I am).

If you have a complex URL building requirement, then at least you can provide your own UI. If you’re happy with the ones we provide – fine, then we all can spend our time drinking beer instead of coding. And if the generic solution is good enough to cover common linking problems well, then only system integrators will have to worry about actually writing UI-plugins. But if your users start to nag, and after playing World of Warcraft on the servers got boring, then you have an easy way to provide them the slick made-to-measure UI they always begged for.

Converting Paintings into Tables

In Pentaho Reporting Classic, all Report-Elements are positioned somewhere on a canvas. Whenever an band is being printed, the layouting system jumps in an computes the final layout for the band and all child elements. After the layouting is complete, each element has some valid ‘bounds’, which describe where the painter would have to place the element on the canvas.

The table-generator’s work starts after all elements have a valid layout. For each visible element in the band, the layouter takes the bounds and generates a column or row break for each edge position. All bands of the report are added to a single table. Therefore the table’s column will be based on all generated column breaks of all bands.

Pentaho Reporting Classic has two table-export modes. In the ‘simple’ export mode, the table-generator ignores the right and bottom edge of the child elements (but not for the root-level band). If a ‘complex’ layout is requested, all boundary informations are taken into account.

Theory is dry, lets take a look at some numbers:

Lets assume we have a root-level band with a width of 500 point and a height of 200 points. The band has two childs, a label and a text-field. I’ll specify the bounds as absolute positions, (X1,Y1) denotes the upper-left corner, and (X2,Y2) denotes the lower right corner.

The bounds of all elements involved are:

  • Band: X1=0, Y1=0, X2=500, Y2=200
  • Label: X1=100, Y1=0, X2=300, Y2=100
  • Textfield: X1=100, Y1=100, X2=250, Y2=200

Let’s follow the table-generator’s steps. We assume that the complex layouting is used.

  1. The Band gets added: As there are no column breaks yet, A column break will be added for X1=0 and X2=500. A rowbreak is added at Y1=0 and Y2=200.The first break always marks the start of the table, and the last break marks the end (and total-width) of the table. The table now consists of a single cell, that has a width of 500 points and a height of 200 points.
     
  2. The Label gets added: As there is no column break for X1=100, a new column break is inserted. The table’s only cell splits into two columns.
      Label

    A column break for X2 gets inserted at position 300. The table now contains 3 columns.

      Label  

    The Label’s Y1 does not cause a row-break, as the band already caused one at this position. A row break for Y2 gets inserted at position 100. The table now consists of two rows.

      Label  
     
  3. The text field is added to the table. X1 does not cause a column break, as there is already one at this position. X2 causes a new column break at position 250. Note that the label already occupies the cell from X=100 to X=300. This cell will now span two columns. There is already a column break for the text-field’s Y1 position (at Y=100, caused by the labels bottom edge) and for the Y2 position (at Y=200, caused by the band’s bottom edge).
      Label  
      TextField  

If the table-generator uses the simple algorithm, the resulting table gets simplified in a second step. The column breaks at position 250 and 300 have been caused by a right edge of an report element. These breaks now get removed, so that the resulting table looks like this:

  Label
  TextField

Now it should be clear, that the table-generator works best, if all elements are properly aligned. All elements that should go into one row or column have to start at the same X and y positions. If the strict layouting mode is used, they also must end at the same position. Elements that should go into neighbouring cells must share a common edge. And finally: Elements that do not start at position Zero will cause an empty column or row.

In the next post, I’ll cover how Pentaho Reporting Classic computes cell backgrounds and borders.