Category Archives: Development

Printing Watermarks with Pentaho Report Designer

A watermark is a image or other content that is printed in the background of your report. You can use watermarks to indicate a status of your report (Confidential, Draft or For Internal Use Only). Or you can use watermarks to print a form template into the background of your report.

A well defined watermark can transform a report from a uninspired bunch of numbers into a fully branded document that reinforces your companies image in all receivers of the report.

So how do you define a Watermark in Pentaho Reporting?

Activating the watermark

When you create a new report, the watermark section is hidden. Before you can create new elements on this section, you will have to show it in the Pentaho Report Designer.

  1. Switch to the Structure Tree
  2. Locate the Watermark node under the master report node.
  3. Select the Watermark node and switch the properties table to the “Attribute” view
  4. Set the “hide-on-canvas” attribute to “false”

At the bottom of your design area in the report designer window, you will now see the watermark section below the page-footer section.

Properties of a watermark

The watermark section is printed behind all other content. Watermarks are special content that is only useful for printing and the PDF export. Plain-Text exports and all table-exports do not allow overlapping content. A watermark would prevent the printing of other content on that page. Therefore watermarks are disabled for these export types. Even if your watermark section contains content, the Pentaho Reporting Engine would not process any of it.

A watermark always spans the complete width and height of a page and content printed on the watermark does not affect any other content on the page.

Defining content for watermark sections

Watermark section is a ordinary band. You can drag and drop fields, labels or images into the watermark just as you would do for any other of the bands. The watermark is printed when a new page is started and fields print the same values as fields defined in the page header. Like all page-header or -footer bands, watermarks cannot contain any subreports.

The Pentaho Report Designer ships with the “Production Reports/Invoice” sample report that makes use of a watermark to show a background image to reinforce the branding of the “Steelwheels” toy company and prints a stamp text on the report.

Integrating PHP/ASP/Ruby-Web-Applications with Pentaho Reporting

From time to time we get the same question on the Pentaho Reporting Forum:
“How do I integrate the Pentaho Bi-Server and reporting capabilities with my own (PHP/ASP/JSP/Ruby/whatever-) web-application?” Thrilled by the BI-Server’s blazing features, many users want to enrich their applications with rich reporting and analysis capabilities, and unlike the dinosaurs of the BI-Market, our Open-Source solution makes this easy and cost effective.

The techniques I am going to describe in this blog-post are no magic and can be applied to other web-application (like our Data-Integration server) as well.

First, you need to get a Pentaho system up and running. For now, and for the sake of easiness, lets stick with the standard Pre-Configured Installation (PCI). The Pentaho PCI is a Apache Tomcat server that runs the Pentaho BI-Server as its only web-application.

Installing the Pentaho BI-Server

  1. Download Pentaho
  2. Unpack the ZIP or tar.gz archive into a directory. On Linux or Unix systems, I would recommend something like “/opt” or “/usr/local/” as target. On Windows .. well, in that case you are just experimenting and not planing for production, I assume. So any folder will do.
  3. Setup a publish password

    You need that later for publishing reports or other files to the server. This is the major obstacle for most new users when trying to get their own content up and running.

  4. Edit the file “bi-server-ce/tomcat/webapps/pentaho/WEB-INF/web.xml”

    Change the parameter “fully-qualified-server-url” to the public URL of your server.

    
    
      fully-qualified-server-url
      http://localhost:8080/pentaho/
    
    
  5. Optional: If some other tomcat installation runs on your system, your port 8080 may be in use already.

    You can change the port in the file “bi-server-ce/tomcat/conf/server.xml”:

  6. Start the Hypersonic database
  7. Start the Pentaho BI-server
  8. Optional: Install the BI-Server as a system service.

You should now be able to direct your web-browser to the BI-Server and should see the login screen when you type http://localhost:8080/ into your browser’s address bar.

Integration by Direct Linking

Integrating a BI-Server into a existing PHP application is easy now. All you need is a running BI-Server instance that is accessible by your users. From now on, I will simply use the Pentaho Demo Server for our little integration project.

http://demo.pentaho.com

The Pentaho Demo Server is a slightly tuned down installation of the Pentaho BI-Server. Anything I am going to show will work in the same way with the standard community edition.

When refering to the two applications, I will use “application” for the PHP, ASP or Ruby application that wants to utilize the Pentaho Platform. I will use the term “Pentaho” for the Pentaho BI-Server.

The most primitive way of connection two web applications together is to make the application include Pentaho in an frame or by opening up the specific report or analysis view in a new window.


  
    My Web Application
  
  
    

    This file could have been generated by any server side program. Use your phantasy here.    

   

    Reports can be parametrized by adding the parameters for the report to the URL. The parameters must be properly URL encoded. Use UTF-8 as character encoding.    

   

    This shows how to open up a report in a new Window:          A report    

     

Inside the Pentaho Platform, each report has its own page that can be accessed independently of the Pentaho User Console. The sample above links to a report on the system, using the predefined username and password. Note that the username and password is given in cleartext. To make this more secure you will need to set up a Single-Sign-On solution.

Pentaho Enterprise Edition comes with support for setting up Single-Sign-On for CAS rather easily via some setup scripts that take care of the ugly work. If you are using community edition and are willing to spend the time, you can always set it up yourself.

Alternatively you can set up Pentaho to work with Active Directory or LDAP instead.

Here are some sites with documentation to get you started.

And don’t forget: The Pentaho Wiki also contains a load of information.

Internally, Pentaho uses Spring Security, so with enough time and patience you should be able to connect Pentaho to anything on this planet for authentication purposes.

Align Text in Number Fields and other Formatting Basics

Today lets talk about you can format report elements and what options our Pentaho Report Designer offers you.

Anatomy of Elements

Elements can be grouped into two classes. Text elements transform data into a textual representation. Labels, String-Fields, Message-Fields and Number- and Date-Fields are examples of these fields. Graphical elements produce a image to display the data. Charts, Sparklines or Image-Fields are examples for this class of elements.

A report element in Pentaho Reporting consists of style information and attributes. Most styles and attributes can be provided either statically or can be calculated by a function. The static properties are called “design time properties”. The calculated properties are called “runtime properties”.

Attributes control the element’s behaviour and how data is processed for displaying in the element. Examples for this are format-strings, rich-text processing and so on. Style properties control the visual appearance of the data printed.

Basic Formatting

With the Pentaho Report Designer you will find many formatting options that you can find in other text processor or graphical programs.

All elements share some common properties:

  • visible: Defines whether a element is shown on the final print out. Visibility is mostly used as runtime property instead of being specified at design time.
  • min-width, min-height: The width and height of an element define how much space a element takes on the paper. It also controls how many lines of text can be printed and how many characters fit onto each line. The size is either given in percentages (of the width and/or height of the element) or in points (1/72th of inches).
  • x, y: The position of the element if placed into a canvas element. The position is either given in percentages (of the width and/or height of the element) or in points (1/72th of inches).
  • dynamic-height: Dynamic height is a flag you can set to let the element expand its height to match the content you attempt to print.
  • various border properties: Allows you to draw a border around the element. You can define the border of the currently selected element very easily by using the menu option “Format->Border..”
  • padding: Padding allows you to insert some space between the edge of the element (where the borders sit) and the content printed in the element. The padding size is given in points (1/72th of inches).
  • text-color, background-color: Defines the element’s foreground and background colours.

Text elements

For text elements, the most common style you are going to change is probably the font properties.

  • font-name: Defines the font name. The font must be available, or a default font is used instead. This is especially important if you intend to publish the report to a server, which may not have the same fonts as you installed.
  • font-size: Padding allows you to insert some space between the edge of the element (where the borders sit) and the content printed in the element.
  • bold, italics, underline, strike-through: Various flags controlling the appearance of the text
  • embedded: This flag is only useful for PDF exports. It controls whether the font is embedded into the PDF document. Only TTF-fonts can be embedded and not all fonts allow you to embed them.
  • h-align, v-align: Positions the text within the element.

Graphical Elements

Graphical elements have different properties based on what sort of element you are dealing with.

  • scale, aspect-ratio: Defines whether the shape will scale up or down to fit the element’s bounding box. If “keep-aspect-ratio”
  • fill-element: Defines whether the shape will be filled. This has no effect on lines. Elements will be filled with the colour defined in fill-color
  • draw-outline: Defines whether the shape outline will be drawn. The outline will be drawn with the colour defined in text-color
  • stroke
  • : Defines the stroke that is used to draw the outline of the shape.

Attributes

Most of the attributes for elements are advanced properties and not required to actually get a basic report up and running. So I will concentrate on the ones that really matter:

  • value: The static value that should be printed. If this is defined,
  • field: The name of the field from the reporting engine reads the value. If a value is given in the “value” attribute, that static value will be used and the field will be ignored.
  • if-null: A static value that is printed if the field would evaluate to otherwise.
  • data-format: A flag that defines whether the element will update its format string from the meta-data given by the data source.
  • style-format: A flag that defines whether the element will update its style properties from the meta-data given by the data source.
  • show-changes: If this flag is set to true, the element will only print if the value has changed from the previous value. The element will always print if it is the first element in the group or on the page.

Message-, Date- and Number-Fields

  • format: Defines the format string. To make number fields align correctly, ensure that your format specifies a fixed set of decimals in the format string and that you right-align the field in the style options.

Armed with this knowledge, formatting elements in a report should no longer be a daunting task. “Be fruitful and multiply your reports, and fill the sheets and subdue it; and have dominion over your data and over the numbers of the accounting and over every piece of information that moves in your company.”

Pentaho Report Designer – Essential Readings to get started

When you download the Pentaho software for the very first time, the bunch of cryptic applications can be daunting. Reading about all the new and complex concepts of OLAP-cubes, data-warehouses, ETL processes to fill the data-warehouses or metadata layers is daunting at best.

If all you want to do is start with a report to see your own data printed, then at first, you can ignore the cryptic stuff and stick to the bread-and-butter parts: Using the Pentaho Report Designer to create your first report. Lets find out how.

First, you need to make one critical decision: Is what you want a classic report that is primarily meant to be printed (or used as if being printed)? Or do you want to explore your data to discover new knowledge.

Reading: Learn about the two kinds of reports.

If you are seeking an analytic tool for exploring your data, then you have no other option than to setup a OLAP-Cube to use the Pentaho Analyzer. These days, for simple cases you can get quick results with Pentaho’s Agile BI Initiative, so check them out.

From now on, I assume you are more interested in conventional reports with the sturdiness and durability of real paper. For over 500 years, since the invention of the double-entry bookkeeping, business reporting played a crucial rule in the business world. Reports are made to last.

To get started with the Pentaho Report Designer, you do not need any special setup or a data-warehouse as Pentaho Reporting can work with most relational databases after installing the right JDBC driver. You should get these drivers from your database vendor.

The Pentaho Report Designer already comes with drivers for MySQL, Hypersonic and H2 database systems. You can install your own drivers by copying the driver jar into the “lib/jdbc” directory inside the report designer.

Reading: Follow the walk-through to learn how to create your first report

This tutorial uses the sample database that we provide inside the report designer. This step-by-step guide will show you how to setup a database connection, how to get data into your report and how to layout elements to make a visually appealing result. When you use the SQL datasources you need a basic understanding of SQL. If you are not familiar with SQL, you can learn the basics of SQL in this rather good tutorial. Your database manual should also contain a introduction to SQL.

Work through the tutorial and you are ready to make your first steps in the world of Pentaho’s Business Intelligence tools.

There are some articles in this blog that help you further:

Access session and environment variables in a PRPT

Reports need to interact with the outside world in many ways. To formalize this, we allow to pass values from the outside to the inside. Pentaho Reporting knows several classes of inputs:

  • Environment variables
  • Parameter: User supplied values usually coming from an interaction of the user with a user interface
  • DataRow-Fields: part of the result-sets from the data-sources
  • Expressions and Functions: Calculated values

Today, we will talk about environment variables. Environment variables are system level parameter. You can use them to configure reports based on where you deploy the report.

We provide several properties for use within the Pentaho BI-Server.

  • serverBaseURL: the server URL, for example: “http://localhost:8080”
  • pentahoBaseURL: the location of the Pentaho Web-application: “http://localhost:8080/pentaho”
  • solutionRoot: The file-system location where the local copy of the solution repository can be found.
  • requestContextPath: the servlet’s request context path pointing to the local web-application: “/pentaho/”
  • username: The current user’s name
  • roles: The roles as CSV-encoded list. The CSV string is quoted according to the Excel CSV quoting rules.

Inside the BI-Server, environment variables can also be used to interact with the server’s environment to query session and global parameter information. The report environment allows you to access values stored on the current user’s session or as global parameter by using a special prefix.

Requesting a environment property “session:name” will query the session context for an entry called “name” and returns the string representation of this object to the report.

Likewise, prefixing a environment property with “global:” will query the global parameter collection of the BI-Server. Therefore querying a environment variable “global:name” will look for a entry “name” in the Bi-Server’s global parameters.

Accessing Environment variables with Formulas

The Pentaho Reporting Engine offers the formula function “ENV” to access the report environment properties.

Definition:

=ENV(String)

To access the variable called “example” as plain text, use

=ENV("example")

To access a value bound to the server-side session variable “example” use

=ENV("session:example")

If the variable is a CSV encoded string, you can use the CSVARRAY function to parse the string into a array of strings. This array is then suitable to use for multi-selection parameters or SQL-IN clauses.

=CSVARRAY(ENV("roles"))

You can now use these variables as part of other formulas, for instance to compute URLs pointing to documents on the server server.

=ENV("serverBaseURL") & "/myotherwebapp/downloads/pentaho-report-designer.zip"

Accessing Environment variables as predefined fields

When you open up the Pentaho Report Designer, you will see several of the well-known environment variables listed as fields on the report itself. Using the ENV function is flexible, but not very convenient. So we added a automatic mapping to the reporting engine.

The various env:* fields you see in a report are what I call “well-known” fields. As Pentaho Engineers we know knows that they exist on the server and are heavily used by our users. So we added a mapping to the system so that they automatically become fields without having to use a formula function to read their values.

The list of “well-known” fields is defined in the global report configuration. The global configuration can be edited by creating or editing a file called “classic-engine.properties” inside your Pentaho Report Designer and Pentaho BI-Server directories.

Inside the Pentaho Report Designer, this file resides in the “resources” directory in your Pentaho Report Designer installation directory. For the BI-Server you can find this file in the “pentaho/WEB-INF/classes” directory. To make your reports work consistently inside the Report Designer and the Server, make sure that both files have matching definitions.

For example: If you want to map your environment-property “value” into the field “env::value” then you would add the following line to your “classic-engine.properties” file:

org.pentaho.reporting.engine.classic.core.env-mapping.value=env\:\:value

In case value contains a comma separated list (using the Excel-CSV quoting rules for strings that need to be quoted), then you can make that list available as array by appending -array to its name.

org.pentaho.reporting.engine.classic.core.env-mapping.value-array=env\:\:value-array

The engine will recognize the “-array” suffix and will look for a “value” property instead, parses that into a string-array and returns you the array. The array then can be used in a SQL-IN clause.

The prefix “env::” is a convention inside the reporting engine to separate environment fields from other fields. If you omit this, you have to make sure that none of the fields returned by your data sources and none of your expressions or functions provide a value under the same name. This would overshadow your environment property and renders it inaccessible.

Table-layout: An easier way to build tables

At the moment I am working on the cross-tabbing code inside the Pentaho Reporting Engine. Ever since our old JFreeReport days this feature was on the list of things to do, but never quite made it up to the top of the heap.

While roaming through the existing and experimental cross-tab code, I found several easy ways to break the layout. Set a cell’s width or height wrong, and you are hosed. Dynamic height? Layout kaput! Null-values? Table goes down the drain.

We need something better!

Right now, we are using unrelated bands stacked together to get some appearance of a tabular structure. This works for the simple case, but anything complex causes trouble, similar to using CSS-styled DIVs in HTML to emulate tables. So we really need true tables to make it work.

Now, instead of building a one-off layout strategy, I am going to write this table-layout in a more general way. This way, you can apply it to groups or even ordinary sub-bands.

For groups, you will be able to set the group’s layout property to “table”. The group-body, the group-header, footers and other bands will automatically form a single cell in the table by default. So nothing breaks, nothing changes.

The magic starts when you apply the layouts “table-body”, “table-row” or “table-cell” to the various sub-bands. The bands then will be merged into a single flat table structure which shares borders and which synchronizes the layout.

There are several immediate advantages from the “true table layout”(tm):

  • Your header now controls the width of all cells. Increase the width of the header, and the details band follows. Even in design mode.
  • Borders merge properly, so that you get a proper grid rendered. No more double width borders then.
  • absolutely no overlapping cells
  • Table-exports can translate this structure directly into their documents. No more guessing why your cells have a empty cell here.
  • Technically, if you are willing to pay the price, we can have a complete auto-layout where the width of cells is based on the content printed in them. As in web-browsers, this is slow and memory consuming. Might be just a nice to have for slick demos. You may not want to use that in real world reports.

The table-layout will come within the next few weeks, along with the missing pieces of the cross-tabbing. Stay tuned ..

MySQL with Pentaho Report Designer for those who dont read manuals

Several people in the forums seem to have severe problems setting up MySQL or understanding how MySQL works. They also seem to have severe problems reading the database manuals pointed out in the forum, insisting the problem lays within the Report Designer. So I am going to write up some simple explanations on how to set up MySQL, how to configure the database users for access over the network and how to work around all the little and large weird quirks of that database. And then hopefully I just can point these users to here and never have to deal with it again.*


I do assume that you have MySQL installed and up and running. If not, then go and get PostgreSQL, a database I personally trust.

Connecting to a MySQL database – the basics

All the Pentaho tools and thus the Pentaho Report Designer are Java Applications and therefore use JDBC to access the databases. For other databases, the general rule is: Use the driver that ships with your database and keep the version numbers in sync.

To work with a MySQL database, you will need a suitable MySQL driver. In my experience MySQL drivers have not been the most stable nor most reliable drivers around. At Pentaho, we recommend to use the MySQL JDBC Driver version 5.1.10 as default driver, as this seems to be the most stable driver. Other versions may produce invalid results, and if you want to use any other version, make sure you test it carefully.

If you use the Pentaho Report Designer to create a connection, you will be asked for several properties. To connect to the database directly, select the connection mode “native”. Then enter your hostname or IP-address, port (default port number for MySQL is 3306) and username and password.

For generic JDBC database connections, you need the JDBC-URL, your username and password and optionally the connection properties:

Setting Value

Driver

The driver class to load. The name of the class that implements java.sql.Driver in MySQL Connector/J is com.mysql.jdbc.Driver.
JDBC-URL

The JDBC URL for the MySQL JDBC Driver uses the following format. Items in square brackets ([, ]) are optional.

jdbc:mysql://[host:port]/[database][?propertyName1][=propertyValue1][&propertyName2][=propertyValue2]...

If the host name is not specified, it defaults to 127.0.0.1. If the port is not specified, it defaults to 3306, the default port number for MySQL servers.


Username

Your user name as given to you by your database administrator. See below how MySQL translates that information and matches this against the internal user database.

Password

Your password for the connection.

To get the same connection as shown in the dialog screenshot above, you would have to specify the JDBC-URL:

jdbc:mysql://localhost:3306/sampledata

with username “pentaho_user” and “password” as password.

In most cases your database will be running on a different host than the computer you are editing reports or where the BI-Server is installed. When the database is running on a different computer, you will be able to distribute the work-load of answering requests and generating reports a lot better. If both the reporting server and the database share resources like the CPU, memory or disks, there is always the danger that both systems compete for resources and slow each other down unnecessarily.

User management and access control in MySQL

When you connect to a MySQL server, your identity is determined by the host from which you connect and the user name you specify. A MySQL server side user name is the combination of the user name you specified on the connection (“pentaho_user” in the example above) and the host from where you try to connect to the database server.

MySQL users are defined by their username and hostname and generally written in the format ‘username’@’hostname’.

In the example above we connect to the special host address ‘localhost’ (or by its IP address: 127.0.0.1). Only applications running on the same computer can make connections to this interface. By default a server will listening to these connections will see them as originating from “localhost” itself.

So in the above example, the full identity would be ‘pentaho_user’@’localhost’.

MySQL uses a two-stage authentication system. In the first stage it checks, whether you are allowed to connect to the database and validates your username and password. In the second stage MySQL checks whether you have the access rights to see the contents of the tables stored in the database or whether you are allowed to update or delete data.

When we try to establish a connection to the MySQL server, the server will check its local user database for an entry matching the username and password. If there is no user defined for the given hostname, you will see the error message

Host [...] is not allowed to connect to this MySQL server

The error message

Access denied for user 'pentaho_user'@'localhost' (using password: YES)

indicates that either the password is invalid or that this user has not been defined for connections from this host.

If you see the error message

ERROR 1044 (42000): Access denied for user 'pentaho_user'@'localhost' to database 'sampledata'

then you need to grant additional permissions to the user to allow access to the database.

Listing the defined users

MySQL stores all administrative information in a special system database called ‘mysql’. You can connect to this database like any other defined database. To see which users are defined, invoke the following SQL statement via the MySQL console:

SELECT User, Host FROM mysql.user;

MySQL will check your identity (username and host) against the user table. Both the user and the host must exist. If the table contains a host name called “%” this particular entry matches all hosts. If the table contains a empty user name, it matches any user name.

To allow the user ‘pentaho_user’ to connect from any localhost, you would need to have a matching entry for it in the table.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | localhost | 
+------------------+-----------+

To allow connections for ‘pentaho_user’ from any host, a entry with ‘%’ as hostname must exist.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | %         | 
+------------------+-----------+

Big fat warning: Even though both entries have the same user name (pentaho_user), they define two entirely different and unrelated identities for the database. It is totally possible and legal for ‘pentaho_user’ from localhost to have access to different databases than the same user for the other hosts. For MySQL these are two different entities.

Connections from Pentaho Report Designer via Connections from PhpMyAdmin or other Server Side Software

The MySQL identity system of treating users from different hosts as completely different entities can cause a lot of headaches, especially if you are not familiar with the underlying workings of the internet.

If your MySQL server is set up to only allow connections from specific hosts, you can run into problems where you can access the database from your web-server but not from within the report designer or vice versa.

Lets look at the most common usage scenarios:

Pentaho Reporting, the Web-Server and MySQL run on the same machine

This is a typical development scenario, where the developer installed all software on his own box for convenience.

+---------------+
| PRD-Host      |
+---------------+
| Web-Server    |
+---------------+
| My-SQL Server |
+---------------+

MySQL must be configured to allow access from ‘localhost’ and all connections in the report must use ‘localhost’ as the hostname for the MySQL connections.

A identity for ‘pentaho_user’@’localhost’ must exist.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | localhost | 
+------------------+-----------+

Pentaho Reporting, the Web-Server and MySQL all use different machines

This is the typical production scenario. Both the database server and the web-server run on different machines for maximum performance. Both machines are usually installed in a server room or data centre.

+----------+          +------------+         +---------------+
| PRD-Host |  --+-->  | Web-Server |  ---->  | My-SQL Server |
+----------+    |     +------------+         +---------------+
                |                                 /|\ 
                +----------------------------------+

Both the developer sitting on the Pentaho Report Designer and the Web-Server need access to the database. While the developer creates the reports he needs access to the database (or a test database with a similar structure than the production database). When the report runs, the web-server must connect to the database as well to execute the reports.

Both the developer PC and the web-server must have entries for the user ‘pentaho_user’ in the user table.

+------------------+-----------------------------+
| User             | Host                        |
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of webserver | 
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of PRD-host  | 
+------------------+-----------------------------+

Alternatively, you could allow access from all hosts. Note that if your database server is not shielded by a firewall, then you may easily open your database to the world. Any self-respecting database administrator may try to strangulate you for suggesting such an configuration.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | %         | 
+------------------+-----------+

The Web-Server and MySQL share the same machine

You find this set up occasionally in smaller hosting companies or from people who own just a single server. The database and web-server share the same machine, which is hosted in the data centre or server room. The developer runs the report designer on his own desktop machine.

+----------+          +---------------+ 
| PRD-Host |  --+-->  | Web-Server    |
+----------+    |     +---------------+
                +-->  | My-SQL server |
                      +---------------+

The web-server can access the MySQL server via ‘localhost’, while the developer needs to access the MySQL server from his own host outside of the server room.

+------------------+-----------------------------+
| User             | Host                        |
+------------------+-----------------------------+
| pentaho_user     | localhost                   | 
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of PRD-host  | 
+------------------+-----------------------------+

So if you ever wondered why you can access your MySQL server from PhpMyAdmin or similar tools, but not from the Pentaho Report Designer, you should now be able to see the answer and to fix the problem.

You can find more information on this topic in the chapter about MySQL user management in the MySQL manual.

Fine grained access permissions

Even if you are able to connect to the database, you may not be able to query the database. The second stage of MySQL’s security system checks your identity against a set of defined permissions. These permissions control whether you are able to query or modify the data in the database or whether you are allowed to change the table structures or perform other administrative tasks.

Pentaho Reporting only queries the database. The Pentaho reporting tool does not modify data and does not perform any administrative tasks. All MySQL users need to have the “SELECT” permission on all objects you want to access from the reports.

To grant a specific user the permission to select data, use the following statement from the MySQL console.

GRANT SELECT ON database.* TO 'pentaho_user'@'localhost';

MySQL connection properties you should know

To make the JDBC driver work properly in MySQL, you will have to specify some additional connection properties. The driver’s default settings can cause severe problems for the reporting system. These defaults have been defined for historic releases of MySQL and are harmful these days.

Property Definition Comment
useCompression Use zlib compression when communicating with the server (true/false)? Defaults to ‘false’. Useful if you access a remote server through a low bandwidth connection. Disable for production systems.
passwordCharacterEncoding What character encoding is used for passwords? Leaving this set to the default value (null), uses the platform character set, which works for ISO8859_1 (i.e. “latin1”) passwords. For passwords in other character encodings, the encoding will have to be specified with this property, as it’s not possible for the driver to auto-detect this. If you are not using an all ASCII password and work outside the area of Western European languages, knowing about this is a life-saver.
characterEncoding If ‘useUnicode’ is set to true (which is the default), what Java character encoding should the driver use when dealing with strings? (defaults is to ‘autodetect’). If the encoding cannot be determined, then an exception will be raised. Sometimes MySQL server do detect the character set correctly and therefore return garbage when querying tables with non-ascii data. In this case this property can be used to override the autodetection of character sets.
functionsNeverReturnBlobs Should the driver always treat data from functions returning BLOBs as Strings – specifically to work around dubious metadata returned by the server for GROUP BY clauses? The MySQL JDBC-Driver returns string results of functions in SQL statements as byte-arrays. If you use any functions in SELECT statements, set this property to “true” or you will see garbage in the results.
emptyStringsConvertToZero Should the driver allow conversions from empty string fields to numeric values of ‘0’? MySQL treats empty strings or values as zero for numeric columns by default. If you need to see the values as they are stored in the database without helpful corrections from the JDBC driver, then set this flag to false.
zeroDateTimeBehavior What should happen when the driver encounters DATETIME values that are composed entirely of zeros (used by MySQL to represent invalid dates)? Valid values are “exception”, “round” and “convertToNull”. Use this if your select statement fails with an exception and you have a zero-date in it. Set it to ‘convertToNull’ to come to a more reasonable result.
serverTimezone Override detection/mapping of timezone. Used when timezone from server doesn’t map to Java timezone Use this if your dates returned by the database seem to be offset by some hours. Also see ‘useGmtMillisForDatetimes’, ‘useJDBCCompliantTimezoneShift’, ‘useLegacyDatetimeCode’ and ‘useTimezone’ for more time zone related functions.
holdResultsOpenOverStatementClose Should the driver leave the result sets open on Statement.close() (enabling violates JDBC specification) This can be a very helpful property for reports with subreports. It allows the Pentaho Reporting engine to process several server-side resultsets in parallel. Without this setting, the reporting engine will fully buffer result-sets in memory instead, which increases the memory consumption of the reporting system.

Further readings

If you have made it this far, then let me tell you one big hint: MySQL comes with a nice administration tool called My SQL Workbench which allows you to administrate the server without having to go through the command line.

* Yeah. At least I can fake hope, even when I don’t feel it.

Pentaho Report Designer – Better default settings simplify your report creation

Will Gorman recently recreated a Google Analytics report in Pentaho Reporting. While the output looks reasonable good, the road to it was more painful than necessary.

Will’s direct feedback showed that he was not aware of a bunch of helping hands inside the Pentaho Report Designer which make it quite easy to create professional looking reports that just shine. Aside of that he also uncovered a bunch of usability issues. But luckily, many of these troubles are caused by just a few invalid settings. Lets have a look what he encountered and how we can make the pain go away.

  • The Swing Colour Picker really really sucks big time. But hey, that’s why the next big release will have a totally new colour picker on board.
  • Charting and the massive overload of chart properties caused him some headaches. At the moment, charting is really not fun – it does its job, but it severely needs some better documentation. Up until now, Will’s book on Pentaho Reporting is still the best documentation out there.
  • Aligning elements was Will might describe as “painful”.
  • Getting the elements inside the sub-reports align right with elements outside of the sub-report was … not fun.

When I opened the first edition of the report Will sent around, I instantly smelled trouble. This is what I have seen:

The red boxes mark overlapping elements. This will cause troubles when we export to any of the table-based layouts. The grey lines mark element boundaries. For a report that simple there are way to many of them here.

Now, why would Will create a report with random element positions in the first place? After all, Will is a full-blood engineer – so randomness for artistic bonus points is not an answer. The answer lies within the default settings of Pentaho Reporting 3.5 to 3.8. This is how Will has seen the same report:

The default settings of the Pentaho Report Designer hide all the helpful details from you.

With all helpers enabled, unaligned elements are outright visible, overlaps are clearly marked and the designer is actually encouraged to let elements snap into their best positions. A clean layout would have looked somewhat close to this.

The Pentaho Report Designer comes with four major tools to help you to create better reports:

  • View->Snap to Elements and View->Element Alignment Hints

    This powerful voodoo snaps element boundaries to each other. If your element comes close to a existing boundary, it will attach to it and will stay attached until it leaves the attraction zone. This makes positioning elements a easy task. The first element can be placed randomly, and all other elements will snap to this particular position.

    In Pentaho Report Designer 4.0, this feature will be enabled by default. It is a must have and I never design reports without it!

  • View->Show Overlapping Elements

    This makes overlapping elements visible in the editor. Overlapping elements are prohibited when you want to export reports to HTML, Excel, CSV or RTF. These report outputs use tables as their layouting and output mode and each table-cell can only be filled once.

    Overlapping elements would fight over the right to use the cell, and as a result the layout would be destroyed. Inside the reporting engine we avoid the troubles by using a “first-come-first-serve” strategy. That means, only the first element can occupy the cell, everyone else gets evicted.

    Just like the Snap-To-Elements, this is a must-have. You simply cannot edit a report and not have a clear warning about overlapping elements.

  • View->Grids->Snap

    This allows you to align your elements according to a predefined grid. The default grid-size is 5 point and moving elements lets them jump from grid-position to grid-position. This feature alone already reduces the amount of randomness in the element sizes.

    But personally, I don’t use this at all, as it limits my perceived ‘artistic freedom’. Your mileage may vary.

  • Guides and View->Guides->Snap

    Guides work similar to the grid and the Snap-To-Elements feature. You can position guides so that they reflect the important alignment edges of your report output.

And last but not least, there is the row layout strategy for bands for whenever you need to design list-style reports quickly. With a row-layout you don’t have to worry about element alignment or overlapping elements at all, as the row-layout prevents that out of the box.

Using Queries in Formulas in Pentaho Report Designer

One of the most powerful features of Pentaho Reporting is our ability to combine data from a multitude of datasources into a single report.

A report can be driven by a a query that reads an XML file. The results then parametrize a Web-Service. The web-service response then gets fed into a standard SQL query. And that finally drives the queries to a OLAP cube.

Sure, that example is a bit extreme, but possible out-of-the-box with Pentaho Reporting downloaded from SourceForge.

But we can do more: In Pentaho Reporting 3.8, you can also use two formula functions to query report datasources to incorporate the query-result into your report.

The SINGLEVALUEQUERY function retrieves a single values from the resultset. It always reads it from the first row of the result-set, either from the given column or simply taking the first column it finds.

SINGLEVALUEQUERY()
SINGLEVALUEQUERY([query:string]; [column:string])
SINGLEVALUEQUERY([query:string]; [column:string]; [querytimeout:integer])

As you can see, both column and query-timeout are optional parameter. The “query” given must be a valid report query – but more on that later.

The MULTIVALUEQUERY function works similar to the SINGLEVALUEQUERY formula function. But instead of retrieving just one value, it returns all values as an list.

In addition to the parameters of the SINGLEVALUEQUERY formula function, it also accepts a “query-limit” Integer-parameter, that restricts the number of values returned.

MULTIVALUEQUERY([query:string])
MULTIVALUEQUERY([query:string]; [column:string])
MULTIVALUEQUERY([query:string]; [column:string]; [querytimeout:integer])
MULTIVALUEQUERY([query:string]; [column:string]; [querytimeout:integer]; [limit:integer] )

So what is that ominous “query” parameter about?

In Pentaho Reporting, most data sources use logical names (or handles) to reference queries. Queries can be quite complex. SQL and MDX queries can be huge, and seeing queries printed out that cover several pages is rather normal. They are complex beasts that were difficult to write and even more difficult to keep free of errors. You don’t want to seem them being spread over throughout the reports.

All except for the “Advanced/Custom” data-sources hide these monsters behind easy to remember names. This allows us to encapsulate the complexities of the actual queries while at the same time documenting the purpose of the query itself. It is just easier to use “Sales-Persons-By-Region” in a report than to repeat the SQL statement everywhere.

Each report holds the logical name in its query attribute. When the query needs to be executed the reporting engine checks all declared data-sources to find one that recognizes the handle and that then returns a result-set for the assigned query.

For the formula system, I followed the same approach. It is easier to maintain formulas that use

SINGLEVALUEQUERY("sales-top-performer")

than to use

SINGLEVALUEQUERY("SELECT name, Sum(Sales) AS 'Sales' FROM SalesData ORDER BY 'Sales' LIMIT 1")

especially if that formula is going to be used several times in the report.

As a side note: As with calculated queries in the report itself, you can add a Advanced/Custom data source to compute a query at runtime. At that point, the second query would be valid.

These are the steps needed to use a SINGLEVALUEQUERY or MULTIVALUEQUERY formula function in a report for a given query.

(1) Define the data-source
(2) Add a query, test that it gets your expected result back.
(3) Reference the query in the formula

How do I parametrize the SINGLEVALUEQUERY or MULTIVALUEQUERY formula function?

For moth formula functions, parametrization happens implicitly. At runtime the reporting engine passes the current data-row, including all report parameter, the values from the report’s current result-set and the calculated function/expression values to the data-factory that executes the query.

Queries need to get parametrized as usual. For instance for SQL use the ${param} syntax. The underlying data-source itself specifies the names of the data-row columns it wants to use as parameters.

Example:

Lets assume that your report contains a field called “CUSTOMERNUMBER” and you want to retrieve the accumulated sales for the user via a SINGLEVALUEQUERY function.

(1) Define the SQL datasource on your report (or reuse one you already have defined).
(2) Add the query, name it “Sales-For-Customer” and provide the following SQL:

SELECT
     SUM("ORDERDETAILS"."QUANTITYORDERED" * 
     "ORDERDETAILS"."PRICEEACH") AS "Sales"
FROM
     "ORDERS" INNER JOIN "ORDERDETAILS" ON "ORDERS"."ORDERNUMBER" = "ORDERDETAILS"."ORDERNUMBER"
     INNER JOIN "CUSTOMERS" ON "ORDERS"."CUSTOMERNUMBER" = "CUSTOMERS"."CUSTOMERNUMBER"
WHERE 
     "CUSTOMERS"."CUSTOMERNUMBER" = ${CUSTOMERNUMBER}
GROUP BY 
     "CUSTOMERS"."CUSTOMERNUMBER"     

(3) Add a label, and edit the “value” attribute-expression. Set the following formula

=SINGLEVALUEQUERY("Sales-For-Customer")

Download the sample report

As you can see, there is not much magic in using the SINGLEVALUEQUERY and MULTIVALUEQUERY formula functions, but the effect they can have is powerful.

PS: Multiple queries with the same parameter values will be cached by the Report Data Cache

PRD-4.0: Lets do something with colors

Everything I have been working for some time with the report designer, I start to get a bit upset about the extremely ugly and hard to use colour chooser. The funny thing about this beast is: It contains all the elements, but in a way that just kills the fun of using it. Look at this and tell me you like it:

My problems with this colour chooser are several ones:

  • Its RGB selector is a nightmare. No visual selection, its just straight from the 80’s.
  • Its HSB selector offers RGB values, but you can’t edit them there. Argh!
  • The Swatches selection with its large shadowy buttons looks funny.
  • The Swatches section is entirely hard-coded and we cannot change the colours displayed there.
  • There is no way to enter colours in HTML notation (like #FF0088)

So now it was time to put on the maintenance hat and to do something about it.

First the swatches, with colour schemas and a bit larger buttons to make it easier to select the right colour. The code is prepared for external colour schema definitions and we could even extract all colours that have already been used in a given report.

And as second option, we now have a combined RGB/HSB/HTML colour selector.

Now it is actually easy to select exactly the colour you need.