Category Archives: Tech-Tips

How to use the Barcode element in Pentaho Reporting

Barcodes are a easy way to convey information in a machine readable way. With Pentaho Reporting, you can print a large set of barcodes in your reports. Barcodes in Pentaho Reporting are provided by the “simple-barcodes” element. And creating barcodes is as simple as the name of this element promises.

Pentaho Reporting 3.8 uses the Barbecue project as Barcode renderer and supports the following barcode types:

  • 2of5
  • 2of5-interleaved
  • Codabar
  • Code39(ext)
  • Code128(a,b,c)
  • EAN13
  • UCC-EAN128
  • ISBN
  • PostNet
  • PDF-417

With Pentaho Reporting 4.0, we also integrate the Barcode4J project (in addition to the existing Barbecue barcodes), and thus add support for the following barcode types:

  • EAN8
  • EAN128
  • UPCE
  • DataMatrix
  • RoyalMail
  • US-Postal-Service IntelligentMail

Think of a barcode element as some sort of fancy text- or number-field. A barcode element prints a single field value and generates a barcode vector image for it. To add a barcode to your report, all you need to do is drag the barcode element from the palette on the left-hand side of the window into the report canvas.

Each barcode has its range of allowed characters. The barcodes EAN, UPCA, ISBN, 2of5 and PostNet only accept numbers. The Code39, Code128 and Codabar barcodes accept a limited set of characters (the letters A-Z, 0-9 and a more or less extensive set of extra characters, all from the lower 128 characters of the ASCII-Charset). And some are free to accept nearly everything as input (PDF-417 and DataMatrix).

Usually, when you print barcodes, your data should already be in a format that can be handled by the barcode you are going to use. All barcoded data is generated by machines for other machines, and the person who generated the data usually (or hopefully) knows the accepted format already.

Like ordinary text-fields, barcode elements either receive a static text in the “value” attribute or a field-name in the “field” attribute.

If the barcode support printing a textual representation of the data, you can enable this via the “barcode-settings::show-text” attribute. This is also the place to fine tune the bar-width or height or to enable the calculation of checksums for the data given.

Styling of barcode elements

Styling the elements is easy. Like the “Sparkline Elements”, barcodes reuse many of the existing style properties.

  • text-color: The foreground colour of the barcode. This is the colour the bars are rendered with.
  • background-color: The background colour of the barcode. The barcode area will be filled with that colour prior to printing the barcode. This is handy to ensure that the barcode is printed on a white background for maximum contrast.
  • font-family, font-size, bold, italics: The text properties define the style and size of the text printed on the barcode. Not all barcodes use text.
  • scale: Whether the barcode is scaled according to the element’s size. Some barcodes do not have an intrinsic size and therefore are always scaled.
  • keep-aspect-ratio: If the barcode is scaled, this defines whether the ratio between the width and the height remains constant. For some barcodes, this is mandatory to ensure that the code can be scanned properly.

And that’s it. As said: There is no big secret to using barcodes. So happy coding!

Access Databases with Dynamic Table Names in Pentaho Reporting

Every now and then we get the question on how to create a report on database tables with a dynamic date stamped table name. Sounds weirdly cryptic? Well, instead of having a single table for all orders, these systems may have many small tables for all orders of a specific month.

I tried hard to come up with a potential reason for these things:

  • Maybe the database is so weak that it can’t handle more than a few thousand rows in a table. But both MySQL nor PostgreSQL are quite capable for some serious data storage (if used right). And even Oracle can do that – and if you have enough money to buy a Oracle license you have money to buy a server (-farm, Oracle ain’t cheap!) too.
  • Maybe they want to speed up data access to the table. After all, a SELECT * FROM TABLE takes a long time if you don’t use a WHERE clause or a proper index.
  • Maybe they have not heard of normalization.
  • Maybe they love to have separate tables so that they can remove the data of older months. A SQL DELETE command is black magic. Deleting the database file in the filesystem is quicker.

So if you are using such a system – drop me a note. I am dying to know what application creates such a data-structure.

Reporting on these database structures is not straight forward and usually considerably slower than working on a proper data model. And here is how you do it, in simple steps:

Prerequisite: The data model we use in this tutorial

Lets assume we have a database filled with sales data. For the sake of simplicity, lets assume there are several tables named in the schema of “SALES_yyyymm”. The ‘yyyy’ part of the table name represents the four digit year, the ‘mm’ represents the two-digit month.

The tables are made up of the following structure (including some sample data*):

CUSTOMERNUMBER DATE ORDERID ARTICLEID QUANTITY PRICE
0001 20091015 112 Fluffy Dogs 100 15
0001 20091015 112 Gumpy Rabbit 9 14
0001 20091015 112 Angry Birds 12 5
0002 20091024 113 Busy Bees 99 1

*) Note: Yes I know that putting strings as article ID is bad. But it safes me from outlining a second table.

Prerequisite 2: Reporting made dynamic with a date parameter

Creating a static report with a hardcoded table name is easy. Write your query and be done with it. The tricky part is to create a report that selects the right table based on the a given date. That date could be the current system date for reports returning the current month’s turnover or it could be a historic date.

If you want to make the report dynamic, you will need a parameter to feed the report’s target date into the reporting system and into the query. The table-name in the database contains a partial date value. To make the report return data for a given date, we must calculate the database table name when the report runs based on whatever date the user entered.

Standard SQL datasources do not allow you to parametrize structural elements of a query. This is a security measure to prevent an attack known as “SQL Injection“.

If you need to compute structural elements of a query, you will have to use a “JDBC (Custom)” datasource (Data -> Add Data-Source -> Advanced -> JDBC (Custom)). Whenever you use a data-source marked as (Custom), you opt out of the standard parametrization. At that point, it is your responsibility to properly validate all incoming user data and to ensure that no one can enter illegal values.

The Simple-Case: Reporting within the same table

The Custom JDBC datasource only contains connection information. The actual SQL query that is executed is computed at runtime. To define a query, put a formula or expression into the “query::name” attribute of your master- or sub-report object.

Mike Tarallo create a excellent video tutorial on how to use the Custom JDBC data sources. So instead of repeating everything here, watch the video and learn what the Custom JDBC datasource is and how to use it properly.

Simple Parametrization: Using a string parameter

So we need to declare a parameter to select the table the report should use. You want it dead simple (just for testing, of course!)? Use a plain text parameter that allows the user to enter the year and month to use.

Parameter:
 Name: Simple
 Value Type: String

and finally, to prevent SQL injections, some validation:

Post-Processing-Formula: =IF(LEN([Simple]) = 6; IF (ISNUMBER([Simple]); [Simple]; NA()); NA())

Complex Parametrization ideas

You can use a ordinary date parameter to make use of the date-picker. Note that this returns a Date object. You need to convert this date into a valid text. For that add a second hidden parameter with the following Post-Processing Formula:

=MESSAGE("{0,date,yyyyMM}";[Date])

Alternatively, assuming you use a proper database, you can extract the valid dates from your database. Just query the database metadata for the tables that are available. (I am using MySQL for the examples here. Consult your database manual on how to access the database schema.)

SELECT SUBSTRING(table_name, 5) AS "SalesDates" 
  FROM INFORMATION_SCHEMA.TABLES
 WHERE table_schema = ''
   AND table_name LIKE 'SALES_%'

You can use this information to feed a list parameter.

Create your query

I assume you have a parameter that contains a suitable string (6 characters, yyyyMM format). Lets call this parameter “DateSuffix”. On the master-report’s “query-name” attribute, create a formula with the following syntax:

="SELECT * FROM SALES_" & [DateSuffix]

At runtime, the report engine will evaluate the formula and construct the proper query string. That query string is then executed by the Custom JDBC Data-Source.

Tip: The report designer does not use queries from formulas during the design process. Read this article to learn how to get designtime column information into the Pentaho Report Designer.

Summary

To create a simple dynamic report:

  • Create a parameter that returns the suffix of your table
  • Add a Custom JDBC Datasource to your report
  • Add a formula to the query-name-attribute of your report

Creating reports that summarize a full year

Single reports for a single month – that is easy stuff. But sooner or later someone will come back to you and ask for sales data spanning the last year.

Lets skip the parameter defining details. Create a parameter that returns the year as string and call it “YearText”.

To your database, each of the monthly tables is unrelated to any other of the monthly tables. Databases are great on providing referential integrity on the data within the tables. But generally they care little about the names of the tables and particularly do not assume that table names have any higher meaning.

In SQL merging data from several tables happens via an UNION operator. Tables that are merged via UNION must have the same structure – the number and types of the columns must match.

A SQL Union takes the form of

 UNION 

Or for our example:

SELECT * FROM SALES_200910 
 UNION
SELECT * FROM SALES_200911

Each select statement in a UNION is considered separately. Any “ORDER BY” clause is applied only to the particular table. If you want to sort globally, you will have to wrap the UNION result into another SQL query via a SUB-SELECT. All proper databases support Sub-Selects, but again: For details, consult your database manual.

Here is an example on how to use a sub-select to sort globally

SELECT * FROM 
  (SELECT * FROM SALES_200910 
     UNION
   SELECT * FROM SALES_200911) 
ORDER BY ARTICLEID

Sub-selects are considerably slower than normal selects. Most databases will start to use temporary tables for the selection result and therefore will answer such queries a lot slower than normal queries. If your performance goes down a lot, consider creating a data warehouse.

A query spanning a whole year will be long and complex.

SELECT * FROM SALES_200901
     UNION
SELECT * FROM SALES_200902
     UNION
SELECT * FROM SALES_200903
     UNION
SELECT * FROM SALES_200904
     UNION
SELECT * FROM SALES_200905
     UNION
SELECT * FROM SALES_200906
     UNION
SELECT * FROM SALES_200907
     UNION
SELECT * FROM SALES_200908
     UNION
SELECT * FROM SALES_200909
     UNION
SELECT * FROM SALES_200910
     UNION
SELECT * FROM SALES_200911
     UNION
SELECT * FROM SALES_200912

The matching formula for making the query dynamic would look like:

="SELECT * FROM SALES_" & [YearText] & "01
     UNION
SELECT * FROM SALES_" & [YearText] & "02
     UNION
SELECT * FROM SALES_" & [YearText] & "03
     UNION 
..

Creating more complex reports on such a model is a pain. I would recommend to create a single SQL VIEW that contains all your sales table.

This normalizes your data model without breaking your existing programsViews are virtual database tables that make it easier to write SQL queries.

CREATE VIEW Sales AS 
  SELECT * FROM SALES_200901
    UNION
  SELECT * FROM SALES_200902
    UNION
  SELECT * FROM SALES_200903
    UNION
  ..

If your database supports materialized views, it will create a permanent temporary table that can considerably speed up your reports.

The insanely complex case: Queries spanning over a user defined period of time

So you implemented a yearly sales report. The world is bright. Really? Not long.

One unhappy morning, your boss will come to your office telling you he wants to see the sales data for the last four month. And last years first half of the year, oh, and a up to date report showing the last 90 days of sales (ex: 15th of January to 13th of April).

If you use a consolidated view as outlined above, you are safe. But for the sake of argument, assume your database administrator’s religion does not allow the creation of VIEWs**.

Creating SQL queries using a BeanShell-Script

Formulas are a great tool for relatively simple problems. But as soon as the problem is more complex or requires dynamic elements, formulas quickly become unmaintainable.

Attributes can be computed by any report expression that returns a suitable value for that attribute. For complex logic, I recommend to use one of the Scripting expressions that comes with Pentaho Reporting.

The Bean-Scripting-Framework (BSF) Expression allows you to use a programming language to compute the query string. The BSF expression comes with support for Java, JavaScript, Groovy and many more programming languages.

Here is an example that computes a query for the last 90 days:

import java.util.Calendar;
import java.util.Date;
import org.pentaho.reporting.engine.classic.core.DataRow;

    Date date = (Date) dataRow.get("Date");
    Calendar cal = Calendar.getInstance();
    cal.setTime(date);
    StringBuilder b = new StringBuilder();
    b.append("SELECT * FROM SALES_");

    int year = cal.get(Calendar.YEAR);
    int month = cal.get(Calendar.MONTH);
    b.append(String.format("%1$04d", year));
    b.append(String.format("%1$02d", month));
    b.append(" WHERE ORDERDATE  ${Date} ");
      }
    }
    return b.toString();

This results in a query string that looks like this:

SELECT * FROM SALES_201106 WHERE ORDERDATE  ${Date} 
 UNION 
SELECT * FROM SALES_201104 WHERE ORDERDATE > ${Date} 
 UNION 
SELECT * FROM SALES_201103 WHERE ORDERDATE > ${Date} 

And now: Happy reporting!

**) This happens more often than you would think.

Enrich your report with sparklines

Sparklines are a great way to convey a lot of information in a simple and readable way. Sparklines can be used to plot changes of metrics over time. Edward Tufte uses temperatures of patients in a hospital and share prices as examples.

Bart Maertens wrote about sparklines when Pentaho Reporting 3.5 came out. At that time, he had to do a lot of SQL magic to get data into them. With the addition of the MULTIVALUEQUERY formula function in Pentaho Reporting 3.8, this process got a lot easier.

Pentaho Reporting comes with three Sparkline type elements. Sparkline elements are minimalistic, they do not have any axis labels or legends.

  1. Line Sparks: A small line chart that shows the change of values over time. Line charts are good to visualize trends over time. The last data point can be highlighted for better readability.
  2. Bar Sparks: A small bar chart that allows to compare absolute values over a time scale. Bar charts should be used when you need to concentrate on the ratio between values instead of the general trends.
  3. Pie Sparks: A small pie chart that visualizes a single value in relation to a total value.


Now how do you add sparklines to a report created with the Pentaho Report Designer?

Getting the data

Except for Line and Bar charts, you will need a array of numeric values to display the data. You can construct arrays from different values of the same data row with a OpenFormula function like this:

=NORMALIZEARRAY({[Value1] | [Value2] | [Value3]})

The NORMALIZEARRAY formula function converts a internal OpenFormula array into a valid Java-Array that can be used by other components.

Alternatively, you can get a array of values from a data-source by using the “MULTIVALUEQUERY” formula function.

With a SQL datasource using the sample data you can use a query like this to get the sales numbers for each year:

SELECT
     SUM("ORDERFACT"."TOTALPRICE") AS "Value",
     "ORDERFACT"."YEAR_ID",
     "ORDERFACT"."PRODUCTCODE"
FROM
     "ORDERFACT"
WHERE 
          "ORDERFACT"."PRODUCTCODE" = ${PRODUCTCODE}
GROUP BY
     "ORDERFACT"."PRODUCTCODE",
     "ORDERFACT"."YEAR_ID"

This creates a result-set similar to this:

Value                   YEAR_ID PRODUCTCODE
-------------------------------------------
57363.100000000006 2003 S18_1749
19656.739999999998 2003 S18_2248
33451.850000000006 2003 S18_4409
12768.21 2003 S24_3969
39983.85999999999 2003 S18_2325
52395.560000000005 2003 S18_2795
11024.69 2003 S24_1937
14770.83 2003 S24_2022
38895.38 2003 S18_1342
22257.399999999998 2003 S18_1367
72913.26999999999 2003 S10_1949
40103.54 2003 S10_4962
46714.39 2003 S12_1666

You can then reference this data from a formula with the MULTIVALUEQUERY formula function.

=MULTIVALUEQUERY(“SparklineQuery”, “Value”)

Pie sparks simply require a single value in the range of 0 (zero) to 1 (one). Zero represents an empty pie (0% of the total), 0.5 a pie filled to the half (50% of the total), and 1 represents a fully filled pie (100% of the total).

Formatting the sparklines

Sparklines get formatted via the styles properties.

Bar-Sparks:

  • last-color: The color of the last bar
  • high-color: The color of the bar with the largest value.
  • text-color: The color of all other bars

Line-Sparks:

  • last-color: The color of the marker dot at the end of the spark
  • text-color: The color of the line

Pie-Spark:

  • low-color: The color of the slice if the value is below the “low-slice” threshold.
  • medium-color: The color of the slice if the value is below the “medium-slice” threshold.
  • high-color: The color of the slice if the value is below the “high-slice” threshold.
  • text-color: The color of the pie that is not filled by the slice.

Charting: Categorical Charts and XY-Charts

Reporting without charting is like zombies without the inevitable hunt for fresh brains. You can do it, yes, but it is sure not fun.

These days, charting in Pentaho Reporting is done via the “chart-element” in the Pentaho Report Designer. Drag the chart field into the canvas, double click on it to open the chart-editor, and start configuring your chart.

So far it’s all theory, lets see how charting really works.

Architecture of Charting

Charting in the Pentaho Reporting engine requires three parts. It requires (1) a data-collector to extract the charting-data from the datasources, (2) a chart-expression to produce a chart from the collected data and (3) a report element to display the resulting chart object.

The chart element that you can drag into the report is a front-end to hold the data-collector and the chart-expression and – of course – is responsible for rendering the chart once it has been produced.

Each chart-type is produced by a separate chart-expression. The chart-editor dialog selects the right chart expression for you when you click on one of the chart-type buttons on the top of the window. The visual properties of the chart are configured on the left-hand side of the chart dialog.

Depending on the chart type, you will need a suitable chart-datacollector. Data-collectors are configured in the right-hand table of the dialog. The dialog only offers collectors that can produce valid datasets for the currently selected chart type.

Chart Types

The chart types the Pentaho Report Designer supports can be grouped into three different groups:

1. Categorical Charts

A categorical chart uses a nominal scale to group data. The charting system makes no assumptions on relationships between the data. The X-Axis of such charts display labels for each data point, and all datapoints are printed in the order they arrive at the data-collector.

You can use categorical charts to display nominal data. In statistics, the nominal scale is the lowest measurement level you can use. A nominal scale, as the name implies, is simply some placing of data into categories, without any order or structure.

A example of a nominal scale is the sales regions of a company. Regions have no real relationship or natural order between each other. One region is as good as an other (from a statistical point of view).

Pentaho Reporting supports the following categorical charts:

  • Bar Chart
  • Line Chart
  • Area Chart
  • Combined Bar and Line Chart
  • Waterfall Charts
  • Radar Charts

2. XY-Charts

A XY-Chart uses an interval scale to organize the data it displays. All data items are comparable and can be sorted and all data points have a defined distance between each others.

The most common example for a interval scale in the context of reporting is the date-axis for comparing events on a time scale. (sales over the last years).

Pentaho Reporting supports the following XY-Charts:

  • Bar Chart
  • Line Chart
  • Area Chart
  • Combined Bar and Line Chart
  • Combined Area and Line Chart
  • Scatter Plot Charts
  • Bubble Charts

3. Pie and Ring charts

Pie charts can be used to compare the overall ratio of several numeric values. Your values must be complete (ie add up to 100% of what you want to show) or your chart will be misleading.

Pie charts should only be used if you are comparing the size of a slide with the overall size of the pie. However, this type of charts is not suitable for comparing the sizes of two slices or comparing different pie charts.

The Multi-Pie-Chart chart type should not be used at all. It only exists because some users insisted on this sort of functionality. But this sort of chart is hard to read at best and usually just plainly misleading.

Creating a categorical bar chart

Now, in the hands-on section of this entry, lets create a bar chart that shows the sales of our various product lines over the last few years. This example uses the sample data that comes with the Pentaho Report Designer.

Create a new report in the Report Designer and follow me.

First, You need data

Create a new JDBC datasource (Data->Add Datasource->JDBC). Select the predefined connection “SampleData(Memory)”, create a new query called “chart-data” and add the following query text.

SELECT
     PRODUCTS.PRODUCTLINE,
     ORDERFACT.YEAR_ID,
     sum(ORDERFACT.TOTALPRICE) AS SALES,
     sum(ORDERFACT.QUANTITYORDERED) AS VOLUME
FROM
     PRODUCTS INNER JOIN ORDERFACT ON PRODUCTS.PRODUCTCODE = ORDERFACT.PRODUCTCODE
GROUP BY
     PRODUCTS.PRODUCTLINE,
     ORDERFACT.YEAR_ID
ORDER BY
     PRODUCTS.PRODUCTLINE ASC,
     ORDERFACT.YEAR_ID ASC

Make this query your currently active query: In the data-tab, expand the new JDBC data-source and right-click on the query node below and select “Select Query”.

Then you need a chart object

Drag the chart-object from from left hand palette into the report-header area. This ensures that the chart is printed once on the report. If you put a chart into the Group-header or -footer the chart is printed for each occurrence of that group. Putting a chart into the details band makes no sense. It would print the exact same chart as many times as there are rows in your data source.

And finally we need to configure the chart

Double click on the chart object to start the chart-editor. The “Bar Chart” should already be selected.

First, we define what data the chart will be using. On the right-hand side you find the chart-data collector. Enter the following values into the properties:

  • Category Column: YEAR_ID
  • Value Column: SALES
  • Series-by-Field: PRODUCTLINE

A chart could have several series for each column, which can be useful if you want to compare data-series with each other.

Next define the chart’s appearance. For now, we keep it simple and just change the Chart’s title and the y-axis title.

  • Chart Title: Product Line Sales Trend
  • Y-Axis Title: Sales

Confirm the changes in the chart editor by pressing “OK” on the bottom of the dialog and preview your report.

You can find several chart examples in the samples that ship with the Pentaho Report Designer. Explore them!

Creating your own Parameter UI for the Pentaho BI-Server

Our BI-Server ships with a default GWT parameter UI for the parameter defined on a report. If you had been around for a while, then you will remember the sigh of relieve when we freed everyone from the tyranny of XActions for running simple reports. Since then the parameter capabilities of the reporting module grew and grew with every release making these parameters easier to use than the XActions’s original design.

GWT is nice for a old and grumpy Java developer like me, as I do not have to worry about JavaScript (untyped, for heaven’s sake, untyped!). But the hardcore nerds like Pedro “JavaScript is my life” Alves do not like the monolithic garbage the GWT compiler spits out. To slow, to heavy, and foremost: Not really extensible unless you recompile the beast for each change. And worst of all: I agree to these complaints.

However, there is a silver lining on the horizon. Our architecture is open, so you are able to replace the GWT code with your own magic with no problems at all. And here is how you would do it:

Basics: The parameter UI architecture

The parameter UI works as a REST service. The server receives calls and sends responses based on the parameters given, without holding any server side state elsewhere.

(1) The browser loads the report.html page and initializes the GWT parameter application (GWT from now on).

(2) GWT calls the server’s report handler with “renderMode=PARAMETER” or “renderMode=XML”. If there are values for any of the parameters known, then these values are given on the URL using ordinary HTTP-GET or POST requests.

The URL that is called is something like this:

http://demo.pentaho.com/pentaho/content/reporting?renderMode=XML&solution=steel-wheels&path=%2Freports&name=Product+Sales.prpt

(3) The server responds with the parameter XML document. If the renderMode is XML, this document also contains the number of pages in the report. The server only returns page numbers if the parameter validate correctly and if the pagination does not cause any other errors.

(If you are logged into the demo server, call the URL from step 2 in your browser to see the XML document the server returns.)

(4) GWT creates a UI for all parameters based on the Parameter XML document. All information is given as attributes on the parameter. The parameter’s possible values and current value are given in that document as well. These sets of values can change if a other parameter changes.

(5) If all parameters validated correctly (according to the Parameter XML document), it now sends a request to retrieve the rendered report. Again, this is a ordinary HTTP-GET call with all parameters attached onto the URL.

http://demo.pentaho.com/pentaho/content/reporting?renderMode=REPORT&solution=steel-wheels&path=%2Freports&name=Product+Sales.prpt

(6) The Browser displays the report content in the IFrame below the GWT parameter UI.

(7) Paging through the report jumps back to step (6) and updates the report frame.

(8) Changing a parameter value jumps back to step (2) and updates the parameter information.

The mystical Parameter XML

The parameter XML document is a description of all known parameters that the reporting plugin understands. The same format is also used by the Analyzer component and you can even get parameter information in this format out of XActions.

<?xml version="1.0" encoding="UTF-8"?>
  <parameters accepted-page="-1" autoSubmitUI="true" is-prompt-needed="false" layout="vertical" page-count="1" paginate="true" subscribe="false">
    <parameter is-list="true" is-mandatory="false" is-multi-select="false" is-strict="true" name="PROD_LINE" type="java.lang.String">
      <attribute name="role" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="user"/>
      <attribute name="parameter-layout" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="horizontal"/>
      <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="togglebutton"/>
      <attribute name="label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Line"/>
      <attribute name="mandatory" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="true"/>
      <values>
        <value label="Classic Cars" null="false" selected="true" type="java.lang.String" value="Classic Cars"/>
        <value label="Motorcycles" null="false" selected="false" type="java.lang.String" value="Motorcycles"/>
        <value label="Ships" null="false" selected="false" type="java.lang.String" value="Ships"/>
        <value label="Planes" null="false" selected="false" type="java.lang.String" value="Planes"/>
      </values>
    </parameter>
    <parameter is-list="true" is-mandatory="false" is-multi-select="false" is-strict="true" name="PROD_CODE" type="java.lang.String">
      <attribute name="role" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="user"/>
      <attribute name="parameter-visible-items" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="6"/>
      <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="list"/>
      <attribute name="label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Product"/>
      <attribute name="mandatory" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="true"/>
      <values>
        <value label="1952 Alpine Renault 1300" null="false" selected="true" type="java.lang.String" value="S10_1949"/>
        <value label="1972 Alfa Romeo GTA" null="false" selected="false" type="java.lang.String" value="S10_4757"/>
        <value label="1962 LanciaA Delta 16V" null="false" selected="false" type="java.lang.String" value="S10_4962"/>
        <!-- loads of parameter values removed -->
        <value label="1961 Chevrolet Impala" null="false" selected="false" type="java.lang.String" value="S24_4620"/>
        <value label="1982 Camaro Z28" null="false" selected="false" type="java.lang.String" value="S700_2824"/>
      </values>
    </parameter>
    <parameter is-list="true" is-mandatory="true" is-multi-select="false" is-strict="true" name="output-target" type="java.lang.String">
      <attribute name="role" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="system"/>
      <attribute name="preferred" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="true"/>
      <attribute name="parameter-group" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="parameters"/>
      <attribute name="parameter-group-label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Report Parameters"/>
      <attribute name="label" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="Output Type"/>
      <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="dropdown"/>
      <attribute name="hidden" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="false"/>
      <values>
       <value label="HTML (Paginated)" null="false" selected="true" type="java.lang.String" value="table/html;page-mode=page"/>
       <value label="HTML (Single Page)" null="false" selected="false" type="java.lang.String" value="table/html;page-mode=stream"/>
       <value label="PDF" null="false" selected="false" type="java.lang.String" value="pageable/pdf"/>
       <value label="Excel" null="false" selected="false" type="java.lang.String" value="table/excel;page-mode=flow"/>
       <value label="Excel 2007" null="false" selected="false" type="java.lang.String" value="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;page-mode=flow"/> 
       <value label="Comma Separated Value" null="false" selected="false" type="java.lang.String" value="table/csv;page-mode=stream"/> 
       <value label="Rich-Text-Format" null="false" selected="false" type="java.lang.String" value="table/rtf;page-mode=flow"/>
       <value label="Text" null="false" selected="false" type="java.lang.String" value="pageable/text"/>
     </values>
   </parameter>
   <!-- many many more parameter -->
 </parameters>

To create a usable UI, you need to look at some critical information in the parameter UI.

The parameter element itself contain the most critical information for a parameter.

<parameter is-list="true" is-mandatory="true" is-multi-select="false" is-strict="true" name="output-target" type="java.lang.String">

The parameter name gives the parameters internal name. Usually each parameter also has a “label” defined as attribute. The “label” is shown to the user, the “name” is sent to the server. The parameter element also indicates whether a parameter is a list or plain parameter. List parameter have a predefined list of values from which the user can choose from. Some list parameter allow the user to select multiple values (“is-multi-select”) or control whether the user can specify additional values (“is-strict”) or whether the parameter can be empty (“is-mandatory”). You can use this to perform client-side validity checks if you want to. But no matter what you do, the server checks the input anyway, as client side input is untrusted input. Not all parameter that are listed in the parameter document are shown to the user. Some are system level parameter that are only useful in certain situations (like all subscription parameter). A parameter is hidden if the hidden attribute is set to true.

 <attribute name="hidden" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="false"/>

How a parameter is presented to the user is defined by the “parameter-render-type”. Right now, this attribute can have the following values:

  • dropdown
  • list
  • radio
  • checkbox
  • togglebutton
  • textbox
  • datepicker
  • multi-line
 <attribute name="parameter-render-type" namespace="http://reporting.pentaho.org/namespaces/engine/parameter-attributes/core" value="dropdown"/>

Data formats for sending and receiving data

Sending parameter information over the internet can be a funny exercise. HTTP only allows to send text, so objects like dates and numbers need to be encoded properly. The Pentaho Reporting plugin expects all parameters in a locale-independent standard format. Numbers must be encoded as decimal numbers in plain english format (point as decimal separator, no thousands separators) using the format string “#0.#########”. Dates and Times must be given in the standard ISO format using the format string “yyyy-MM-dd’T’HH:mm:ss.SSS” or (if encoded with timezone information “yyyy-MM-dd’T’HH:mm:ss.SSSZZZZ”). (Also see: “Its about time – better Date parameter handling in Pentaho Report Designer“) Multi-selection parameter must be given by repeating the parameter name for each value. The order of multi-selection values is taken into account when processing the parameters on the server side. Example:

../report?multi-select=Value1&multi-select=Value2

Now all we need is a bit of JavaScript magic on the client side to replace the grumpy old GWT parameter application with a lightweight and smooth pure JavaScript/AJAX solution. Stay tuned ..

Additional Reading Material

Printing Watermarks with Pentaho Report Designer

A watermark is a image or other content that is printed in the background of your report. You can use watermarks to indicate a status of your report (Confidential, Draft or For Internal Use Only). Or you can use watermarks to print a form template into the background of your report.

A well defined watermark can transform a report from a uninspired bunch of numbers into a fully branded document that reinforces your companies image in all receivers of the report.

So how do you define a Watermark in Pentaho Reporting?

Activating the watermark

When you create a new report, the watermark section is hidden. Before you can create new elements on this section, you will have to show it in the Pentaho Report Designer.

  1. Switch to the Structure Tree
  2. Locate the Watermark node under the master report node.
  3. Select the Watermark node and switch the properties table to the “Attribute” view
  4. Set the “hide-on-canvas” attribute to “false”

At the bottom of your design area in the report designer window, you will now see the watermark section below the page-footer section.

Properties of a watermark

The watermark section is printed behind all other content. Watermarks are special content that is only useful for printing and the PDF export. Plain-Text exports and all table-exports do not allow overlapping content. A watermark would prevent the printing of other content on that page. Therefore watermarks are disabled for these export types. Even if your watermark section contains content, the Pentaho Reporting Engine would not process any of it.

A watermark always spans the complete width and height of a page and content printed on the watermark does not affect any other content on the page.

Defining content for watermark sections

Watermark section is a ordinary band. You can drag and drop fields, labels or images into the watermark just as you would do for any other of the bands. The watermark is printed when a new page is started and fields print the same values as fields defined in the page header. Like all page-header or -footer bands, watermarks cannot contain any subreports.

The Pentaho Report Designer ships with the “Production Reports/Invoice” sample report that makes use of a watermark to show a background image to reinforce the branding of the “Steelwheels” toy company and prints a stamp text on the report.

Integrating PHP/ASP/Ruby-Web-Applications with Pentaho Reporting

From time to time we get the same question on the Pentaho Reporting Forum:
“How do I integrate the Pentaho Bi-Server and reporting capabilities with my own (PHP/ASP/JSP/Ruby/whatever-) web-application?” Thrilled by the BI-Server’s blazing features, many users want to enrich their applications with rich reporting and analysis capabilities, and unlike the dinosaurs of the BI-Market, our Open-Source solution makes this easy and cost effective.

The techniques I am going to describe in this blog-post are no magic and can be applied to other web-application (like our Data-Integration server) as well.

First, you need to get a Pentaho system up and running. For now, and for the sake of easiness, lets stick with the standard Pre-Configured Installation (PCI). The Pentaho PCI is a Apache Tomcat server that runs the Pentaho BI-Server as its only web-application.

Installing the Pentaho BI-Server

  1. Download Pentaho
  2. Unpack the ZIP or tar.gz archive into a directory. On Linux or Unix systems, I would recommend something like “/opt” or “/usr/local/” as target. On Windows .. well, in that case you are just experimenting and not planing for production, I assume. So any folder will do.
  3. Setup a publish password

    You need that later for publishing reports or other files to the server. This is the major obstacle for most new users when trying to get their own content up and running.

  4. Edit the file “bi-server-ce/tomcat/webapps/pentaho/WEB-INF/web.xml”

    Change the parameter “fully-qualified-server-url” to the public URL of your server.

    
    
      fully-qualified-server-url
      http://localhost:8080/pentaho/
    
    
  5. Optional: If some other tomcat installation runs on your system, your port 8080 may be in use already.

    You can change the port in the file “bi-server-ce/tomcat/conf/server.xml”:

  6. Start the Hypersonic database
  7. Start the Pentaho BI-server
  8. Optional: Install the BI-Server as a system service.

You should now be able to direct your web-browser to the BI-Server and should see the login screen when you type http://localhost:8080/ into your browser’s address bar.

Integration by Direct Linking

Integrating a BI-Server into a existing PHP application is easy now. All you need is a running BI-Server instance that is accessible by your users. From now on, I will simply use the Pentaho Demo Server for our little integration project.

http://demo.pentaho.com

The Pentaho Demo Server is a slightly tuned down installation of the Pentaho BI-Server. Anything I am going to show will work in the same way with the standard community edition.

When refering to the two applications, I will use “application” for the PHP, ASP or Ruby application that wants to utilize the Pentaho Platform. I will use the term “Pentaho” for the Pentaho BI-Server.

The most primitive way of connection two web applications together is to make the application include Pentaho in an frame or by opening up the specific report or analysis view in a new window.


  
    My Web Application
  
  
    

    This file could have been generated by any server side program. Use your phantasy here.    

   

    Reports can be parametrized by adding the parameters for the report to the URL. The parameters must be properly URL encoded. Use UTF-8 as character encoding.    

   

    This shows how to open up a report in a new Window:          A report    

     

Inside the Pentaho Platform, each report has its own page that can be accessed independently of the Pentaho User Console. The sample above links to a report on the system, using the predefined username and password. Note that the username and password is given in cleartext. To make this more secure you will need to set up a Single-Sign-On solution.

Pentaho Enterprise Edition comes with support for setting up Single-Sign-On for CAS rather easily via some setup scripts that take care of the ugly work. If you are using community edition and are willing to spend the time, you can always set it up yourself.

Alternatively you can set up Pentaho to work with Active Directory or LDAP instead.

Here are some sites with documentation to get you started.

And don’t forget: The Pentaho Wiki also contains a load of information.

Internally, Pentaho uses Spring Security, so with enough time and patience you should be able to connect Pentaho to anything on this planet for authentication purposes.

Align Text in Number Fields and other Formatting Basics

Today lets talk about you can format report elements and what options our Pentaho Report Designer offers you.

Anatomy of Elements

Elements can be grouped into two classes. Text elements transform data into a textual representation. Labels, String-Fields, Message-Fields and Number- and Date-Fields are examples of these fields. Graphical elements produce a image to display the data. Charts, Sparklines or Image-Fields are examples for this class of elements.

A report element in Pentaho Reporting consists of style information and attributes. Most styles and attributes can be provided either statically or can be calculated by a function. The static properties are called “design time properties”. The calculated properties are called “runtime properties”.

Attributes control the element’s behaviour and how data is processed for displaying in the element. Examples for this are format-strings, rich-text processing and so on. Style properties control the visual appearance of the data printed.

Basic Formatting

With the Pentaho Report Designer you will find many formatting options that you can find in other text processor or graphical programs.

All elements share some common properties:

  • visible: Defines whether a element is shown on the final print out. Visibility is mostly used as runtime property instead of being specified at design time.
  • min-width, min-height: The width and height of an element define how much space a element takes on the paper. It also controls how many lines of text can be printed and how many characters fit onto each line. The size is either given in percentages (of the width and/or height of the element) or in points (1/72th of inches).
  • x, y: The position of the element if placed into a canvas element. The position is either given in percentages (of the width and/or height of the element) or in points (1/72th of inches).
  • dynamic-height: Dynamic height is a flag you can set to let the element expand its height to match the content you attempt to print.
  • various border properties: Allows you to draw a border around the element. You can define the border of the currently selected element very easily by using the menu option “Format->Border..”
  • padding: Padding allows you to insert some space between the edge of the element (where the borders sit) and the content printed in the element. The padding size is given in points (1/72th of inches).
  • text-color, background-color: Defines the element’s foreground and background colours.

Text elements

For text elements, the most common style you are going to change is probably the font properties.

  • font-name: Defines the font name. The font must be available, or a default font is used instead. This is especially important if you intend to publish the report to a server, which may not have the same fonts as you installed.
  • font-size: Padding allows you to insert some space between the edge of the element (where the borders sit) and the content printed in the element.
  • bold, italics, underline, strike-through: Various flags controlling the appearance of the text
  • embedded: This flag is only useful for PDF exports. It controls whether the font is embedded into the PDF document. Only TTF-fonts can be embedded and not all fonts allow you to embed them.
  • h-align, v-align: Positions the text within the element.

Graphical Elements

Graphical elements have different properties based on what sort of element you are dealing with.

  • scale, aspect-ratio: Defines whether the shape will scale up or down to fit the element’s bounding box. If “keep-aspect-ratio”
  • fill-element: Defines whether the shape will be filled. This has no effect on lines. Elements will be filled with the colour defined in fill-color
  • draw-outline: Defines whether the shape outline will be drawn. The outline will be drawn with the colour defined in text-color
  • stroke
  • : Defines the stroke that is used to draw the outline of the shape.

Attributes

Most of the attributes for elements are advanced properties and not required to actually get a basic report up and running. So I will concentrate on the ones that really matter:

  • value: The static value that should be printed. If this is defined,
  • field: The name of the field from the reporting engine reads the value. If a value is given in the “value” attribute, that static value will be used and the field will be ignored.
  • if-null: A static value that is printed if the field would evaluate to otherwise.
  • data-format: A flag that defines whether the element will update its format string from the meta-data given by the data source.
  • style-format: A flag that defines whether the element will update its style properties from the meta-data given by the data source.
  • show-changes: If this flag is set to true, the element will only print if the value has changed from the previous value. The element will always print if it is the first element in the group or on the page.

Message-, Date- and Number-Fields

  • format: Defines the format string. To make number fields align correctly, ensure that your format specifies a fixed set of decimals in the format string and that you right-align the field in the style options.

Armed with this knowledge, formatting elements in a report should no longer be a daunting task. “Be fruitful and multiply your reports, and fill the sheets and subdue it; and have dominion over your data and over the numbers of the accounting and over every piece of information that moves in your company.”

Pentaho Report Designer – Essential Readings to get started

When you download the Pentaho software for the very first time, the bunch of cryptic applications can be daunting. Reading about all the new and complex concepts of OLAP-cubes, data-warehouses, ETL processes to fill the data-warehouses or metadata layers is daunting at best.

If all you want to do is start with a report to see your own data printed, then at first, you can ignore the cryptic stuff and stick to the bread-and-butter parts: Using the Pentaho Report Designer to create your first report. Lets find out how.

First, you need to make one critical decision: Is what you want a classic report that is primarily meant to be printed (or used as if being printed)? Or do you want to explore your data to discover new knowledge.

Reading: Learn about the two kinds of reports.

If you are seeking an analytic tool for exploring your data, then you have no other option than to setup a OLAP-Cube to use the Pentaho Analyzer. These days, for simple cases you can get quick results with Pentaho’s Agile BI Initiative, so check them out.

From now on, I assume you are more interested in conventional reports with the sturdiness and durability of real paper. For over 500 years, since the invention of the double-entry bookkeeping, business reporting played a crucial rule in the business world. Reports are made to last.

To get started with the Pentaho Report Designer, you do not need any special setup or a data-warehouse as Pentaho Reporting can work with most relational databases after installing the right JDBC driver. You should get these drivers from your database vendor.

The Pentaho Report Designer already comes with drivers for MySQL, Hypersonic and H2 database systems. You can install your own drivers by copying the driver jar into the “lib/jdbc” directory inside the report designer.

Reading: Follow the walk-through to learn how to create your first report

This tutorial uses the sample database that we provide inside the report designer. This step-by-step guide will show you how to setup a database connection, how to get data into your report and how to layout elements to make a visually appealing result. When you use the SQL datasources you need a basic understanding of SQL. If you are not familiar with SQL, you can learn the basics of SQL in this rather good tutorial. Your database manual should also contain a introduction to SQL.

Work through the tutorial and you are ready to make your first steps in the world of Pentaho’s Business Intelligence tools.

There are some articles in this blog that help you further:

Access session and environment variables in a PRPT

Reports need to interact with the outside world in many ways. To formalize this, we allow to pass values from the outside to the inside. Pentaho Reporting knows several classes of inputs:

  • Environment variables
  • Parameter: User supplied values usually coming from an interaction of the user with a user interface
  • DataRow-Fields: part of the result-sets from the data-sources
  • Expressions and Functions: Calculated values

Today, we will talk about environment variables. Environment variables are system level parameter. You can use them to configure reports based on where you deploy the report.

We provide several properties for use within the Pentaho BI-Server.

  • serverBaseURL: the server URL, for example: “http://localhost:8080”
  • pentahoBaseURL: the location of the Pentaho Web-application: “http://localhost:8080/pentaho”
  • solutionRoot: The file-system location where the local copy of the solution repository can be found.
  • requestContextPath: the servlet’s request context path pointing to the local web-application: “/pentaho/”
  • username: The current user’s name
  • roles: The roles as CSV-encoded list. The CSV string is quoted according to the Excel CSV quoting rules.

Inside the BI-Server, environment variables can also be used to interact with the server’s environment to query session and global parameter information. The report environment allows you to access values stored on the current user’s session or as global parameter by using a special prefix.

Requesting a environment property “session:name” will query the session context for an entry called “name” and returns the string representation of this object to the report.

Likewise, prefixing a environment property with “global:” will query the global parameter collection of the BI-Server. Therefore querying a environment variable “global:name” will look for a entry “name” in the Bi-Server’s global parameters.

Accessing Environment variables with Formulas

The Pentaho Reporting Engine offers the formula function “ENV” to access the report environment properties.

Definition:

=ENV(String)

To access the variable called “example” as plain text, use

=ENV("example")

To access a value bound to the server-side session variable “example” use

=ENV("session:example")

If the variable is a CSV encoded string, you can use the CSVARRAY function to parse the string into a array of strings. This array is then suitable to use for multi-selection parameters or SQL-IN clauses.

=CSVARRAY(ENV("roles"))

You can now use these variables as part of other formulas, for instance to compute URLs pointing to documents on the server server.

=ENV("serverBaseURL") & "/myotherwebapp/downloads/pentaho-report-designer.zip"

Accessing Environment variables as predefined fields

When you open up the Pentaho Report Designer, you will see several of the well-known environment variables listed as fields on the report itself. Using the ENV function is flexible, but not very convenient. So we added a automatic mapping to the reporting engine.

The various env:* fields you see in a report are what I call “well-known” fields. As Pentaho Engineers we know knows that they exist on the server and are heavily used by our users. So we added a mapping to the system so that they automatically become fields without having to use a formula function to read their values.

The list of “well-known” fields is defined in the global report configuration. The global configuration can be edited by creating or editing a file called “classic-engine.properties” inside your Pentaho Report Designer and Pentaho BI-Server directories.

Inside the Pentaho Report Designer, this file resides in the “resources” directory in your Pentaho Report Designer installation directory. For the BI-Server you can find this file in the “pentaho/WEB-INF/classes” directory. To make your reports work consistently inside the Report Designer and the Server, make sure that both files have matching definitions.

For example: If you want to map your environment-property “value” into the field “env::value” then you would add the following line to your “classic-engine.properties” file:

org.pentaho.reporting.engine.classic.core.env-mapping.value=env\:\:value

In case value contains a comma separated list (using the Excel-CSV quoting rules for strings that need to be quoted), then you can make that list available as array by appending -array to its name.

org.pentaho.reporting.engine.classic.core.env-mapping.value-array=env\:\:value-array

The engine will recognize the “-array” suffix and will look for a “value” property instead, parses that into a string-array and returns you the array. The array then can be used in a SQL-IN clause.

The prefix “env::” is a convention inside the reporting engine to separate environment fields from other fields. If you omit this, you have to make sure that none of the fields returned by your data sources and none of your expressions or functions provide a value under the same name. This would overshadow your environment property and renders it inaccessible.