Category Archives: Basic Topic

Anonymized reports – Report bugs without exposing your business

When reporting a tricky bug, it is mandatory for us to get a sample of a report that shows the bad behaviour. The worst bugs are bugs that consistently show up at the customer’s system but refuse to be seen on my own computer. The best bug-reports are reports that contain a small sample report along with the data necessary to show the bug’s effects.

However, not everyone is willing or allowed to share sensitive data. If the bug occurs in your HR reports, its probably not the best idea to attach the report and data to a public JIRA case. When you live in the EU, disclosing personal data of non-consenting persons is a rather serious act.

With Pentaho Reporting 4.0 creating good bug-reports finally becomes easier.

Select “Extras->Anonymize Report” and your report’s text goes through a randomization process. All characters in the labels are replaced by a randomly chosen character, while preserving both word length, punctuation and capitalization.

The select the report’s query and select “Extras->Anonymize Query” and your data undergoes the same process. Numbers stay numbers, but are replaced by randomly chosen numbers of the same magnitude. Text and dates are scrambled too. Once this is finished, remove your old query from your data-source and your report now uses the new query.

Note that the query-anonymization works on the preview data. If your query has parameters the preview usually does not return data without proper parameter values. In that case you need to rewrite your query to remove the parameters before you can anonymize it.

With this selective process we preserve most of the characteristics of the report that are important for the layout calculation, but remove most of the sensitive data that was contained in the report.

Linking to a report with the same export type

When you create reports connected with each other by links, you want to stay in the same output mode as your source report. When viewing a PDF report, you want the linked report to be PDF too, for instance.

So how would you do that in Pentaho Reporting?

(1) You need to know your current export type.

When you export a report, each output type has a unique identifier similar to this one “pagable/pdf”. The identifier consists of

(a) the basic export type:
  * pagable for paginated reports or
  * table for reports exported as layout-tables
and
(b) the content type
  * pdf for PDF export
  * html for HTML
  * .. and so on.

The BI-Server uses the same identifiers in the reporting plugin to select the correct output target for your reports. The parameter for this is called “output-target” and is documented in the Pentaho Wiki.

You probably know about the “ISEXPORTTYPE” function. This formula function allows you to test for a specific output target when the report runs. To get the export type you now could write a ugly long formula with many nested IF functions.

Or you can use this small BSH-Function instead:

Object getValue()
{
  return runtime.getExportDescriptor();
}

to get the export descriptor string directly.

(2) You need to feed this export identifier into your links.

Use the Drill-Linking functionality to add a manual parameter to your link. Name this parameter “output-target” and link this to your BSH-Function. If your function is named “ExportType”, then you would write “=[ExportType]” into the value field.

 
With that connection made, your reports will now be linked with the same output-target as the source report. Set the “Hide Parameter UI” option if you want to link against the target file without having to manually submit parameter.

Doing the performance dance (again)

I just changed another bit of the table-export while integrating a patch for PRD-3631. Although the patch itself did take a few illegal shortcuts, it showed me a easier way of calculating the cell-backgrounds for the HTML, Excel and RTF exports of Pentaho Reporting.

After a bit more digging, I also fixed some redundant calls in the HTML and Excel exports for merged cells and row-styles. Both resulted in repeated calls to the cell-background calculator and were responsible for slowing down the reporting engine more than necessary.

The performance of my test reports improved a bit with those changes. But if any, then this case has shown me that clean report design is the major driver of a fast export.

The performance for the reports went up by 15 to 30 percent, with the larger changes on the reports with larger row-counts. However, the reports I test are surely non-representive, as there all elements are perfectly aligned and the report is designed to avoid merged cells.

The patch specifically claims to address performance problems in the cell-style calculation. Agreed, there were problems, and the patch addressed them. But there was no way I could see a 100% improvement on normal reports. Well, not reports that are well-designed and use the powerful little helpers that the Pentaho Report Designer offers to make reports well-aligned.

When I receive production reports for debugging, the picture is usually more bleak. Fields are place rather randomly, and usually misaligned by a few points. They start and end on rather random positions, and usually elements are not aligned to element boundaries across different sections.

Let’s take this visually fairly report as an example:

The many fine grey lines you can see mark element boundaries. The more lines you see, the more cells your report will have. Each cell not only means larger HTML files, it also means more processing time spent on computing cell-styles and cell-contents. Thick grey lines spanning across the whole section usually indicate elements that are off by less than one pixel.

These lines are produced by the Report Designer’s “View->Element Alignment Hints” feature. When this menu item is selected, you will get a better idea on how your report will look when exported into a table-export. If you cannot see the details clearly, zoom in. The Report designer happily magnifies the working area for you.

When exported to HTML, this report here created a whopping 35 columns
with another 35 rows. That is potentially 1225 cells. The resulting
HTML file has a size of 21,438 bytes. For a report with just a few items of text, this is a lot.

In general you’ll want to avoid having to many of these boundaries. In the basic design courses, teachers tell fairly early on that layout where element edges are aligned look cleaner and more pleasing for the eye. When you look at adverts or magazines, you can see this on how articles and images seem to sit along visual boundaries or dividing lines. For a well-designed report this is no different.

To help you design your reports in a well-designed fashion, the report designer comes with the “View->Snap to Elements” feature.

To clean up a report, I usually start by aligning elements that are sitting close together. Visually, it makes no difference whether a element starts at x=24 or x=24.548. For the reporting engine, this makes a difference, as a dumb little engine cannot decide whether the user was just lazy or had a very good reason to have a cell at exactly these positions (or whether some visual design would break by attempting to blindly fix it). 

With the “Snap To Elements” enabled, just select one element and drag the mis-aligned edge until it snaps off its current position. Then move it back into position. This time it will snap to one of the other elements. If your edges are very close, I drag the current edge towards the top (for the y-axis) or the left (x-axis) until it leaves the crowded area. When I return with it, it will snap to the first (top-most or left-most) edge in the group of elements by default. Repeat that with all elements that sit near the edge and very soon you should only see one thin line indicating a perfect alignment.

Additionally, you can also change the element’s position and size to a integer number in the “style” table on the right-hand side of the Report Designer. When you do that for all elements, your alignment task will become a lot easier. Now elements are either aligned or at least one point apart (and the mis-alignment is easier to spot).

The quickly cleaned up version of my sample report now has only 24 columns and 16 rows, but visually, you cannot tell the difference between the two of them. Theoretically, the resulting table can have 384 cells, compared to the mis-aligned report a reduction to a quarter of the original 1225 cells. And finally, the generated HTML file shrunk to a size of mere 8,853 bytes, one third of the original size. In my experience and with those numbers in mind the computing time for this optimized report should be roughly 10% to 15% better than the optimized version. In addition to that slight boost, your report will download faster and rendering the report in the browser will be a lot quicker as well.  

So remember, performance optimization starts in your report designer: When you optimize your report designs it instantly pays off with quicker rendering and smaller downloads.

Further optimization

That report uses line-elements to draw borders around the statistic sections. By using sub-bands with border definitions, that report could be simplified further, but I’ll leave that as an exercise for the class.

PRD-3553 – or the low performance of large reports

The funny thing about most bugs is, that for most parts they go completely unnoticed. No one ever hits them or if they do they think its just normal weird behaviour. (Thanks, Crystal Reports & Co for training your former (and our current) users!)

One of these bugs I had in my bucket for the last week was a rather nasty error condition on reports with a large number of rows. Reports like that are usually used either for data exports or for performance tests, as a human user barely reads the first sentences of an e-mail, not to speak of hundreds of pages of dull numbers.

What was the bug?

Reports with large numbers of rows were incredible slow when run as Excel/HTML/RTF/Table-CSV or Table-XML export. The slow down was worse the more rows were in the report.

The original e-mail contained a test report that demonstrated the issue. (Note: Providing me with a replication path tremendously increases your chances of getting a bug-fix fast. You help me, and I’ll help you.) The tests they ran on their machine showed a clearly exponential curve:

# of Rows Time in seconds
5000 16
10000 28
20000 90
50000 360

What caused the bug?


Short version: The iterative processing in the reporting engine was broken and resulted in a increasingly large layout tree. Iterating this tree to calculate the layout gets more and more expensive the larger the tree gets.

Long version:
The Pentaho reporting engine uses an iterative layouting engine to process the report. During the layouting stage we build up a layouting-DOM tree containing the various boxes and content that makes up a report. Our model is heavily inspired by the CSS specification and is optimized towards using as little memory as possible. Boxes get added to the layouter by the reporting (data processing) engine and once the layout reaches a stable intermediate state, the content is printed and the processed boxes get removed from the layout-DOM. This results in a sliding zone of activity on the layout model and allows us to process huge reports with minimal memory footprint. And we don’t even have to swap to disk for this – all processing happens in memory.

To make this model work, we have to track which layout nodes that have been processed already and which nodes could still move around in the final document.

The whole process of tagging finished nodes worked nicely for paginated reports, but failed for flow- and stream-layouted reports. The system never tagged nodes as ready for print, and so the layout tree grew bigger and bigger. At the end, this resulted in larger processing times (as the iteration of the tree took longer) and a huge memory foot-print (as more nodes have to be held in memory).

To fix this problem, I had to fix the “ready-for-process” tagging [SVN].

Once the iterative process sprung to life, the memory foot-print went down, but the processing performance was not as good as I would have expected. In fact, the iterative processing worked so well, that it caused more overhead than it actually saved time. A quick and dirty throttling of the layouter’s processing queue made performance jump up. We now only process every 50th iterative event, and thus trading a bit more memory costs for a huge increase in processing speed.

How do we perform now

The bug fix was a absolute success beyond my wildest dreams. I can say I have witnessed an increase of 800% in report processing speed. (Ok, it is not difficult for this class of bugs: all you need is patience and a large enough report.)

# of Rows Time in seconds
Previous to the fix After the fix Change in %
5000 11 11 0%
10000 24 22 9%
20000 414 47 880%
50000 (crash) 146 (a rather large number, trust me)

When can I get this fix?

The fix will be included in the 3.8.2-GA release and should be built within the next weeks. At the moment, everyone at Pentaho is still busy finalizing the BI-Server 4.0/PRD-3.8.1 release, so it will take a moment before we can fire up another release.

In the meantime, you can either grab the sources from the Subversion repository or grab the CI-build. The 3.8.2 engine is a direct replacement of the 3.8.1 build, so you can patch your system by just copying the final jar-files over. I have not tested whether you can do some similar patching with 3.8 or 3.7 installations.

Don’t hardcode host names, use JNDI

I got a e-mail earlier this week, asking for some help in migrating reports.

Hey Thomas,

I have a poop load of reports with hard coded 
ip addresses (yeah I know – but they didn't 
have DNS entries until now) that I'd like to 
globally change.  I'm thinking of something 
along the lines of "zcat | sed | gzip" but 
something like this would only work on 
compressed files, not an archive of files.  

So – I was wondering if you had any ideas, 
maybe even a utility based on code we already 
have to create and open these bad boys.  
We could even have some known locations to 
modify 

Like>  prpt-replace –dbConnectionHost "192.168.1.100" "mydb.mycorporation.com"

Or maybe use file and xpath to denote the 
attribute to change.

(The names involved were changed to protect the guilty.)

Well, hard-coding database connection information in every report is never a good idea in no environment. If your database host or credentials changes you are truly and well .. in trouble.

So how can you avoid this trouble in the first place?

Use JDNI to hold database connections in a external place

JDNI is a technique to hold database connections outside of the report definition. By holding this information in a central spot, you only have to change one place and all your reports get updated. The JNDI stored connection information is stored on the computer where your report runs. This also allows you to have separate definitions based on where the report runs. Your report designer can access a local copy of your production database, while your production server uses the real database server.

As a side note: On the BI-Server, this system of abstracting database information is called “Pentaho Datasources”, not JNDI. In this posting, I will forgo the branding exercise and simply continue to call it JNDI.

You can use JNDI in the following datasources:

  • SQL
  • Pentaho Analytics (Mondrian)
  • OLAP4J
  • Pentaho Metadata

For all of your production reports, you should always choose JNDI defined datasources. Manually defined datasources are great for testing out ideas, but really a nightmare to maintain and by storing the database credentials they potentially pose a security risk if you choose to share report definitions with others.

JDBC Connection definition basics

A database connection in Java requires four mandatory pieces of information:

  • driver: The name of the driver JDBC class.
  • url: The JDBC URL. This URL contains all information needed to connect to the database like the host name and port. The format of this URL is specific to a particular database. Consult your database manual for details on your particular database.
  • user: The username for the database.
  • password: The password for the accessing the database.

In addition to this, each driver allows you to specify additional properties. Again, these properties are specific to the database and (hopefully) documented in the manual. Most databases alternatively allow you to encode the properties onto the URL.

Defining data-sources in the Pentaho Report Designer

Within the report designer, there is no user interface to define the JNDI connections. The connections are defined in a file called "$HOME/.pentaho/simple-jndi/default.properties".

The file looks somewhat like this:

# Copyright 2008 Pentaho Corporation.  All rights reserved. 
# This software was developed by Pentaho Corporation and is provided under the terms 
# of the Mozilla Public License, Version 1.1, or any later version. You may not use 
# this file except in compliance with the license. If you need a copy of the license, 
# please go to http://www.mozilla.org/MPL/MPL-1.1.txt. The Original Code is the Pentaho 
# BI Platform.  The Initial Developer is Pentaho Corporation.
#
# Software distributed under the Mozilla Public License is distributed on an "AS IS" 
# basis, WITHOUT WARRANTY OF ANY KIND, either express or  implied. Please refer to 
# the license for the specific language governing your rights and limitations.
SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/user=pentaho_user
SampleData/password=password
SampleData/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata

SampleDataAdmin/type=javax.sql.DataSource
SampleDataAdmin/driver=org.hsqldb.jdbcDriver
SampleDataAdmin/user=pentaho_admin
SampleDataAdmin/password=password
SampleDataAdmin/url=jdbc:hsqldb:mem:SampleData

# NOTE: If you would like to access the running HSQLDB server, comment
#       out the above line and uncomment the following line 
# SampleDataAdmin/url=jdbc:hsqldb:hsql://localhost/sampledata

The JDNI definition file holds all connections known to the local JNDI system. This system has no connection to the BI-Server and does not access any connections defined in the Pentaho Administration Console.

A JNDI database definition in the report designer is defined by a set of properties. Each property name has two parts: The name of the data-source (‘SampleData’ and ‘SampleDataAdmin’ in the example above) and the JDNI property name itself. Each datasource declared in the “simple-jndi” definition file requires the following properties:

  • Mandatory “type” entry set to the value “javax.sql.DataSource”
  • Driver class name
  • Connection URL
  • A username and password
  • .. and any additional property you want to pass in

The simplest way of creating a new definition entry is to copy an existing connection entry and to alter the name of the entry (the first part of the properties) and then to replace the connection information with your own details.

All JNDI connection names are case sensitive, so inside the report you have to use the same spelling for the name as in this properties-file.

And last but not least: Defining your connection information in the JNDI system does not relieve you from providing suitable database drivers. You can install new drivers by copying the jar files (and if needed any native libraries) into the “lib/jndi” directory of the Pentaho Report Designer.

Defining a JDNI connection in the Pentaho Administration Console

For the Pentaho BI-Server, the Pentaho architects chose not to use the JDNI subsystem that ships with all J2EE application servers. Sadly on the application servers JNDI has a few implementation flaws:

  • Redefining a JNDI connection requires you to redeploy or restart the application. This is a nightmare for a mission critical server.
  • Each server system comes with its own way to define JNDI definitions. Each of these ways is inherently incompatible with any other vendor.
  • You are bound to the implementations provided by the server vendor. There is little hope to customize the database access for additional capabilities like security checks or multi-tenancy.

Before you can setup a connection, you have to install the correct JDBC driver into both the BI-Server and the Pentaho Administration Console. For that copy the jar files into the following directories:

  • Pentaho BI Server: pentaho/tomcat/webapps/pentaho/WEB-INF/lib
  • Pentaho Administration Console: administration-console/jdbc

To define the connection, start the Pentaho Administration Console. The console starts up on port 8099, and can be reached via the URL “http://localhost:8099/”. The user and password are “admin” and “password” (sadly not documented anywhere in the download). The console can only be run successfully if the BI-Server is up and running as well.

To define a datasource, activate the “Administration” mode on the left hand side of the window and switch to the “DataSources” tab. Hit the small white “+” to add a connection.

The Pentaho Administration Console is smart enough to detect all installed drivers. If your driver is not listed here, you probably have forgotten to add it to the “lib” directories outlined above.

The data-source names you define here must match the datasource names defined for the Pentaho Report Designer. I also heavily recommend that you use the same type of database system for designing reports and for running them in production. SQL is a widely interpretable standard and therefore SQL written for one database is not necessarily compatible with the SQL dialect understood by the others.

Let me close with: Death to all hard-coded settings, long lives the host provided setting!

How to use the Barcode element in Pentaho Reporting

Barcodes are a easy way to convey information in a machine readable way. With Pentaho Reporting, you can print a large set of barcodes in your reports. Barcodes in Pentaho Reporting are provided by the “simple-barcodes” element. And creating barcodes is as simple as the name of this element promises.

Pentaho Reporting 3.8 uses the Barbecue project as Barcode renderer and supports the following barcode types:

  • 2of5
  • 2of5-interleaved
  • Codabar
  • Code39(ext)
  • Code128(a,b,c)
  • EAN13
  • UCC-EAN128
  • ISBN
  • PostNet
  • PDF-417

With Pentaho Reporting 4.0, we also integrate the Barcode4J project (in addition to the existing Barbecue barcodes), and thus add support for the following barcode types:

  • EAN8
  • EAN128
  • UPCE
  • DataMatrix
  • RoyalMail
  • US-Postal-Service IntelligentMail

Think of a barcode element as some sort of fancy text- or number-field. A barcode element prints a single field value and generates a barcode vector image for it. To add a barcode to your report, all you need to do is drag the barcode element from the palette on the left-hand side of the window into the report canvas.

Each barcode has its range of allowed characters. The barcodes EAN, UPCA, ISBN, 2of5 and PostNet only accept numbers. The Code39, Code128 and Codabar barcodes accept a limited set of characters (the letters A-Z, 0-9 and a more or less extensive set of extra characters, all from the lower 128 characters of the ASCII-Charset). And some are free to accept nearly everything as input (PDF-417 and DataMatrix).

Usually, when you print barcodes, your data should already be in a format that can be handled by the barcode you are going to use. All barcoded data is generated by machines for other machines, and the person who generated the data usually (or hopefully) knows the accepted format already.

Like ordinary text-fields, barcode elements either receive a static text in the “value” attribute or a field-name in the “field” attribute.

If the barcode support printing a textual representation of the data, you can enable this via the “barcode-settings::show-text” attribute. This is also the place to fine tune the bar-width or height or to enable the calculation of checksums for the data given.

Styling of barcode elements

Styling the elements is easy. Like the “Sparkline Elements”, barcodes reuse many of the existing style properties.

  • text-color: The foreground colour of the barcode. This is the colour the bars are rendered with.
  • background-color: The background colour of the barcode. The barcode area will be filled with that colour prior to printing the barcode. This is handy to ensure that the barcode is printed on a white background for maximum contrast.
  • font-family, font-size, bold, italics: The text properties define the style and size of the text printed on the barcode. Not all barcodes use text.
  • scale: Whether the barcode is scaled according to the element’s size. Some barcodes do not have an intrinsic size and therefore are always scaled.
  • keep-aspect-ratio: If the barcode is scaled, this defines whether the ratio between the width and the height remains constant. For some barcodes, this is mandatory to ensure that the code can be scanned properly.

And that’s it. As said: There is no big secret to using barcodes. So happy coding!

Enrich your report with sparklines

Sparklines are a great way to convey a lot of information in a simple and readable way. Sparklines can be used to plot changes of metrics over time. Edward Tufte uses temperatures of patients in a hospital and share prices as examples.

Bart Maertens wrote about sparklines when Pentaho Reporting 3.5 came out. At that time, he had to do a lot of SQL magic to get data into them. With the addition of the MULTIVALUEQUERY formula function in Pentaho Reporting 3.8, this process got a lot easier.

Pentaho Reporting comes with three Sparkline type elements. Sparkline elements are minimalistic, they do not have any axis labels or legends.

  1. Line Sparks: A small line chart that shows the change of values over time. Line charts are good to visualize trends over time. The last data point can be highlighted for better readability.
  2. Bar Sparks: A small bar chart that allows to compare absolute values over a time scale. Bar charts should be used when you need to concentrate on the ratio between values instead of the general trends.
  3. Pie Sparks: A small pie chart that visualizes a single value in relation to a total value.


Now how do you add sparklines to a report created with the Pentaho Report Designer?

Getting the data

Except for Line and Bar charts, you will need a array of numeric values to display the data. You can construct arrays from different values of the same data row with a OpenFormula function like this:

=NORMALIZEARRAY({[Value1] | [Value2] | [Value3]})

The NORMALIZEARRAY formula function converts a internal OpenFormula array into a valid Java-Array that can be used by other components.

Alternatively, you can get a array of values from a data-source by using the “MULTIVALUEQUERY” formula function.

With a SQL datasource using the sample data you can use a query like this to get the sales numbers for each year:

SELECT
     SUM("ORDERFACT"."TOTALPRICE") AS "Value",
     "ORDERFACT"."YEAR_ID",
     "ORDERFACT"."PRODUCTCODE"
FROM
     "ORDERFACT"
WHERE 
          "ORDERFACT"."PRODUCTCODE" = ${PRODUCTCODE}
GROUP BY
     "ORDERFACT"."PRODUCTCODE",
     "ORDERFACT"."YEAR_ID"

This creates a result-set similar to this:

Value                   YEAR_ID PRODUCTCODE
-------------------------------------------
57363.100000000006 2003 S18_1749
19656.739999999998 2003 S18_2248
33451.850000000006 2003 S18_4409
12768.21 2003 S24_3969
39983.85999999999 2003 S18_2325
52395.560000000005 2003 S18_2795
11024.69 2003 S24_1937
14770.83 2003 S24_2022
38895.38 2003 S18_1342
22257.399999999998 2003 S18_1367
72913.26999999999 2003 S10_1949
40103.54 2003 S10_4962
46714.39 2003 S12_1666

You can then reference this data from a formula with the MULTIVALUEQUERY formula function.

=MULTIVALUEQUERY(“SparklineQuery”, “Value”)

Pie sparks simply require a single value in the range of 0 (zero) to 1 (one). Zero represents an empty pie (0% of the total), 0.5 a pie filled to the half (50% of the total), and 1 represents a fully filled pie (100% of the total).

Formatting the sparklines

Sparklines get formatted via the styles properties.

Bar-Sparks:

  • last-color: The color of the last bar
  • high-color: The color of the bar with the largest value.
  • text-color: The color of all other bars

Line-Sparks:

  • last-color: The color of the marker dot at the end of the spark
  • text-color: The color of the line

Pie-Spark:

  • low-color: The color of the slice if the value is below the “low-slice” threshold.
  • medium-color: The color of the slice if the value is below the “medium-slice” threshold.
  • high-color: The color of the slice if the value is below the “high-slice” threshold.
  • text-color: The color of the pie that is not filled by the slice.

Charting: Categorical Charts and XY-Charts

Reporting without charting is like zombies without the inevitable hunt for fresh brains. You can do it, yes, but it is sure not fun.

These days, charting in Pentaho Reporting is done via the “chart-element” in the Pentaho Report Designer. Drag the chart field into the canvas, double click on it to open the chart-editor, and start configuring your chart.

So far it’s all theory, lets see how charting really works.

Architecture of Charting

Charting in the Pentaho Reporting engine requires three parts. It requires (1) a data-collector to extract the charting-data from the datasources, (2) a chart-expression to produce a chart from the collected data and (3) a report element to display the resulting chart object.

The chart element that you can drag into the report is a front-end to hold the data-collector and the chart-expression and – of course – is responsible for rendering the chart once it has been produced.

Each chart-type is produced by a separate chart-expression. The chart-editor dialog selects the right chart expression for you when you click on one of the chart-type buttons on the top of the window. The visual properties of the chart are configured on the left-hand side of the chart dialog.

Depending on the chart type, you will need a suitable chart-datacollector. Data-collectors are configured in the right-hand table of the dialog. The dialog only offers collectors that can produce valid datasets for the currently selected chart type.

Chart Types

The chart types the Pentaho Report Designer supports can be grouped into three different groups:

1. Categorical Charts

A categorical chart uses a nominal scale to group data. The charting system makes no assumptions on relationships between the data. The X-Axis of such charts display labels for each data point, and all datapoints are printed in the order they arrive at the data-collector.

You can use categorical charts to display nominal data. In statistics, the nominal scale is the lowest measurement level you can use. A nominal scale, as the name implies, is simply some placing of data into categories, without any order or structure.

A example of a nominal scale is the sales regions of a company. Regions have no real relationship or natural order between each other. One region is as good as an other (from a statistical point of view).

Pentaho Reporting supports the following categorical charts:

  • Bar Chart
  • Line Chart
  • Area Chart
  • Combined Bar and Line Chart
  • Waterfall Charts
  • Radar Charts

2. XY-Charts

A XY-Chart uses an interval scale to organize the data it displays. All data items are comparable and can be sorted and all data points have a defined distance between each others.

The most common example for a interval scale in the context of reporting is the date-axis for comparing events on a time scale. (sales over the last years).

Pentaho Reporting supports the following XY-Charts:

  • Bar Chart
  • Line Chart
  • Area Chart
  • Combined Bar and Line Chart
  • Combined Area and Line Chart
  • Scatter Plot Charts
  • Bubble Charts

3. Pie and Ring charts

Pie charts can be used to compare the overall ratio of several numeric values. Your values must be complete (ie add up to 100% of what you want to show) or your chart will be misleading.

Pie charts should only be used if you are comparing the size of a slide with the overall size of the pie. However, this type of charts is not suitable for comparing the sizes of two slices or comparing different pie charts.

The Multi-Pie-Chart chart type should not be used at all. It only exists because some users insisted on this sort of functionality. But this sort of chart is hard to read at best and usually just plainly misleading.

Creating a categorical bar chart

Now, in the hands-on section of this entry, lets create a bar chart that shows the sales of our various product lines over the last few years. This example uses the sample data that comes with the Pentaho Report Designer.

Create a new report in the Report Designer and follow me.

First, You need data

Create a new JDBC datasource (Data->Add Datasource->JDBC). Select the predefined connection “SampleData(Memory)”, create a new query called “chart-data” and add the following query text.

SELECT
     PRODUCTS.PRODUCTLINE,
     ORDERFACT.YEAR_ID,
     sum(ORDERFACT.TOTALPRICE) AS SALES,
     sum(ORDERFACT.QUANTITYORDERED) AS VOLUME
FROM
     PRODUCTS INNER JOIN ORDERFACT ON PRODUCTS.PRODUCTCODE = ORDERFACT.PRODUCTCODE
GROUP BY
     PRODUCTS.PRODUCTLINE,
     ORDERFACT.YEAR_ID
ORDER BY
     PRODUCTS.PRODUCTLINE ASC,
     ORDERFACT.YEAR_ID ASC

Make this query your currently active query: In the data-tab, expand the new JDBC data-source and right-click on the query node below and select “Select Query”.

Then you need a chart object

Drag the chart-object from from left hand palette into the report-header area. This ensures that the chart is printed once on the report. If you put a chart into the Group-header or -footer the chart is printed for each occurrence of that group. Putting a chart into the details band makes no sense. It would print the exact same chart as many times as there are rows in your data source.

And finally we need to configure the chart

Double click on the chart object to start the chart-editor. The “Bar Chart” should already be selected.

First, we define what data the chart will be using. On the right-hand side you find the chart-data collector. Enter the following values into the properties:

  • Category Column: YEAR_ID
  • Value Column: SALES
  • Series-by-Field: PRODUCTLINE

A chart could have several series for each column, which can be useful if you want to compare data-series with each other.

Next define the chart’s appearance. For now, we keep it simple and just change the Chart’s title and the y-axis title.

  • Chart Title: Product Line Sales Trend
  • Y-Axis Title: Sales

Confirm the changes in the chart editor by pressing “OK” on the bottom of the dialog and preview your report.

You can find several chart examples in the samples that ship with the Pentaho Report Designer. Explore them!

Printing Watermarks with Pentaho Report Designer

A watermark is a image or other content that is printed in the background of your report. You can use watermarks to indicate a status of your report (Confidential, Draft or For Internal Use Only). Or you can use watermarks to print a form template into the background of your report.

A well defined watermark can transform a report from a uninspired bunch of numbers into a fully branded document that reinforces your companies image in all receivers of the report.

So how do you define a Watermark in Pentaho Reporting?

Activating the watermark

When you create a new report, the watermark section is hidden. Before you can create new elements on this section, you will have to show it in the Pentaho Report Designer.

  1. Switch to the Structure Tree
  2. Locate the Watermark node under the master report node.
  3. Select the Watermark node and switch the properties table to the “Attribute” view
  4. Set the “hide-on-canvas” attribute to “false”

At the bottom of your design area in the report designer window, you will now see the watermark section below the page-footer section.

Properties of a watermark

The watermark section is printed behind all other content. Watermarks are special content that is only useful for printing and the PDF export. Plain-Text exports and all table-exports do not allow overlapping content. A watermark would prevent the printing of other content on that page. Therefore watermarks are disabled for these export types. Even if your watermark section contains content, the Pentaho Reporting Engine would not process any of it.

A watermark always spans the complete width and height of a page and content printed on the watermark does not affect any other content on the page.

Defining content for watermark sections

Watermark section is a ordinary band. You can drag and drop fields, labels or images into the watermark just as you would do for any other of the bands. The watermark is printed when a new page is started and fields print the same values as fields defined in the page header. Like all page-header or -footer bands, watermarks cannot contain any subreports.

The Pentaho Report Designer ships with the “Production Reports/Invoice” sample report that makes use of a watermark to show a background image to reinforce the branding of the “Steelwheels” toy company and prints a stamp text on the report.

Align Text in Number Fields and other Formatting Basics

Today lets talk about you can format report elements and what options our Pentaho Report Designer offers you.

Anatomy of Elements

Elements can be grouped into two classes. Text elements transform data into a textual representation. Labels, String-Fields, Message-Fields and Number- and Date-Fields are examples of these fields. Graphical elements produce a image to display the data. Charts, Sparklines or Image-Fields are examples for this class of elements.

A report element in Pentaho Reporting consists of style information and attributes. Most styles and attributes can be provided either statically or can be calculated by a function. The static properties are called “design time properties”. The calculated properties are called “runtime properties”.

Attributes control the element’s behaviour and how data is processed for displaying in the element. Examples for this are format-strings, rich-text processing and so on. Style properties control the visual appearance of the data printed.

Basic Formatting

With the Pentaho Report Designer you will find many formatting options that you can find in other text processor or graphical programs.

All elements share some common properties:

  • visible: Defines whether a element is shown on the final print out. Visibility is mostly used as runtime property instead of being specified at design time.
  • min-width, min-height: The width and height of an element define how much space a element takes on the paper. It also controls how many lines of text can be printed and how many characters fit onto each line. The size is either given in percentages (of the width and/or height of the element) or in points (1/72th of inches).
  • x, y: The position of the element if placed into a canvas element. The position is either given in percentages (of the width and/or height of the element) or in points (1/72th of inches).
  • dynamic-height: Dynamic height is a flag you can set to let the element expand its height to match the content you attempt to print.
  • various border properties: Allows you to draw a border around the element. You can define the border of the currently selected element very easily by using the menu option “Format->Border..”
  • padding: Padding allows you to insert some space between the edge of the element (where the borders sit) and the content printed in the element. The padding size is given in points (1/72th of inches).
  • text-color, background-color: Defines the element’s foreground and background colours.

Text elements

For text elements, the most common style you are going to change is probably the font properties.

  • font-name: Defines the font name. The font must be available, or a default font is used instead. This is especially important if you intend to publish the report to a server, which may not have the same fonts as you installed.
  • font-size: Padding allows you to insert some space between the edge of the element (where the borders sit) and the content printed in the element.
  • bold, italics, underline, strike-through: Various flags controlling the appearance of the text
  • embedded: This flag is only useful for PDF exports. It controls whether the font is embedded into the PDF document. Only TTF-fonts can be embedded and not all fonts allow you to embed them.
  • h-align, v-align: Positions the text within the element.

Graphical Elements

Graphical elements have different properties based on what sort of element you are dealing with.

  • scale, aspect-ratio: Defines whether the shape will scale up or down to fit the element’s bounding box. If “keep-aspect-ratio”
  • fill-element: Defines whether the shape will be filled. This has no effect on lines. Elements will be filled with the colour defined in fill-color
  • draw-outline: Defines whether the shape outline will be drawn. The outline will be drawn with the colour defined in text-color
  • stroke
  • : Defines the stroke that is used to draw the outline of the shape.

Attributes

Most of the attributes for elements are advanced properties and not required to actually get a basic report up and running. So I will concentrate on the ones that really matter:

  • value: The static value that should be printed. If this is defined,
  • field: The name of the field from the reporting engine reads the value. If a value is given in the “value” attribute, that static value will be used and the field will be ignored.
  • if-null: A static value that is printed if the field would evaluate to otherwise.
  • data-format: A flag that defines whether the element will update its format string from the meta-data given by the data source.
  • style-format: A flag that defines whether the element will update its style properties from the meta-data given by the data source.
  • show-changes: If this flag is set to true, the element will only print if the value has changed from the previous value. The element will always print if it is the first element in the group or on the page.

Message-, Date- and Number-Fields

  • format: Defines the format string. To make number fields align correctly, ensure that your format specifies a fixed set of decimals in the format string and that you right-align the field in the style options.

Armed with this knowledge, formatting elements in a report should no longer be a daunting task. “Be fruitful and multiply your reports, and fill the sheets and subdue it; and have dominion over your data and over the numbers of the accounting and over every piece of information that moves in your company.”