Style Sheets in Pentaho Reporting 4.0 – I blame Marius

As a small deviation from the usual crosstab and layouter work, I cleaned out the style system of Pentaho Reporting.

It all started not long ago on a stormy night. Creepy rays of moonlight, dark clouds looming over the mountains, wolves howling. On this day, Marius was doing some experiments in getting cascading stylesheet styling into the reporting engine. (I suspect lightning rods and wild laughter was on the menu as well.)

His approach reminded me on the dreadful task I had to go through for the next release. Our style system, a ghoulish monster, has overstayed its welcome. Being ashamed of its nature, we did not expose it to anyone. So this rotten pile of code was sitting around, festering and stinking. I had to kill it. It was an act of mercy for all of us. 

With the style crud gone, the reporting engine now has a simplified style system. With the monster, all style definitions on elements always showed a resolved picture of the global style status. For example, each label’s style fully queried all parent elements to compute the effective font information whenever anyone queried it.

In the new world elements no longer maintain that state. Styles are now resolved as part of the report processing after the style and attribute expressions have been evaluated. It not only simplifies cloning (a lot!), it also tunnels style computation through a single code path.

And then there came polly .. ah, well, the release of Suite 4.8, and the customary fun-week, where the daily drill is suspended for exploring the possibilities of the code.

Mix parts of LibCSS – the zombie that wanted to be a CSS-styled better reporting system – with the single code-path of style resolving, and you have CSS3 selector on top of the existing reporting style-system.

Pentaho Reporting now has a fully fledged style system. Any master-report contains a style-definition, which is basically what you know as style-sheet from HTML. Each style-definition consists of style-rules, where each rule has CSS3-selectors attached.

Style-definitions can either be part of the report definition itself or they can be stored as external files so that they can be shared between reports. The external files have the extension “.prptstyle” and are xml-files containing standard element style definitions.

The Pentaho Report Designer allows you to edit both internal and external style definitions. Use ‘Window’->’Style Definition Editor’ to edit standalone files, or use ‘Format’->’Edit Style Definition…’ to edit the internal style definition.

Both editor allow you to load and save ‘prptstyle’ files. Loading a style-file in the internal editor has the effect of importing that external style-definition into the local report, and saving exports it.

So don’t wait, download your own multi-functional report-designer now!

If you download within the next 24 hours, you not only get a style-definition editor, you also get this:

The ‘master-report’ object now has a new attribute called ‘style-sheet-reference’. Set this to a file-name or URL and Pentaho Reporting will load an external stylesheet when the report runs. Put a ‘prptstyle’ file on a public server and all your reports can share the same style definition.

All elements now also have two new attributes called ‘style-class’ and ‘id’ (not to be mixed with ‘xml-id’ for the HTML export) which allow you to match style-rules to elements. Like HTML’s ‘class’ attribute, class is a whitespace separated list of style-class-names that should apply to the element.

And don’t forget: The green-plus means that all of these attributes can be computed via a style-expression.

Go grab your PRD preview now!

A quick introduction to crosstabs in Pentaho Reporting 4.0

Crosstabs have been on the horizon for several years now. They lived a happy, undisturbed life along with the unicorns and gnomes guarding the pot of gold at the end of the rainbow.

With an endless recession and central banks selling off their gold reserves, the unicorn has been sold to a meat factory and the gnomes now assemble luxury cell-phones in a chinese factory.

So the day had to come that crosstabs have to work for a living. That day is now.

The added crosstab capabilities are the head feature of Pentaho Reporting 4.0 (now to be released in April 2013). To make crosstabs possible, the reporting engine had to move away from the strict banded approach of top-down layouting. The engine now has a true table layout that ensures that even misbehaving crosstab definitions will look and behave good.

Some sort of crosstab abilities were in experimental mode for very much of the last 4 years. The good news: The current implementation fixes all the problems we had with that experimental stuff. The bad news: Your old crosstab definitions will most-likely not work in 4.0 – you will have to recreate them properly with the new version.

To spin off crosstabs quickly, the Pentaho Report Designer ships with a quick-and-dirty crosstab creation dialog. This dialog pops up whenever you add a crosstab group.

Until we update the report designer with a proper user interface, you will have to edit all crosstab properties and elements in the structure tree.

Be aware: The code is pretty much happy-path – stray too far away from it and the big bad wolf will eat you, your children and your data. Wolves are a protected species, so bear your share and help to feed them properly. Before you are fully digested – file bug reports and feature requests in our JIRA system.

Enough prose: Let me just list some of the notable current behaviours and general ramblings.

  • normalization works now, but it is not all-knowing. It will only normalize on existing data items. If you whole dataset has Q1, Q2 and Q4 but fails to mention Q3 at least once, you will not see a Q3 anywhere.
  • The engine wants its data raw and bloody. Denormalize your data! If you use OLAP datasources, use Denormalized version of these datasources.
  • Sort your data to match your group structure.
  • column-headers currently do not repeat on the subsequent pages.
  • Setting your cell-contents (or any other row-layout child) to a percentage width will yield a zero-width in the final layout. (PRD-3970)
  • the report wizard really does not know how to handle anything crosstab related. (PRD-3860)
  • you can add charts into a crosstab cell.
  • if there is more than one row of data for a single crosstab cell, you have the choice to print either the first item, the last item or all items in a large list.

Features that will come, but are not implemented yet:

  • butterfly headers: Move row-headers into the middle of the data. (PRD-4005)
  • Subreports on all crosstab cells (PRD-4006)
  • style- and attribute-expressions for all crosstab elements (PRD-4007)
  • The no-data-band on the crosstab-group has no effect. (PRD-4008)
  • Summary-rows and columns will be positioned either at the start or the end of the respective group. (PRD-4009)
  • Details header will be printed. (API is defined, but not used yet). Printing of them will be optional. (PRD-3949)
  • Header cells can either be spanned cells, or can repeat its content for each child. (PRD-3386)
  • There will be a switch to make row- and column-headers disappear. You can then use the crosstab purely as a layouting utility. (PRD-4010)
  • sorting of data. This will be optional and will blow up if you expect us to do big-data sorting. Get a real database with SQL support 😉

Now go and download a CI build of PRD-trunk, will you!?

Fun with scam callers

There are stupid people, very stupid people and scammer. Since a few weeks, my Skype number gets more and more calls from a tech-support call-centre trying to trick me into installing viruses onto my computer.

The scam is rather old, and well described in the ‘Hello, I am from Microsoft’ article.

Trouble is: I am (a) not stupid and (b) not a Windows user.

I used to get annoyed by those calls, until inspiration struck me. Those fellows probably have not enough fun in their life that they start pestering other people. So I give back what I can to make their life merrier.

My personal goal: Keep them in the line for at least 30 minutes. So far, the best I managed in this game was 7 minutes 30 seconds. This particular time it ended with a unhappy call-centre person calling me non-printable words for messing with him.

The fact that I got to 7 minutes, surprised me. In the 3rd minute, I had to congratulate him for correctly guessing that I have a Mac. Yeah, my horses ran through with me. Silly me! That’s not the sane way to win the game. After that I was able to delay defeat by bluntly lying about the manufacturer of my laptop. The drone actually ignored the content of the congratulation and continued to try to bring me to press “Win-R” for another 3 minutes. 

Only after that he got the hint that I may not play fair and started to throw abuse in my direction. I guess his karma account must be rather empty now, after all these non-printable words.

I do admit, I did not like the call.

Seriously. 7 minutes 30! I lacked proper preparation and a good script, and was not able to keep the drone in the air for as long as I wanted to. Well, not long and they will call again. They always do …

And at some point I will win.

30 minutes. And what is your personal call-centre holding time?

Saiku Reporting 1.0 hits the market

.. and WAQR can finally rest in peace.

The community edition of the Pentaho Bi-Platform just got a whole lot better: On Friday, Marius Giepz released version 1.0 of Saiku-Reporting (SR).

Saiku-Reporting is a adhoc reporting plugin for the BI-Server. SR uses your existing Pentaho Metadata models to produce stylish reports. SR provides you with everything you have would expect from a adhoc tool:

  • Drag&Drop Report-Design with What-you-see-is-what-you-get design of your report
  • Export to: PDF,CSV,XLS,CDA,PRPT
  • Uses PRPT-Templates (bye, bye, WAQR templates!)
  • Grouping
  • Aggregation
  • Totals
  • Calculate additional columns with formulas

Creating a report is rather simple. Select your data from the left-hand side of your screen and drag it into the fields for columns, groups or filters. Double-click on the field-names to open a dialog with some more formatting options. 
 
When creating your reports, you have the choice to view your data either as paginated report or as raw-data. The raw-data view comes in handy when you create calculated columns, as you can see the data in the same way the reporting engine actually sees it.

And last but not least: Saiku-Reporting 1.0 is miles ahead of WAQR, so now we can finally bury that zombie for good.

Thank you Marius for that new member of the Pentaho space. And thank for your not naming it CAR (Community Adhoc Reporting). 😀

10 years of Pentaho Reporting / JFreeReport

Today, 10 years ago I received the e-mail approving the JFreeReport project on Sourceforge. This day also marks my first day as official project leader of the project.

10 years ago, JFreeReport was just starting out. A few weeks before that I added what would later be known as “Simple-XML” format to make it easier to define layouts. In the same year we got PDF export, Windows-Meta-File (WMF) rendering and report functions to calculate values and started gathering a community via the forums at www.object-refinery.com.

JFreeReport was written by David Gilbert. When I needed some printing capabilities for a customer of mine, I selected JFreeReport, as it’s code was clear, and it was easy to embed, had little dependencies and was free of weird runtime requirements (like the need for a JDK to ‘compile’ reports, whatever that means). JFreeReport was 100% in memory with the ability to run in any environment that had a Java runtime environment, regardless how restricted or resource-constrained.

And these days, we uphold that flag. JFreeReport, now Pentaho Reporting, still runs 100% in memory, with no need to swap report data to disk. The core engine is still small, even though the various data-sources can easily add up to 100MB or more, if you include them all.

Thanks to our vibrant community we had a successful decade. With remarkably minimal resources for development we now largely match what other multi-billion companies and their products can do. So thanks to everyone who contributed time, code and blood and tears to make JFreeReport … ahem .. Pentaho Reporting great.

And hopefully, ten years from now, we will be able to throw some flowers onto the graves of the dinosaurs of BI.

Linking to a report with the same export type

When you create reports connected with each other by links, you want to stay in the same output mode as your source report. When viewing a PDF report, you want the linked report to be PDF too, for instance.

So how would you do that in Pentaho Reporting?

(1) You need to know your current export type.

When you export a report, each output type has a unique identifier similar to this one “pagable/pdf”. The identifier consists of

(a) the basic export type:
  * pagable for paginated reports or
  * table for reports exported as layout-tables
and
(b) the content type
  * pdf for PDF export
  * html for HTML
  * .. and so on.

The BI-Server uses the same identifiers in the reporting plugin to select the correct output target for your reports. The parameter for this is called “output-target” and is documented in the Pentaho Wiki.

You probably know about the “ISEXPORTTYPE” function. This formula function allows you to test for a specific output target when the report runs. To get the export type you now could write a ugly long formula with many nested IF functions.

Or you can use this small BSH-Function instead:

Object getValue()
{
  return runtime.getExportDescriptor();
}

to get the export descriptor string directly.

(2) You need to feed this export identifier into your links.

Use the Drill-Linking functionality to add a manual parameter to your link. Name this parameter “output-target” and link this to your BSH-Function. If your function is named “ExportType”, then you would write “=[ExportType]” into the value field.

 
With that connection made, your reports will now be linked with the same output-target as the source report. Set the “Hide Parameter UI” option if you want to link against the target file without having to manually submit parameter.

Pentaho Reporting 3.9-RC is on the slide to its release

Right now the engineers at Orlando are building the Release Candidate of the 4.0 release of the BI-Server. Along with it, we will have a bug-fix release of the Report-Designer and Reporting Engine named 3.9-RC.

The release ships with only a handful of new features and a ship load of bug-fixes.

New feature: Justified Text

This feature has been sitting on my list of things to do for ages. More specifically, since the old “Pre-Pentaho” days, when the internet was still young. We now have a new text-alignment option for your content and a button for it on the toolbar.

New feature: Heavily Scripted Data-Sources

This is my personal favourite of this release. All major data-sources (SQL, MetaData, Mondrian and OLAP4J) can now customize both the configuration of the data-source, the query that gets executed and the result-set that gets produced. At the moment we ship with support for Java-Script and Groovy scripting to make these customizations.

Improvement: CDA-datasource local calls

When running on the same server, CDA data-sources now call the CDA plugin via Java-calls instead of routing all calls through a network layer. THank you, web-details, for this addition.

Improvement: JQuery based report-viewer

Well, not sure whether this is a new feature or improvement, but we now have a pure Java-Script based report-viewer based on JQuery and standard JavaScript. It is so clean that even I can understand that code. Over the last few years, we heard more and more desperate calls for a better report viewer that can be customized to customer needs. If you are a web-developer or know one, you now are one step closer to a customized report parametrization experience. Credits go to Jordan Ganoff for coming up with this great addition.

Improvement: Data-sources now obfuscate stored passwords.

When writing PRPT files, we now obfuscate all passwords so that they are not as asy to read. Note that this does not make any change to any skilled attacker – as anyone with a debugger or access to the source-code can see how these passwords are en- and de-coded. If you need true security for your database credentials, you still have to use JNDI connections and ensure that only trustworthy administrators can access the server configuration.

Bug-Fixes: A load of layouter related issues. Subreports processing with page-footers and repeated group-footers was buggy. The HTML content produced by the HTML output writer is now much cleaner and less verbose. The table-datasource editor now can remove multiple rows at once and has other usability problems fixed as well. For a complete list, have a look at the release notes on our JIRA system.

Tidying up our HTML exporter

When it comes to flexibility, Pentaho Reporting always had a knack for erring on the obsessive side. With calculation formulas and scripting everywhere, a OEM or implementation partner has plenty of options to get the report just right. 

Our HTML export makes no exception here. Last year I talked a bit about the ability to inject custom HTML or JavaScript into the output to produce a richer web-experience, like Fancy Tooltips.

Injecting Scripts happens via two special attributes (“html::append-body” and “html::apend-body-footer”) which insert the raw content either before or after the generated content. So far nothing new.

Writing proper JavaScript is a art on its own, and is a lot easier when the resulting HTML document has a clean and digestible structure. The output of the Pentaho Reporting Engine was usually filled with numerous spans and divs, making it hard to wade through the elements generated by the report.

With the latest bug-fix for Pentaho Report Designer 3.9 the report generator produces clean and minimalistic HTML. Over the last two days, I implemented a filter to check for inherited CSS styles.

CSS (Cascading Style Sheets) defines two classes of style attributes: Local attributes and inherited attributes. The normal attributes are only defined for the current element. Attributes like “border” or “margin” only make sense for current element and would cause visual disturbances if passed on to the child elements. Inherited styles, like all font properties (color, family or size) get inherited. If a child element does not define its own settings for these properties, the child uses the parent element’s style as its own.

By encoding this logic into the HTML report output we can now omit all inherited styles if the same style has been defined on a parent element already. At the same time we can omit all local styles if the style is empty (no border, 0-pt padding etc.). After applying all these optimizations, most elements actually have no own style definitions anymore. This alone makes the report more readable, but we can do better.

As long as the style is empty and the element does not define local HTML attributes (“id”, “class” or any of the “html::on” attributes, we now can safely omit writing the element’s tag. For most cases, this now reduces the complexity of the HTML-DOM greatly and navigating the DOM becomes a lot easier.

Killing XActions for Reporting – slowly and with pleasure

The new Pentaho-Metadata data-source scripting-extension I produced recently seems to be well-received. We now have a great opportunity to fully cut back on XActions for plain reporting uses.

Many users still have to stick with their old XAction driven reports. I assume that they do not do that because they enjoy the pain, or love programming in XML. No, the majority needs to drive parameter queries where the query itself is computed. Reasons for that are many:

  • If everything is placed in one query, the query gets insanely complex and unmaintainable.
  • The query access legacy systems with no sane data models or weird partitioned tables.
  • The data-source needs to be configured based on some other parameter before it is used.

 Now users who have been locked in by these cases can now free themselves from the slavery of weird XML-programming. I happily announce that

Pentaho Reporting now ships with sane scripting support for JDBC, Pentaho Metadata, all Mondrian and all OLAP4J data-sources.

The data-source scripting system of the old days is now a dark memory of the past. The new scripting is integrated into the data-source itself, so scripts become more reusable. Editing large scripts on the report’s “query::name” attribute was never really fun anyway.

The scripting extension allows you to configure the data-source before it is used for the first time. You can now configure all aspects of the data-source via your script, including but not limited to any supported connection-parameter like the JNDI name or additional JDBC-connection properties:

var jndiConnectionProvider = dataFactory.getConnectionProvider();
jndiConnectionProvider.setConnectionPath("java://myjndiconnection");

You cannot get away with not configuring the data-sources at all, as the report designer and all other tools love to have something static to work with. But you can always reconfigure them to your liking before the report is run. The scripts are always called before the a query is executed on the data-source, even inside the Pentaho Report Designer or the data-source editors.

I don’t believe that we will need similar scripting support for the Kettle, Java-Method-Invocation, Table or the Scriptable data-sources. These data-sources either do their own scripting anyway or would not profit from any scripting abilities.

Doing the performance dance (again)

I just changed another bit of the table-export while integrating a patch for PRD-3631. Although the patch itself did take a few illegal shortcuts, it showed me a easier way of calculating the cell-backgrounds for the HTML, Excel and RTF exports of Pentaho Reporting.

After a bit more digging, I also fixed some redundant calls in the HTML and Excel exports for merged cells and row-styles. Both resulted in repeated calls to the cell-background calculator and were responsible for slowing down the reporting engine more than necessary.

The performance of my test reports improved a bit with those changes. But if any, then this case has shown me that clean report design is the major driver of a fast export.

The performance for the reports went up by 15 to 30 percent, with the larger changes on the reports with larger row-counts. However, the reports I test are surely non-representive, as there all elements are perfectly aligned and the report is designed to avoid merged cells.

The patch specifically claims to address performance problems in the cell-style calculation. Agreed, there were problems, and the patch addressed them. But there was no way I could see a 100% improvement on normal reports. Well, not reports that are well-designed and use the powerful little helpers that the Pentaho Report Designer offers to make reports well-aligned.

When I receive production reports for debugging, the picture is usually more bleak. Fields are place rather randomly, and usually misaligned by a few points. They start and end on rather random positions, and usually elements are not aligned to element boundaries across different sections.

Let’s take this visually fairly report as an example:

The many fine grey lines you can see mark element boundaries. The more lines you see, the more cells your report will have. Each cell not only means larger HTML files, it also means more processing time spent on computing cell-styles and cell-contents. Thick grey lines spanning across the whole section usually indicate elements that are off by less than one pixel.

These lines are produced by the Report Designer’s “View->Element Alignment Hints” feature. When this menu item is selected, you will get a better idea on how your report will look when exported into a table-export. If you cannot see the details clearly, zoom in. The Report designer happily magnifies the working area for you.

When exported to HTML, this report here created a whopping 35 columns
with another 35 rows. That is potentially 1225 cells. The resulting
HTML file has a size of 21,438 bytes. For a report with just a few items of text, this is a lot.

In general you’ll want to avoid having to many of these boundaries. In the basic design courses, teachers tell fairly early on that layout where element edges are aligned look cleaner and more pleasing for the eye. When you look at adverts or magazines, you can see this on how articles and images seem to sit along visual boundaries or dividing lines. For a well-designed report this is no different.

To help you design your reports in a well-designed fashion, the report designer comes with the “View->Snap to Elements” feature.

To clean up a report, I usually start by aligning elements that are sitting close together. Visually, it makes no difference whether a element starts at x=24 or x=24.548. For the reporting engine, this makes a difference, as a dumb little engine cannot decide whether the user was just lazy or had a very good reason to have a cell at exactly these positions (or whether some visual design would break by attempting to blindly fix it). 

With the “Snap To Elements” enabled, just select one element and drag the mis-aligned edge until it snaps off its current position. Then move it back into position. This time it will snap to one of the other elements. If your edges are very close, I drag the current edge towards the top (for the y-axis) or the left (x-axis) until it leaves the crowded area. When I return with it, it will snap to the first (top-most or left-most) edge in the group of elements by default. Repeat that with all elements that sit near the edge and very soon you should only see one thin line indicating a perfect alignment.

Additionally, you can also change the element’s position and size to a integer number in the “style” table on the right-hand side of the Report Designer. When you do that for all elements, your alignment task will become a lot easier. Now elements are either aligned or at least one point apart (and the mis-alignment is easier to spot).

The quickly cleaned up version of my sample report now has only 24 columns and 16 rows, but visually, you cannot tell the difference between the two of them. Theoretically, the resulting table can have 384 cells, compared to the mis-aligned report a reduction to a quarter of the original 1225 cells. And finally, the generated HTML file shrunk to a size of mere 8,853 bytes, one third of the original size. In my experience and with those numbers in mind the computing time for this optimized report should be roughly 10% to 15% better than the optimized version. In addition to that slight boost, your report will download faster and rendering the report in the browser will be a lot quicker as well.  

So remember, performance optimization starts in your report designer: When you optimize your report designs it instantly pays off with quicker rendering and smaller downloads.

Further optimization

That report uses line-elements to draw borders around the statistic sections. By using sub-bands with border definitions, that report could be simplified further, but I’ll leave that as an exercise for the class.