Category Archives: Architecture

Pentaho Reporting extension points

Pentaho Reporting provides several extension points for developers to add new capabilities to the reporting engine. When you look at the code of both the reporting engine and the report-designer, you can easily see many of the existing modules.

Each extension point comes with a meta-data structure and is initialized during the boot-up process. The engine provides the following extension points:

  • Formula Functions
  • Named Functions and Expressions
  • Data-Sources
  • Report Pre-Processors
  • Elements
    • Attributes
    • Styles

Formula functions are part of LibFormula. LibFormula is Pentaho’s adaptation of the OpenFormula standard. OpenFormula is a vendor-independent specification for spreadsheet calculations. Formula functions provide a very easy way to extend the formula language with new elements without having to worry about the details of the evaluation process. They are perfect if you want to encapsulate a calculation and still be flexible enough to use it in a general-purpose formula.

Named functions and expressions are the bread-and-butter system to calculate values in a report. Expressions can be chained together by referencing the name of another expression or database field. Named functions are the only way to calculate values over multiple rows. Adding functions is relatively easy, as named functions only need the implementation as well as the necessary metadata.
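The difference between an expression and a named function can be sketched in a few lines. The real API is Java, so the JavaScript below is only a conceptual sketch; `lineTotal`, `makeRunningSum` and the row format are made-up names for illustration.

```javascript
// Conceptual sketch only; the real engine API is Java and looks different.

// An expression is stateless: it sees the current row and nothing else.
const lineTotal = (row) => row.quantity * row.price;

// A function carries state from row to row, so it can aggregate.
function makeRunningSum(field) {
  let sum = 0;
  return {
    advance(row) { sum += row[field]; }, // called once per row
    getValue() { return sum; }
  };
}

const rows = [
  { quantity: 2, price: 10 },
  { quantity: 1, price: 5 },
  { quantity: 3, price: 4 }
];

const totalQuantity = makeRunningSum("quantity");
const report = rows.map((row) => {
  totalQuantity.advance(row);
  return { lineTotal: lineTotal(row), runningQuantity: totalQuantity.getValue() };
});
console.log(report);
```

The expression could be evaluated for any row in isolation, while the running sum only makes sense if the engine feeds it every row in order.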

Data-Sources are responsible for querying external systems and for providing the report with tabular mass data. Pentaho Reporting already ships with data-sources for relational data, OLAP, a PDI data-source that executes ETL-Transformations to compute the data for the report, and various scripting options. Adding a data-source is more complex, as an implementor needs to write the data-source, the meta-data and the XML parsing and writing capabilities. In addition to that, the author needs to provide a UI to configure the new data-source.

With Pentaho Reporting 4.0 we add two additional data-source options, which make it easier to create new data-sources.

The first option uses our ETL tool as backend to parametrize template-transformations. Therefore a data-source developer only has to provide the transformation template, and the system will automatically provide the persistence as well as all dialogs needed to configure the data-source.

The second option uses a small parametrized Java-Class, similar to formula expressions. These calculations, called sequences, are managed by the Sequence-Data-Source, which takes care of all persistence and all UI needs.

Report-Pre-Processors are specialized handlers that are called just before the report is executed the first time. They allow you to alter the structure of the report based on parameter values or query results. These implementations are ‘heavy stuff’ for the advanced user or system integrator.

Last but not least, you can create new element types. Elements hold all data and style information to produce a renderable data-object. The reporting engine expects elements to return either text (with additional hooks to return raw objects for export-types that can handle them), graphics or other elements. An element that produces other elements for printing acts as a macro-processor and can return any valid content object, including bands and subreports.

Element metadata is split into 3 parts. The element itself is a union of the element’s type, attributes and style information. Implementing new basic elements requires you to write a new ElementType implementation (the code that computes the printable content) and to declare all styles and attributes the element uses.

The available style-options are largely defined by the capabilities of the underlying layout engine and thus relatively static in their composition.

An element’s attributes are a more free-form collection of data. Elements can contain any object as attributes. The built-in XML parser and writer handle all common primitive types (strings, numbers, dates and arrays thereof). If you want to use more complex data structures, you may have to write the necessary XML parser and writer handlers yourself.

A CDF based parameter viewer

At our community conference in Frascati yesterday I gave a talk on how to replace the old GWT report viewer with a slim CDF based report viewer.

Giving Granny a Face-Lifting

This is the slightly edited full-text version of this talk.

Are you tired of our trusted GWT report viewer?

When we introduced Pentaho Reporting 3.5, one of the major new features we added was the ability to run Pentaho Reports directly in the BI-Server without the need for writing or generating XActions. This feature instantly removed the number one headache our users had with reports on the server – the need for an additional runtime file, the XAction. The file contained the same information they already specified in the report designer. But to edit the file later, they would need to go into a totally different editor to do some sort of magical programming. Ordinary business users could and would not do that.

We created the report viewer with Google’s Webtoolkit, which promised an easy way of creating rich JavaScript UIs without having to resort to homegrown libraries.

Where has all the love gone?

There are no simple solutions. GWT turned out to create monolithic code that lived on its own island. GWT applications were nearly impossible to extend for normal (non-GWT-using) web-developers. The code was hard to debug (as normal debuggers like Firebug cannot help much to make sense of the autogenerated code). And GWT was slow. Slow to compile and slow to run (compared to other JavaScript alternatives).

Our partners had no particular love for GWT, and bit by bit we grew tired of it as well. Over time we realized that GWT would not be the silver bullet.

Our report viewer implementation also suffered from a few deficiencies. There is no easy way to create alternative layouts for the parameter UI, the date parameter input is simple and limited, and long parameter texts can cause problems.

In the meantime, our consultants and partners worked around these limitations by using CDF to build custom parameter pages for reports.

And then replace the GWT Viewer
… with a normal CDF-Dashboard?

CDF instantly solved several of their problems. In CDF you are free to design your parameter page the way you like it. CDF is by far more flexible than the GWT viewer (as you have simple code with loads of extension points available). CDF is made to create interactive dashboards with a rich user experience.

And CDF comes with a PRPT component, so it already knows how to drive a report.

… but you know you pay the price

But once again: There is no magic silver bullet. Writing an extra CDF file suffers from the same problem that XActions have. Suddenly you have to duplicate the parameter information from the report designer into the dashboard. You have to replicate all parameter dependencies.

CDF requires some technical skill to create a dashboard. Similar to XActions, the report designer cannot read a CDF file for editing (and given that CDF is JavaScript that you can program in any way you want, there is no way we could ever hope to build such an editor). CDFs duplicate the information that is already in the report, and any change to the report parameters must also be applied to the dashboard.

Creating CDF files is not free – your IT department or an external consultant has to do it for the ordinary business user. So from a business point of view, that sort of “premium parameter viewer” would only be feasible for critical reports where you can justify the high development and maintenance costs.

To make CDF work for all reports, we need to solve the “duplication of parameter information” problem.

A simple solution:
Let PRPT’s information drive CDF

Mike D’Amour architected the GWT reporting plugin as a RESTful service. We intentionally avoided all of the GWT server-side libraries to communicate with the server. The report viewer uses standard HTTP-GET or PUT calls to query the server, which responds either with content or XML files.

The report viewer only uses the server’s public URLs to get information about the report’s current parameters. In fact, anyone can call these URLs to get the same information. We do not have any limits on what kind of client you use to interact with the reporting plugin.

Flow, Report Viewer, Flow

The GWT report viewer uses a very simple algorithm to communicate with the server.

  1. First we query the parameter XML by passing all known parameters to the server.
  2. We parse the XML and render the UI.
  3. We check whether the server found any problems with the parameters we gave it. If everything is OK, we ask the server for the report content.
  4. We wait for input from the user.
  5. On any new input or if the user hits submit, we go back to the start and query the parameter XML again.

You can find more information about this cycle in one of my previous postings.
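The five steps above can be sketched as a small loop. This is a minimal sketch with a faked server and UI so that it runs anywhere; `runCycle`, `fakeServer`, `fakeUi` and the shape of the parsed parameter information (an `errors` list) are assumptions for illustration, not the viewer's real code.

```javascript
// Sketch of the viewer cycle; `server` and `ui` are hypothetical stand-ins.
function runCycle(server, ui, params) {
  const info = server.queryParameters(params); // step 1: ask for parameter info
  ui.renderPrompt(info);                       // step 2: render the parameter UI
  if (info.errors.length === 0) {              // step 3: only render valid requests
    ui.showReport(server.renderReport(params));
  }
  return info;
}

// Fake server that insists on a "CustomerNo" parameter.
const fakeServer = {
  queryParameters(params) {
    const errors = params.CustomerNo ? [] : ["CustomerNo is required"];
    return { parameters: Object.keys(params), errors };
  },
  renderReport(params) {
    return "report for customer " + params.CustomerNo;
  }
};

// Fake UI that records what it was asked to display.
const log = [];
const fakeUi = {
  renderPrompt(info) { log.push("prompt:" + info.errors.length); },
  showReport(content) { log.push(content); }
};

// Steps 4 and 5: every user input re-enters the cycle at step 1.
runCycle(fakeServer, fakeUi, {});                    // invalid: prompt only
runCycle(fakeServer, fakeUi, { CustomerNo: "242" }); // valid: prompt and report
console.log(log);
```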

Action time

Click here to see a simple form-based parameter page. This demonstrates how to communicate with a Pentaho BI-server and shows the basic steps to parametrize an existing report. If you see a login window in the lower frame, then log in and restart the demo.

The form itself is simple:

<html>
<head>
  <title>Report Viewer</title>
</head>
<body>
<div style="border: 1px solid black; margin-bottom: 20px">
  <h1>Parameter Input</h1>
  <hr />
  <form action="http://demo.pentaho.com/pentaho/content/reporting" method="GET" target="viewer">
    <div>
      Report to load:
      <input name="solution" value="steel-wheels"/>
      <input name="path" value="reports"/>
      <input name="name" value="Invoice Statements.prpt"/>
    </div>
    <div>
      <h2>System parameter</h2>
      <label for="renderMode">Render Mode</label>
      <select id="renderMode" name="renderMode" size="1">
        <option value="REPORT">REPORT</option>
        <option value="XML">XML</option>
      </select>
      <label for="output-target">Output Target</label>
      <select id="output-target" name="output-target" size="1">
        <option value="table/html;page-mode=stream">Single page HTML (table/html;page-mode=stream)</option>
        <option value="table/html;page-mode=page">Paginated page HTML (table/html;page-mode=page)</option>
      </select>
    </div>
    <div>
      <h2>User parameter</h2>
      <label for="Customer">Customer</label>
      <input type="text" id="Customer" name="CustomerNo" value="242"/>
      <label for="ReportStamp">Report Stamp</label>
      <input type="text" id="ReportStamp" name="Report Stamp" value="Review"/>
    </div>
    <div>
      <input type="submit" value="Go!"/>
    </div>
  </form>
</div>
<div>
  <h1>Report</h1>
  <iframe name="viewer" width="100%" height="50%"></iframe>
</div>
</body>
</html>

In the first section we set up a few system-level parameters. The form contains the path to the report we want to render (expressed as the Pentaho standard triple: solution, path, name), the renderMode (which defines whether we query parameter information (XML) or render the report (REPORT)) and finally the output target, which defines what output the server should generate.

This form is already a valid method to supply parameters to the reporting plugin and shows that there is no magic involved.
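The same request can be assembled in a few lines of script. This is only a sketch: the base URL and parameter names are copied from the form above, while `buildReportUrl` is a made-up helper name.

```javascript
// Builds the same GET request the form sends. The base URL and parameter
// names come from the form; the function name is hypothetical.
function buildReportUrl(baseUrl, systemParams, userParams) {
  const query = new URLSearchParams({ ...systemParams, ...userParams });
  return baseUrl + "?" + query.toString();
}

const reportUrl = buildReportUrl(
  "http://demo.pentaho.com/pentaho/content/reporting",
  {
    solution: "steel-wheels",
    path: "reports",
    name: "Invoice Statements.prpt",
    renderMode: "XML",
    "output-target": "table/html;page-mode=stream"
  },
  { CustomerNo: "242", "Report Stamp": "Review" }
);
console.log(reportUrl);
```

Switching `renderMode` from `XML` to `REPORT` is all it takes to go from querying the parameter information to rendering the report.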

Now, let’s do the same again … in CDF

Jordan Ganoff wrote a prototype of a CDF dashboard that reads the parameter information from the Pentaho reporting plugin to construct a dashboard.

Due to security restrictions on JavaScript execution in browsers (the same-origin rule), I cannot provide a one-click example. Download the zip file and copy the contents into your BI-Server’s solution directory.

You can then switch to the new parameter viewer by replacing the “reportviewer/report.html” part with “resources/web/reportviewer.html”.

So the original URL for your GWT report viewer

http://localhost:8080/pentaho/content/reporting/reportviewer/report.html?solution=steel-wheels&path=%2Freports&name=Invoice+Statements.prpt&locale=en_US

becomes this URL

http://localhost:8080/pentaho/content/reporting/resources/web/reportviewer.html?solution=steel-wheels&path=%2Freports&name=Invoice+Statements.prpt&locale=en_US

(Download the CDF based report viewer)
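If you have many bookmarked viewer links, the rewrite can be scripted. A minimal sketch, assuming the two URL shapes shown above; `toCdfViewerUrl` is a made-up helper name.

```javascript
// Swaps the GWT viewer path for the CDF viewer path and keeps the query string.
function toCdfViewerUrl(gwtUrl) {
  return gwtUrl.replace(
    "/reporting/reportviewer/report.html",
    "/reporting/resources/web/reportviewer.html"
  );
}

const gwtViewerUrl =
  "http://localhost:8080/pentaho/content/reporting/reportviewer/report.html" +
  "?solution=steel-wheels&path=%2Freports&name=Invoice+Statements.prpt&locale=en_US";
const cdfViewerUrl = toCdfViewerUrl(gwtViewerUrl);
console.log(cdfViewerUrl);
```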

Let’s hand over the microphone to Jordan to explain the architecture of this implementation.

Here’s a quick introduction:

The new report viewer is a collection of CDF components. You can follow
the logic starting in reportviewer.html’s load() function. We set up a
div to inject the prompt panel into and then call:

pentaho.common.prompting.createPromptPanel({
          destinationId: "promptPanel", 
          paramDefn: reportViewerParameterLookup(),
          refreshParamDefnCallback: reportViewerParameterLookup,
          extraComponents: [{type: "SubmitReportComponent", htmlObject: 'report-div'}]
});
  • destinationId: the Id of the element that the prompt panel will be injected into.
  • paramDefn: this is the parameter definition (the Parameter XML parsed into an object).
  • refreshParamDefnCallback: function called whenever a parameter has changed its value. For now we will hit the ParameterXmlContentGenerator for a new parameter XML and parse it to a parameter definition every time a parameter value has changed.
  • extraComponents: any additional CDF components you’d like initialized. I have a quick prototype component defined in reportviewer.html called “SubmitReportComponent” that will listen for the parameter “submit” to change (which is fired by the submit button on the prompt panel). When this parameter changes, the update() method of the SubmitReportComponent is called. We build a valid reporting URL and set the iframe’s src to that URL to load the report. Pretty straightforward, and exactly how the existing report viewer works today.

Core architecture: the parameter panel itself is a CDF component which defines a layout for all widgets provided. The submit button widget is configured to listen to all CDF components created from any non-hidden parameter. Any time the submit component receives a parameter change event its update() function is called and we check if all parameters are valid. If they are all valid we fire a change event for the parameter “submit” which someone else can listen to and do what they need to do.
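The event flow described here can be sketched without CDF itself. The tiny `ParameterBus` below is a stand-in for CDF's parameter-change notifications and is not the CDF API; all names (`ParameterBus`, `submitWidget`, `isValid`, the fake `iframe` object) are assumptions for illustration, and the validity check is reduced to "every visible parameter has a value".

```javascript
// Tiny stand-in for CDF's parameter-change notifications. This is not the
// CDF API; every name below is hypothetical.
class ParameterBus {
  constructor() {
    this.values = {};
    this.listeners = {};
  }
  listen(name, callback) {
    if (!this.listeners[name]) this.listeners[name] = [];
    this.listeners[name].push(callback);
  }
  fireChange(name, value) {
    this.values[name] = value;
    (this.listeners[name] || []).forEach((callback) => callback(value));
  }
}

const bus = new ParameterBus();
const visibleParams = ["CustomerNo", "Report Stamp"];
const isValid = (values) => visibleParams.every((name) => Boolean(values[name]));

// Submit widget: updated on every visible parameter change; it fires
// "submit" only when all parameters are valid.
const submitWidget = {
  update() {
    if (isValid(bus.values)) bus.fireChange("submit", Date.now());
  }
};
visibleParams.forEach((name) => bus.listen(name, () => submitWidget.update()));

// Report component: reacts to "submit" by pointing the (fake) iframe at the report.
const iframe = { src: "" };
bus.listen("submit", () => {
  const query = new URLSearchParams(
    visibleParams.map((name) => [name, bus.values[name]])
  );
  iframe.src = "/pentaho/content/reporting?" + query.toString();
});

bus.fireChange("CustomerNo", "242");      // still invalid, nothing loads
const srcAfterFirstChange = iframe.src;
bus.fireChange("Report Stamp", "Review"); // now valid, the report is requested
console.log(iframe.src);
```

Because the report component only knows about the “submit” parameter, anyone can replace it with a different listener without touching the prompt widgets.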

Once all components are created we pass them to CDF to initialize them. This will register them with CDF and eventually call update() on all components. It’s this update() that will inject the CDF components into the page.

That being said, the thinking here is that in the future anyone can create a prompting panel from a parameter xml and provide their own callback when the submit button is clicked (or any parameter is changed for that matter).

Extension points:

  • The mapping for widget types to parameter types is done in: parameter-prompting-builders.js in the object: pentaho.common.prompting.builders.ParameterWidgetBuilder.paramTypeToWidgetMapping. That’s the internal widget mapping that we use when looking up a widget type in pentaho.common.prompting.builders.ParameterWidgetBuilder.lookupCDFWidgetBuilder.
  • Most of the javascript is structured so it can be extended at any point. Hopefully I provided enough functions that can be overridden to make it easy for a hacker to tear it apart!

One of the items I’m expecting to change is the delegation of widget creation to layout panels. I’d like to pass the parameter definition to the layout panel and let it create any widgets it needs instead of creating them ahead of time for each non-hidden parameter. This should be a bit more extensible.
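To illustrate the idea of a type-to-widget mapping as an extension point, here is a small sketch. It does not reproduce the real objects in parameter-prompting-builders.js, whose shape may well differ; `widgetMapping`, `lookupWidgetBuilder`, the parameter types and the widget names are all hypothetical.

```javascript
// Hypothetical stand-in for a type-to-widget mapping; the real object in
// parameter-prompting-builders.js may be shaped differently.
const widgetMapping = {
  textbox: (param) => ({ type: "TextInput", name: param.name }),
  dropdown: (param) => ({ type: "Select", name: param.name, options: param.values })
};

function lookupWidgetBuilder(param) {
  // Fall back to a text box for unknown parameter types.
  return widgetMapping[param.type] || widgetMapping.textbox;
}

// Extending the viewer: register a builder for a new parameter type.
widgetMapping.map = (param) => ({ type: "MapPicker", name: param.name });

const locationParam = { type: "map", name: "Location" };
const locationWidget = lookupWidgetBuilder(locationParam)(locationParam);

const unknownParam = { type: "something-else", name: "Comment" };
const fallbackWidget = lookupWidgetBuilder(unknownParam)(unknownParam);
console.log(locationWidget, fallbackWidget);
```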

Thank you Jordan for creating this amazing marriage of dashboards and reporting!

Even though this is a prototype build, it proves the point that we can tweak CDF to become the new report viewer.

At this point, Gretchen asked me whether this will have any impact on any existing integration with the reporting plugin. Changing server side URLs is always a bad thing and Gretchen voiced the concerns of all our OEMs and partners who built an existing solution for the reporting plugin.

The changes we propose are purely client side changes. There is no need nor any plan to change server side APIs or change URLs or returned formats or any server side behaviour. You are safe.

As far as I know, this new report viewer will be part of the upcoming Pentaho BI-Server 4.5.

A world of new possibilities

This new report viewer adds a bunch of new possibilities to our reporting system. We can easily extend it, we can add new components and new ways of parametrizing reports. Parameters can look sexy and can be visually rich.

With CDF’s flexibility and ability to style the dashboard in any way you want, we can produce more flexible layouts that match existing corporate style guidelines of our customers. With the ability to quickly integrate new components we solve more business cases.

Off the top of my head, I can imagine a Google-Maps widget to select locations, or clickable charts to select customers or product lines. Our Drill-linking can be used in new ways. Why not use a report or analyzer view to present your selection?

With CDF and the power of JavaScript in our hands we can also easily show or hide parameters as needed, or even produce selective parameter input paths based on the user’s selection.

With better parametrization, our reporting system will look more sexy, which means more people will be willing to use it. More customers is always a good thing.

But to make this bird fly, we need all the help that we can get.

How can you help to get it right from the start?

First and foremost: Give us your requirements. For us in engineering it is hard to know what obstacles you hit in the field or what your clients ask you to solve. So instead of producing a system that is based on our limited “Steel-wheels” world, I would like to see an open discussion that comes up with a set of true requirements that match what you see from your customers every day.

If you already use CDF to drive parametrized reports, tell us all about it. What is the problem you are solving here? What extra work did you have to do to make it work for your use case? If you wrote some parameter input that might be useful for others: Would you share it with us?

Help us to expand the parameter definition dialog in the report designer so that we can easily add additional attributes to the report parameters. This way we can prototype faster and you can use this to pass additional configuration settings to the CDF parameter viewer.

And if you cannot give anything, then at least test what we write with your data. The earlier you test, the better we will be able to react to the results. And please, please: Test the early builds as well. If the product is already in RC (Release-Candidate) state then it is very hard to make major architectural changes. Your tests of the early builds help us to know whether we move in the right direction and allow us to correct our course when we are not.

So let’s start the discussion here and now

What would you need from a parameter viewer? What requirements did you meet that forced you to implement your own dashboard-parameter-viewer?

Strategies for building Extension-APIs

When writing extension-APIs, the designer usually has two general choices:

  1. The plugin approach: The system can provide a plugin point where external code takes over control and (re)configures the system
  2. The declarative approach: The system calls a user implementation which computes a token. The caller then incorporates the computed changes into the system.

Both paths have their advantages and disadvantages.

The plugin-path is easy to write, as you literally open all gates and let the foreign code in to refurnish your house. As API designer, your task is simply to define the entry point, and let the user of this API decide what and how to do.

However, it also creates a huge technical debt, as suddenly the structure and behavior of every object and method that is reachable via this API becomes part of the public API. If your internal API is not well-shielded, then you might find yourself in a situation where the user changes parts of the API that you never wanted to open up. The whole situation gets worse as soon as your evil plugin-developers start to play around with rogue casts and reflection-APIs.

The declarative approach is more costly to define – as extension-API developer you have to provide a suitable (and hopefully easy to use and understand) model that allows the API-user to fully express the changes he wants to see in the model.

The main advantage of this approach is that you do not have to open up the private parts of the system. The foreign code runs in its own (more or less protected) sandbox and does not gain control of or information about unrelated parts of your application.

This limitation of scope makes the whole extension-API easy to understand and easy to use.

The style-expressions introduced in version 0.8.9 are a good example of a declarative plugin-API. The expressions themselves compute a value, but it is up to the reporting engine to interpret that value and to make use of it. The engine is free to ignore the computed value if the value looks like garbage.
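The contract can be sketched in a few lines. This is a conceptual sketch in JavaScript and not the engine's Java API; `applyStyleExpression`, `NAMED_COLORS` and the element format are made-up names, and the validity check is deliberately simplistic.

```javascript
// Declarative extension point: user code only returns a value; the engine
// decides what to do with it. All names are hypothetical.
const NAMED_COLORS = ["black", "red", "green", "blue"];

function applyStyleExpression(element, styleKey, expression, row) {
  let value;
  try {
    value = expression(row);
  } catch (error) {
    return element; // a failing expression must not break the report
  }
  const looksValid =
    styleKey === "color" ? NAMED_COLORS.includes(value) : typeof value === "string";
  // Garbage is ignored; the element keeps its declared style.
  return looksValid
    ? { ...element, style: { ...element.style, [styleKey]: value } }
    : element;
}

const amountField = { name: "amount", style: { color: "black" } };
const negativeInRed = (row) => (row.amount < 0 ? "red" : "black");
const brokenExpression = () => 42;

const styled = applyStyleExpression(amountField, "color", negativeInRed, { amount: -5 });
const untouched = applyStyleExpression(amountField, "color", brokenExpression, { amount: -5 });
console.log(styled.style.color, untouched.style.color);
```

The user code never gets a reference to the element, so the worst it can do is return a value that gets thrown away.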

The declarative approach is easy for simple property changes, but can evolve into your private version of the 8th circle of hell for large scale tasks. The whole declarative model breaks down as soon as the task cannot be contained in simple data-structures or if the task is somewhat irregular in its nature.

In the very beginning, I used the plugin-approach in the reporting engine. Adding a plugin-interface is easy, and if you make it generic enough, it is a powerful thing. The Functions-API and the old LayoutManager-API (in 0.8.7 and earlier) were examples of a plugin-extension.

The Functions-API is one of the more successful ones. In the Classic-Engine, functions can do pretty much anything they want. They have access to the report-states, the report-definition and all the data-sources. By declaration, functions are allowed to change the report-definition on the fly (which gave us unchallenged flexibility in the report processing) and can do highly complex (and sometimes dangerous) computations.

The LayoutManager-API along with the old output-target-API was an example of an API where the plugin-approach backfired. The idea behind that part was to make layouting and content-generation more extensible. For this part, the initial design process was the equivalent of dropping code on a white canvas to see what sticks. Over time, the implementations became insane – with workarounds for workarounds for bugs.

As the initial API was very generic, there was no controlled way to fix the issues we had with the underlying objects. Any fix to the (formerly) private objects now threatened to break the client code that used that API. There is only one way to fix such an API: To burn it down, salt the earth on which it grew and start from scratch (which we did in version 0.8.9).

The Functions-API had similar problems, but due to the way that API was received by its users, these problems never evolved to the same level of pain as the layouting-API. What might have saved us here was the fact that Functions and Expressions were always seen as tied to a certain purpose – compute values or style report-elements.

During the last few years, I learned to value the declarative approach. With a declarative API, you invest time to protect the user from shooting himself in the foot. Although foot-shooting can be a healthy experience (as it can reinforce values like “think-before-you-code”, careful planning or clean and maintainable implementations after you shot yourself hard enough), most of the time the costs still do not justify the outcome. With a plugin-API, I as the architect have to spend hours and hours on protecting the precious parts of the system against evil code. I have to write large documents to spell out in English words what you should do and, more often, what you should not do with the API. With every change of the internal API I spend hours thinking about where and how this change now breaks code in some remote plugin. In the end, some code always breaks.

With the declarative approach, there is only one simple (and hopefully unbreakable) interface. The foreign code is perfectly shielded from the inner details of the engine. The code itself cannot mess with internals of the engine and cannot cause invalid states. Even if the code returns garbage, the engine has its chance to detect this and either fix or reject the request.

There will be only a few cases in the life of a software engineer where a purely declarative approach is feasible. I also would not go so far as to abandon the plugin-approach altogether, but as with the use of nuclear weapons: if you use a pure plugin-API-approach, you had better have good reasons. Your actions may cause more trouble than you initially bargained for…

A zoology of layouting systems

Reporting is easy, but layouting is hell. In the field of reporting, there are three major output systems:

1. Painting

The generated output is placed freely on a canvas. The whole content generation is graphical, the result has to look good, and only a few constraints are placed on the output. The results of such a painting operation may look good on paper, but are seldom useful for further editing.

2. Documents

The generated output is written as an ordered flow of data. A text document or an HTML-page is an example of such an output. The generated document does not directly define where and how the content is rendered. Interpreting the document’s content and styling information is left to a client application (OpenOffice Writer, for instance, or a Web-Browser). In a document flow, order does matter. Content is printed in the same order as it appears in the document flow and in most cases, the content is positioned relative to the last element printed. (Paragraph comes after paragraph, line after line, and each word is printed after the preceding word.)

3. Tables

Spreadsheet documents organize their content as a huge table. Tables have been (and in the case of some browsers: still are) used for the layouting of HTML documents. A table is a grid of cells organized in rows and columns – and each cell can hold exactly one piece of content. Cells cannot overlap each other, but in almost all systems, cells can be merged into larger rectangular areas.

The Pentaho Reporting Classic Engine is one of the many reporting engines that use absolutely positioned elements as their layouting paradigm. Report elements are placed freely on a canvas (called a Band). There are no limitations on how and where an element can be placed within the canvas. The element’s definition order also defines the painting order – if elements overlap each other, the first element that is painted serves as background for the next elements.
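The painting order is easy to demonstrate with a character canvas. This is only a toy sketch; `paint` and the element format (`mark`, `x`, `y`, `width`, `height`) are made up for illustration.

```javascript
// Paints rectangles onto a character grid in definition order.
function paint(width, height, elements) {
  const canvas = Array.from({ length: height }, () => Array(width).fill("."));
  for (const el of elements) {
    for (let y = el.y; y < el.y + el.height; y++) {
      for (let x = el.x; x < el.x + el.width; x++) {
        canvas[y][x] = el.mark; // later elements overwrite earlier ones
      }
    }
  }
  return canvas.map((row) => row.join(""));
}

const band = paint(6, 3, [
  { mark: "A", x: 0, y: 0, width: 4, height: 2 },
  { mark: "B", x: 2, y: 1, width: 4, height: 2 }
]);
console.log(band.join("\n"));
// AAAA..
// AABBBB
// ..BBBB
```

Where the two rectangles overlap, "B" wins because it was defined last; "A" acts as its background.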

As long as the desired output type also uses such a free-form canvas to describe the output, everything is fine. Printing or the PDF export are examples of such painting output types.

But the freedom of positioning elements freely on the canvas backfires whenever we try to export reports into documents or tables. Exporting documents is horrible. Converting the painted report into a flow-text does not really work. Although the content may be preserved, all the formatting gets lost, and most document systems are not able to express the complex layouting rules required to render the report as if it was painted. A document processor was not and is not designed for such an abuse.

In the Classic Engine, we therefore do not even attempt to export into flow-text documents. Our way to generate documents is simple: Generate a table that looks as similar as possible to the painted content. Tables may not be ideal for layouting, but they make it relatively easy to position content within the table’s grid. In the early days of the web, most HTML pages used tables for layouting, and most browsers can display them reasonably well. And as most document formats also support tables, we can solve two problems with a single implementation. Therefore the table-export gives us the ability to export to Excel-Workbooks, HTML-Pages and RTF-documents.
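The basic idea of turning a painted band into a table can be sketched as follows: every element edge becomes a column or row boundary, and each element becomes a cell spanning the grid lines it covers. This is a simplified sketch and not the engine's actual algorithm; `toTableLayout` and the element format are made up, and overlapping elements are not handled.

```javascript
// Derives a table grid from absolutely positioned, non-overlapping elements.
function toTableLayout(elements) {
  const sortedUnique = (values) => [...new Set(values)].sort((a, b) => a - b);
  // Every left/right edge is a column boundary, every top/bottom edge a row boundary.
  const xs = sortedUnique(elements.flatMap((e) => [e.x, e.x + e.width]));
  const ys = sortedUnique(elements.flatMap((e) => [e.y, e.y + e.height]));

  const cells = elements.map((e) => {
    const col = xs.indexOf(e.x);
    const row = ys.indexOf(e.y);
    return {
      name: e.name,
      row,
      col,
      rowSpan: ys.indexOf(e.y + e.height) - row,
      colSpan: xs.indexOf(e.x + e.width) - col
    };
  });
  return { columns: xs.length - 1, rows: ys.length - 1, cells };
}

const layout = toTableLayout([
  { name: "title", x: 0, y: 0, width: 300, height: 20 },
  { name: "label", x: 0, y: 20, width: 100, height: 15 },
  { name: "value", x: 100, y: 20, width: 200, height: 15 }
]);
console.log(layout);
```

Here the title ends up as a single cell merged across both columns, which is exactly the kind of merged cell that makes the exported documents look right but hard to edit.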

The table-infested text documents are by no means user friendly or editable, but at least the result looks good.