Citrus PRD and XActions: Redefining an out-worn relationship

From the early days of Pentaho’s Platform, XActions were needed to get meaningful output out of the system. XActions started as a sort of workflow process. But very soon (thanks to crappy workflow engines and an ever-present need for more flexibility) they evolved into a full-blown programming language with loops and conditions.

At the time Pentaho Reporting (then still known as JFreeReport) was integrated, our reporting engine did not bother to provide datasources. As we started as a reporting engine for desktop applications, one of our assumptions was that these applications already have a working data-model and don’t need yet another one forced down their throat. So we took a free ride on the application’s table models rather than waste time reinventing the wheel.

With neither built-in datasources nor parameters, XActions had to do all of the data-preparation work.

When the first report-designer came into play, things started to get weird. The old report-designer shipped with its own report-definition format, which incorporated datasources, but was understood by neither the reporting engine nor the BI-Platform. To make reports run, the reports had to be exported (or as we called it: published) into the engine’s native XML format and an XAction describing the datasources. This export was a one-way road: the resulting artefacts could not be safely edited by the report-designer.

Ever since those days, there has been a growing disparity between the capabilities of the report-designer’s datasources, the datasources the reporting-engine supports natively, and the various data-supplying components the Pentaho-Platform can utilize. Once we started to integrate Parameters into the reporting-engine, these three drifted apart even more.

Problem zones

  1. Storing information in XActions is a one-way street

    The biggest problem we faced was the XActions themselves. XActions are insanely flexible (for a good reason) and highly expressive. But thanks to that power and flexibility, we cannot safely parse XActions back into datasources the report-designer could use. Interpreting source code (and it really doesn’t matter whether you write it in C-Syntax or XML) and mapping all inputs to an output is an interesting research area, not something we can rely on in a product.

    So our report-definitions always have to contain datasource information so that we can edit that information later. No one wants to alternate between Report-Designer and Design-Studio all the time while editing reports.

  2. XActions duplicate information and XActions are user-editable

    As a result of the problems we have extracting information out of XActions and our need to keep our own datasource-information in the report-definitions, a few problems arise:

    Both the report-definitions and the XActions can be edited independently. As long as only the report-definition *or* only the XAction is edited, everything is fine. But that case is not very likely. The report-designer can always generate a new XAction, but this will erase all other changes made to the XAction. In return, the design-studio cannot update the report-definition when the data-components have been edited. Doing so would require full knowledge of what the XAction is going to do when being executed.

    So for most cases, the report-definition contains exactly the same information as XActions, but in a declarative style instead of a procedural programming language.

    This leads us to:

  3. Plain, auto-generated XActions have no added-value over report-definitions

    In the Platform versions prior to Citrus, the majority of XActions are the auto-generated ones for reports that do not use Parameters. For reports that need parameters, XActions contain additional components to get the Platform’s parameter UI in place and to validate and pass the parameters to the engine.

    The lack of built-in parameters in our reporting engine kept our support-department very busy. A non-technical person seldom has any love for programming in XML in a tool other than the one they created their report in. They probably also don’t like to create fake-queries to get the report-designer to work, nor do they like to battle with the data-components in the XAction later to make the queries there parametrizable.

    As with data-sources, we need to store the parameter-information in the report-definition to make it editable later. So with the new capabilities of the Citrus-Engine, the built-in features of the reporting-engine cover a lot more of what previously required a second editing step.

  4. Only two major cases remain where a separate XAction proves valuable

    The first case arises when complex pre-processing of reports is needed. The reporting-engine’s datasources are tailored to the simple “give me query, I give you data” case. Anything that cannot be expressed in a single script or requires multiple processing steps is better handled by a rich language like the one XActions provide. The second case comes up whenever the report itself is not the end-result of the processing, as happens regularly in Bursting scenarios.

How the reporting engine integrates into the platform

Looking back

In the Pre-Citrus releases, all Pentaho-Reporting activities in the platform were channeled through the “JFreeReportComponent”. Aside from the obsolete naming, this component has some severe problems to start with.

When running the report, this component happily discards all datasources that may have been defined in the report-definition. If it’s not defined in the XAction, it is not true. Likewise, it replaces the resource-bundle localization mechanism and performs some odd attempts to parametrize the report.

Subreport-datasources are defined via sub-components defined inside the component definition of the JFreeReportComponent. When they get executed, we do some magic hacks to make them work as if they were part of the XAction, but I wouldn’t bet my life that components other than the few we tested would be generous enough not to crash.

Thanks to a Microsoft-style “Stay backward compatible no matter the cost” policy, we cannot go in and fix the component, as this may break existing XActions. And forcing an administrator of 3000+ reports to edit each one of them to cope with our changes somehow doesn’t sound nice either.

Heading Into the Future

So for the sake of old, pre-Citrus reports, we leave the JFreeReportComponent behind, so that it is free to rot in a corner, and concentrate on a new lightweight component instead.

This component duplicates the functionality of the PRPT-content generator, which is used to execute our PRPT-report-definitions when there is no XAction. The component publishes the report’s parameter information to the BI-Server in the same way the Secure-Filter-Component triggers a parameter prompt. It accepts parameter values from the outside and validates them against the report-definition’s data, and finally, it executes the report, letting the engine use the report-definition’s datasources to query the data itself.
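The "accept and validate" step described above can be pictured roughly as follows. This is only a minimal sketch under stated assumptions: `ParameterValidationSketch`, `ParameterDefinition` and `validate` are hypothetical names made up for illustration and are not the component's or the engine's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the component's parameter handling; all names are illustrative only.
final class ParameterValidationSketch {

    // A parameter as it might be declared in a report-definition (assumed shape).
    static final class ParameterDefinition {
        final String name;
        final Class<?> type;
        final boolean mandatory;

        ParameterDefinition(String name, Class<?> type, boolean mandatory) {
            this.name = name;
            this.type = type;
            this.mandatory = mandatory;
        }
    }

    // Checks the values supplied from the outside against the declared parameters
    // and returns one message per problem; an empty list means the report may run.
    static List<String> validate(List<ParameterDefinition> definitions, Map<String, Object> values) {
        List<String> errors = new ArrayList<>();
        for (ParameterDefinition def : definitions) {
            Object value = values.get(def.name);
            if (value == null) {
                if (def.mandatory) {
                    errors.add("Missing mandatory parameter: " + def.name);
                }
            } else if (!def.type.isInstance(value)) {
                errors.add("Parameter " + def.name + " must be of type " + def.type.getSimpleName());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<ParameterDefinition> definitions = new ArrayList<>();
        definitions.add(new ParameterDefinition("region", String.class, true));
        definitions.add(new ParameterDefinition("year", Integer.class, false));

        Map<String, Object> values = new LinkedHashMap<>();
        values.put("year", "2009"); // wrong type, and "region" is missing

        for (String error : validate(definitions, values)) {
            System.out.println(error);
        }
    }
}
```

The point of the sketch is that the declarations live with the report-definition, so the same data can drive both the parameter prompt and the validation without a second copy in an XAction.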

If data is pre-processed by the BI-Platform, then the engine’s “External DataSource” provides a controllable and well-defined way to inject that data into the engine’s processing. The External-DataSource interprets the value of a parameter as a TableModel and returns that model when being queried by the engine during the report-processing. This is a much more reliable way to feed the engine than scrapping all datasources.
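To make the idea concrete, here is a rough sketch of "the query names a parameter, the parameter holds the table". It assumes nothing about the engine's real classes: `ExternalDataSourceSketch` and `queryData` are hypothetical stand-ins, and only the standard Swing `TableModel` types are real.

```java
import java.util.HashMap;
import java.util.Map;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableModel;

// Hypothetical stand-in for the engine's External DataSource; not the real Pentaho class.
final class ExternalDataSourceSketch {

    // Treats the "query" as the name of a parameter and returns that
    // parameter's value, provided it actually is a TableModel.
    static TableModel queryData(String parameterName, Map<String, Object> parameters) {
        Object value = parameters.get(parameterName);
        if (value instanceof TableModel) {
            return (TableModel) value;
        }
        throw new IllegalArgumentException(
            "Parameter '" + parameterName + "' does not hold a TableModel");
    }

    public static void main(String[] args) {
        // Data prepared outside the engine, for example by a step in an XAction.
        DefaultTableModel preprocessed = new DefaultTableModel(
            new Object[][] {{"North", 1200}, {"South", 950}},
            new Object[] {"Region", "Revenue"});

        Map<String, Object> parameters = new HashMap<>();
        parameters.put("salesData", preprocessed);

        TableModel data = queryData("salesData", parameters);
        System.out.println(data.getRowCount() + " rows, " + data.getColumnCount() + " columns");
    }
}
```

Because the table travels as an ordinary parameter value, the report-definition's other datasources stay untouched, and a wrongly typed value fails loudly instead of silently replacing them.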

By separating the information for the report-processing and the optional pre- and post-processing that happens in the engine, we no longer have to duplicate information in the XAction. The XAction, if needed, can concentrate on its own responsibilities, and editing either the XAction or the report-definition no longer wreaks havoc on the other.

The engine itself also adds a capability or two to eliminate the need for custom steps in the XAction. Beginning with Citrus, we provide a Scriptable-DataSource, which allows constructing TableModels at runtime via any of the languages supported by Apache’s BeanScriptingFramework. And to solve the cases where a Report- or Wizard-Specification needs to be created or altered before the report-processing starts, we provide ReportPreProcessors to make this task more efficient than before.
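What "constructing a TableModel at runtime" amounts to can be sketched in plain Java. This is an illustration only: in practice the logic would live in a script, the class name `ScriptedTableModelSketch`, the `buildModel` method and the `Region;Revenue` input format are assumptions invented for the example, and how a script hands its result back to the engine is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableModel;

// Hypothetical example of the work a datasource script might do; not engine code.
final class ScriptedTableModelSketch {

    // Builds a two-column table from raw "region;revenue" lines (assumed format).
    static TableModel buildModel(List<String> rawLines) {
        DefaultTableModel model = new DefaultTableModel(new Object[] {"Region", "Revenue"}, 0);
        for (String line : rawLines) {
            String[] parts = line.split(";");
            model.addRow(new Object[] {parts[0].trim(), Integer.valueOf(parts[1].trim())});
        }
        return model;
    }

    public static void main(String[] args) {
        TableModel model = buildModel(Arrays.asList("North; 1200", "South; 950"));
        for (int row = 0; row < model.getRowCount(); row++) {
            System.out.println(model.getValueAt(row, 0) + " -> " + model.getValueAt(row, 1));
        }
    }
}
```

Small reshaping jobs like this are the kind of step that previously needed a custom component in the XAction.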

This entry was posted in Development by Thomas.

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.