Monthly Archives: June 2007

New releases of Pentaho Reporting in June

A horrible month has passed, and finally better times appear on the horizon: David Kinkade has joined the development efforts for Pentaho Reporting. Development can now proceed twice as fast.

The OpenOffice Reporting project finally entered QA, so the first half of the month was spent fixing bugs. Although not all of them are resolved yet (the full list), we made some great progress there. All the remaining bugs are related to the OpenDocument file-format processor we wrote for this project; the flow engine itself, which drives the reporting as a data-processing backend, is now ready for the show.

It’s actually amazing: although most of the demos do not work flawlessly yet, people are starting to pick up this engine and work with it. Thanks for your faith, folks! All you early adopters greatly help to drive the development of this engine even faster.

The second part of the month was dedicated to the Classic-Engine. Although our first pre-release was somewhat bug-ridden, we have now exterminated most of the bugs in the renderer and the report processors. The engine finally produces all demo reports without crashing, and (despite some known bugs we are going to address during the next weeks) its new renderer behaves like the old one. The funny thing is: the renderer of the Classic-Engine is now more advanced than LibLayout, the renderer of the Flow-Engine. So once we have reached a stable state here, we have to update LibLayout to reflect all the changes we’ve made in the Classic-Engine.

A list of all known open bugs can be found on the SourceForge project page for Pentaho Reporting.

The next pre-release (or milestone build, to use some commonly accepted terminology) of “Pentaho Reporting Classic 0.8.9” is scheduled for Friday the 13th. (I always wanted to make a release on such a lucky day.) At that point, all the engine bugs should be fully fixed, the new parser extensions (for rich text and the new text-processing capabilities) will be in place, and the GUI will work better. (Don’t judge us based on the progress dialogs yet. :))

Download the latest release, test it with your reports, and send us your bug reports. With your help, we can create the most stable version of the Classic reporting engine ever.

Spotlight on caching: LibLoader

When the Classic-Engine was started, we did not care much about resource loading. Resource-loading patterns are what happens to other people, not us. Whenever a resource was needed, we simply wrote some code to load it in place. We lived our happy life until, one day, we wanted to support image loading.

Image loading is easy as long as you just use the AWT toolkit and its built-in capabilities. But with Pixie (our WMF renderer) and the many other image libraries out there, we hit the wall for the first time. Now the resource-loading code that had been scattered all around the codebase backfired on us: either we copied the new image-handling code to every single occurrence of the old code, or we created some callable library code.

Luckily, we chose the library path and created an image factory. Our XML parser code was the same story: first a funny collection of random code, and the next moment a nice little library. Then came the Drawable factory, so we now had three different resource-loader implementations.

With the dawn of the Flow-Engine, the numbers started to explode. DataSource-definitions, stylesheets, subreports – suddenly every piece was (potentially) loadable.

The walls started to move so that they could hit us from a better angle …

On a parallel thread of events, deep inside the Pentaho Platform, a new source of potential problems was hatched by the Pentaho engineers. Smart as they are (as long as they don’t fall into the ‘…but it works’ pattern of creating horrible hacks, of course), they created their Platform to be independent of the underlying storage system. So in the Platform, you store your reports and their resources in a ‘SolutionRepository’, which can be anything from a filesystem-based solution to a relational database.

And that was actually the point where it clicked: what we need is simply a common, reusable and (hopefully) well-designed library that helps us with locating and loading resources. I’m bad at creating fancy names, so I simply called it ‘LibLoader’.

LibLoader is here to solve all the resource-loading problems we encountered in the past:

  • Resource loading is slow, so it adds caching to the I/O layer.
  • Resource creation is slow, so it also caches the actual parsing or resource-interpretation step.
  • Resource loading from different sources (filesystems, database repositories, network storages) is awfully complicated, so we address this as well by creating a common resource-naming schema.
  • And last but not least: it must be lightweight and should not reinvent the wheel.

To add effective caching, we have to solve a couple of problems.

1. Make your cacheable resources identifiable

First, we have to create some sort of naming schema. For us, only two systems seem to be important: hierarchical storages, where the entries form a tree with parent-child relations (as in filesystems), and flat storages, where entries have no relation to each other (as in database storages). Names must have at least some minimal interoperability, so it must be possible to go from one naming system (like URLs) to another (like the infamous solution repository). And finally, for hierarchical names there must be some way to construct names using relative paths (as this is crucial inside the reporting engine).
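The hierarchical part of such a naming schema can be sketched in a few lines of Java. This is only an illustration of the idea, not the actual LibLoader API; the class and method names (`ResourceKey`, `derive`) are made up for this example:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// A hypothetical resource key: a storage schema plus hierarchical path
// segments, with support for resolving relative paths against the key.
public final class ResourceKey {
    private final String schema;   // e.g. "file", "url", "solution-repo"
    private final String[] path;   // hierarchical path segments

    public ResourceKey(String schema, String[] path) {
        this.schema = schema;
        this.path = path.clone();
    }

    /** Resolve a relative path (e.g. "../images/logo.png") against this key. */
    public ResourceKey derive(String relative) {
        Deque<String> segments = new ArrayDeque<>(Arrays.asList(path));
        segments.removeLast(); // drop the file name, keep the directory
        for (String part : relative.split("/")) {
            if (part.equals("..")) {
                segments.removeLast();          // go up one level
            } else if (!part.equals(".") && !part.isEmpty()) {
                segments.addLast(part);         // descend into a child
            }
        }
        return new ResourceKey(schema, segments.toArray(new String[0]));
    }

    @Override
    public String toString() {
        return schema + "://" + String.join("/", path);
    }
}
```

A flat storage would implement the same key abstraction but simply refuse to derive relative names.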

2. Make it possible to detect changes in the underlying resources

That’s quite easy: every storage system has such a facility. Filesystems call it the ‘last modified’ timestamp; CVS calls it a version, as does the solution repository. All we have to do is map it into a global scope. For that, we simply let the storage system implement a service interface, pushing the whole problem down to the low-level layer. Problem solved :)
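The service-interface trick boils down to treating the backend’s change marker as an opaque object and comparing it. Again a sketch with invented names (`ResourceData`, `isStale`), not the real LibLoader interface:

```java
// Hypothetical service interface: each storage backend maps its own
// change-tracking facility (timestamp, revision number, ...) to an opaque
// version object. A cached resource is stale when the versions differ.
public final class VersionCheck {

    public interface ResourceData {
        /** An opaque change marker: a Long timestamp, a String revision, etc. */
        Object getVersion();
    }

    /** Compare the version recorded at cache time with the current one. */
    public static boolean isStale(Object cachedVersion, ResourceData current) {
        return !cachedVersion.equals(current.getVersion());
    }
}
```

A filesystem backend would return `file.lastModified()` from `getVersion()`; a repository backend would return its revision identifier. The cache never needs to know which.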

3. Avoid unnecessary work

Parsing itself is also an expensive operation. XML processing is not a fast job, and even if you use streaming parsers like SAX, you waste a lot of time doing all the string processing and building attribute collections.

So if your resource can be stored safely (because it is protected from changes, or is immutable), LibLoader can also cache the parsing result for optimal performance.
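Combining the two previous points gives a cache of parse results keyed by resource name and invalidated by version. A minimal sketch, assuming a generic `ParseCache` class that is not part of the real API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical parse-result cache: the expensive parse step runs only when
// the key is unknown or the resource's version has changed since caching.
public final class ParseCache<K, V> {

    private static final class Entry<V> {
        final Object version;
        final V value;
        Entry(Object version, V value) { this.version = version; this.value = value; }
    }

    private final Map<K, Entry<V>> cache = new HashMap<>();

    /** Return the cached result unless the resource's version changed. */
    public V get(K key, Object version, Function<K, V> parser) {
        Entry<V> e = cache.get(key);
        if (e == null || !e.version.equals(version)) {
            e = new Entry<>(version, parser.apply(key)); // expensive parse
            cache.put(key, e);
        }
        return e.value;
    }
}
```

This is exactly why the resource must be safe from changes: the cached object is handed out repeatedly, so a mutable result would let one consumer corrupt another's view.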

4. Make it easy to use

For historical reasons, the Classic-Engine supports multiple report-definition descriptions. Our policy on parsing resources is simple: users of our code should not have to care about where a resource came from; the only thing they should care about is the result. In the Classic-Engine, and now in LibLoader, we implemented a resource-loading multiplexer. Sounds complicated, but its meaning is simple: for any resource that should be loaded, the library tries all known resource-handlers to interpret the raw data. As long as at least one implementation handles the given data and is able to produce the requested resource type from it, the resource will be loaded.

Now, if a new resource type is added, the implementor only has to care about the actual loading; caching and dependency management come at no additional cost.
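The multiplexer idea described above can be sketched as follows. The names (`ResourceMultiplexer`, `ResourceFactory`, `canHandle`) are invented for this illustration and do not reflect the actual LibLoader classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource-loading multiplexer: every registered handler is
// asked whether it can turn the raw data into the requested type; the first
// one that can wins.
public final class ResourceMultiplexer {

    public interface ResourceFactory {
        /** Can this factory produce the requested type from this raw data? */
        boolean canHandle(byte[] rawData, Class<?> requestedType);
        Object create(byte[] rawData);
    }

    private final List<ResourceFactory> factories = new ArrayList<>();

    public void register(ResourceFactory factory) {
        factories.add(factory);
    }

    /** Try all known handlers; fail only if none of them understands the data. */
    public Object load(byte[] rawData, Class<?> requestedType) {
        for (ResourceFactory f : factories) {
            if (f.canHandle(rawData, requestedType)) {
                return f.create(rawData);
            }
        }
        throw new IllegalArgumentException(
            "No handler can produce " + requestedType.getName());
    }
}
```

Adding support for a new report-definition format then means registering one more factory; the surrounding caching and naming machinery stays untouched.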