Monthly Archives: March 2008

Version 0.8.10: All Features on board

Finally, after weeks and weeks of coding, version 0.8.10 is feature-complete. Over the next weeks, we will enter a stabilization cycle, fixing bug after bug until we can safely declare the engine ready for production use again.

Bugs we find in the 0.8.10 codeline that also exist in 0.8.9 will be fixed in both branches. When we finally reach GA state, we will produce two releases: 0.8.9-5 brings the fixes to the Pentaho-Platform 1.6/1.7 users, while Platform 1.8 will surely switch to 0.8.10 as soon as we can get it integrated.

The feature set of this new version is impressive now.

We now ship with 6 data-source types:

  • TableModels
  • Static Datasources (calling Java-Methods using reflection)
  • SQL (using plain drivers, JNDI or java-beans as source for the JDBC-connections)
  • Hibernate HQL
  • Kettle-Transformations
  • MQL-queries (using the Pentaho-Meta-Data system)

And of course, we have some more exciting new features:

  • The brand new ODF-based Unified-File-Format
  • Complete End-to-End Meta-Data system integration
  • A structural Meta-Data layer to make all of our elements and expressions introspectable
  • Attribute-Expressions to compute Element-Attributes at runtime
  • Sparkline-Elements
  • CLOB support
  • Complete SVG-Support using Batik
  • an interactive SwingPreview (Hyperlinks for drill-down and drill-across reports)
  • a greatly extended HTML-Export that lets you embed your own raw content for interactive HTML-files
  • Group-wide Keep-Together and group- and report-wide styles

and as mentioned earlier in this channel:

  • A re-born Report-Wizard-Core

If everything works as planned, we shall see the final release four weeks from now.

Resurrection: The Wizard is back

“It was horrible!” a witness told our reporter at the place of the incident. “We thought we were safe now, after the trial and so on. And now that! How should we feel safe now, knowing that this monster runs free again?”


On Friday, after three years of menace, an anonymous citizen of our community gave the vital information on where to find the foe commonly known as “Report Design Wizard” and even risked his life by assisting in the arrest of that dangerous person. Our governor, in an attempt to appease the upset masses, ordered a swift trial and (after a swiftly organized popular vote) sentenced the “Report-Design-Wizard” to death and commanded his execution on the very same day.

Officials, in cooperation with the governor’s private physician, confirmed the death of the Wizard at 3:46 PM. The dead body was buried in a nearby cave. “It is a victorious day for all of us,” governor Pilatus said. “Our society must stand firm against all threats from inside and outside of our community. This event once more shows that vigilant citizens help all of us,” Pilatus continued.

As the whole community was about to celebrate the death of the Wizard, no one was nearby to prevent the upcoming disaster. The escape of the resurrected Wizard might not have been noticed, had a local group of concerned women not been about to properly seal the cave. Apparently, they came too late.

At the site of the incident, local authorities are currently securing the remains of what seems to be the major parts of the Wizard’s body. “It is nothing but a horrible mess in there,” officer Longinus said. “The whole place is filled with torn SWT code and obsolete Castor-XML-models. Whatever happened in there, it definitely did hurt a lot. The Wizard’s core seems to be missing, and according to the traces, something living has escaped from there.”

Local religious authorities are currently trying to calm down the fearful citizens. “We cannot know whether the resurrected wizard will pose the same threat as the old one. Resurrection on this scale does not happen a lot these days anymore,” an undisclosed source among the senior priests told us. “We simply can’t tell how this divine event changed the Wizard’s mind. It is very likely that the new wizard is still very unstable and not fully formed. We all should be very careful when interacting with him for the next weeks.”

Historical sources seem to indicate that resurrections like this have a great chance to transform even the worst of the community’s enemies into a formidable and valuable citizen, who plays an important role in the integration of the disparate building blocks of our community.

Officer Longinus is convinced: “Whatever the outcome of this story might be, it will be something worth remembering.”

Source: http://source.pentaho.org/pentaho-reporting/engines/classic/wizard-core

Swift Justice – 30 silver coins well invested ..

With the introduction of the new ODF-based file-format in Version 0.8.10 of the Reporting-Engine, the Simple-XML file-format and the Extended-XML file-format are now deprecated.

The reporting engine will still be able to understand these file-formats, and all existing reports will run as usual. So there is no need to panic.

But the old file-formats will no longer receive any updates. The features they had in Version 0.8.9 will be the features they have in Version 1.0, 2.0, 3.0 and so on. Namely, these file-formats will not make use of the new report-elements, nor will they be able to use the new attribute-expressions, control the group’s page-breaking capabilities or utilize the newly introduced details-header and -footer bands.

But why? What’s wrong with these file-formats that served the engine over the last years so well?

There are many reasons, both technical and psychological.

Technical reason #1: With 0.8.10, the internal structures of the reporting engine changed to allow us to implement report- and group-wide style inheritance. Groups are no longer organized in a flat list with no connection to each other. Now all bands are part of a large tree structure, with sub-groups being child-elements of their respective parent-group.
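
To illustrate why a tree makes style inheritance natural, here is a minimal sketch. The class `GroupNode`, its methods and the style keys are made up for this example and are not the engine's actual classes; the point is only that a node without a local style value can ask its parent, which a flat list cannot do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type for illustration only; not the engine's real Group class.
class GroupNode {
    private final String name;
    private final GroupNode parent;
    private final List<GroupNode> children = new ArrayList<>();
    private final Map<String, Object> localStyle = new HashMap<>();

    GroupNode(String name, GroupNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this); // sub-groups are child elements of their parent
        }
    }

    void setStyle(String key, Object value) {
        localStyle.put(key, value);
    }

    // Walk from this node towards the root until some node defines the key.
    Object resolveStyle(String key) {
        for (GroupNode node = this; node != null; node = node.parent) {
            if (node.localStyle.containsKey(key)) {
                return node.localStyle.get(key);
            }
        }
        return null; // nobody in the chain defines this style
    }

    public static void main(String[] args) {
        GroupNode report = new GroupNode("report", null);
        GroupNode country = new GroupNode("country", report);
        GroupNode city = new GroupNode("city", country);

        report.setStyle("font-size", 10);   // report-wide style
        country.setStyle("bold", true);     // group-wide style

        System.out.println(city.name + " font-size: " + city.resolveStyle("font-size"));
        System.out.println(city.name + " bold: " + city.resolveStyle("bold"));
    }
}
```

A value set on the report is visible in every group below it, and a sub-group can still override it locally.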

Groups themselves became report-elements. This change opens up the path to finally implement cross-tabbing without pillaging through the code to hack that feature in. It allows us to implement cross-tabbing in version 0.8.11 in a very natural and integrated way.

Technical reason #2: The existing file-formats are limited to a single XML file. This makes it very hard to add new functionality to the file-format while retaining a clean and maintainable structure. At some point the parsers, the writers and the tools built on them become so complex that the costs of maintaining them exceed the costs of writing the whole thing from scratch.

Adding new features to the Simple-XML file-format always endangers the only reason the file-format exists in the first place. The Simple-XML format is meant to be easily readable and writable by humans. Once we add enough new features, the ease-of-use aspect becomes a bad joke.

The Extended-XML file-format started out flawed. The file-format is a very low-level serialized representation of a report-object. The format is barely readable by humans, and it is so complex that you need the blessings of all your gods if you want to add or alter features without breaking existing files. I can honestly say I am afraid to make serious changes to this system, as I could not tell what would happen then.

The Extended-XML format’s structure was bound to the old internal structures of the reporting-engine. With these structures changed beyond recognition, the parser and writer themselves would need severe changes to operate in a non-legacy mode.

Marking both XML-parsers as deprecated makes sure that we can safely hack them to accept the old files no matter what the internal report-structures look like today. The ext-writer will not be able to fully serialize “new” reports – the resulting XML file will not contain the new features and, given the new group-structures, might not even be accepted by the Extended-XML parser.

And then there is the psychological reason: The old “hack-your-own-xml-file” game has to stop. But words are whispers in a storm, you have to grab the developer where he’s vulnerable: by his laziness.

In the reporting projects, we now maintain four or five (depending on how you count) XML file-formats that all do the same thing; two of them are crude derivatives of the Simple-XML format.

Using the stick: With the unified-file-format it becomes harder to proceed the “do-it-yourself” way. The zip-container is more complex than the plain XML-file. But there is a simple API to access it, and the engine can ensure the integrity of the bundle. Sure, you can still hack your way in without using the provided API, but as the costs of that path are now higher, chances are greater that you switch to our API instead.

Showing the carrot: At the same time it becomes easier to just add your own (separate) XML-file to the report-definition-container or to add your data as an attribute to the element. All of the hack-file-formats I mentioned before tried to attach some additional information to a report-element to use this data as input later. With the introduction of attributes and the new report-preprocessor-API, these problems can be solved without even touching an XML-parser of your own.

However, the existing solutions are “but it works” solutions, so without the hard cut, no one would be tempted to touch them. But now, time works against the laziness: How long will it take for users to demand the new features in the tools that utilize the hacked file-formats? Not long, I’m sure, and then they either fork the parsers and try to adapt them, or come home as the naughty children they have been, to adopt the unified-file-format.

Wanted: Report Design Wizard

The report-design-wizard has been sentenced to death for unclean and undirected growth. Over the last three years it threatened innocent users with an undocumented and unclean definition language that depended on several XML-hacks and an implementation of features that cannot be accepted in a civilized world.

After spawning an Ad-Hoc-companion, implanting itself into the Report-Designer and terrorizing innocent reporting-engine developers for way too long, the government of the reporting-projects finally set out to end the threat this creature poses to all of us.

Many atrocities have been committed by the wizard:

  • the file-format, based on Castor-XML-Serialization, is highly redundant and only defined in code.
  • the large dependency list of 60MB of JAR files for the sake of querying databases and firing a preview, while the same could be achieved in 12MB.
  • providing a report-preview using an embedded HTTP-Server, while the reporting-engine ships with a fully functional print-preview
  • the wizard’s execution and file-format are inherently and unfixably tied to the Simple-XML file-format, which is deprecated now, while the reporting engine provides a fully featured API to define and write reports to XML files.

The Governor of the development province of Pentaho-Reporting hereby offers a reward of 30 silver coins for all information that leads to the imprisonment of this foe, so that the hangman can finally end the suffering of this foul beast.

Warning: The Report-Design-Wizard is armed and dangerous and usually seen in the company of 12 other dangerous fellows. If you see them, do not try to arrest them yourself. Please call the local authorities for assistance.

Strategies for building Extension-APIs

When writing extension-APIs, the designer usually has two general choices:

  1. The plugin approach: The system can provide a plugin point where external code takes over control and (re)configures the system
  2. The declarative approach: The system calls a user implementation which computes a token. The caller then incorporates the computed changes into the system.

Both paths have their advantages and disadvantages.

The plugin-path is easy to write, as you literally open all gates and let the foreign code in to refurnish your house. As the API designer, your task is simply to define the entry point and let the user of this API decide what to do and how.

However, it also creates a huge technical debt, as suddenly the structure and behavior of every object and method that is reachable via this API becomes part of the public API. If your internal API is not well shielded, you might find yourself in a situation where the user changes parts of the API that you never wanted to open up. The whole situation gets worse as soon as your evil plugin developers start to play around with rogue casts and reflection APIs.

The declarative approach is more costly to define – as extension-API developer you have to provide a suitable (and hopefully easy to use and understand) model that allows the API-user to fully express the changes he wants to see in the model.

The main advantage of this approach is that you do not have to open up the private parts of the system. The foreign code runs in its own (more or less protected) sandbox and does not gain control of, or information about, unrelated parts of your application.

This limitation of scope makes the whole extension-API easy to understand and easy to use.

The style-expressions introduced in version 0.8.9 are a good example of a declarative plugin-API. The expressions themselves compute a value, but it is up to the reporting engine to interpret that value and to make use of it. The engine is free to ignore the computed value if it looks like garbage.
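
A rough sketch of that contract, under loud assumptions: the interface `StyleExpression`, the method `applyFontSize` and the accepted range are invented for this illustration and are not the engine's real API. The user code only returns a value; the caller decides whether to accept it.

```java
// Hypothetical names throughout; this only illustrates the declarative contract.
class DeclarativeStyleDemo {

    // The only thing the foreign code is allowed to do: compute a value.
    interface StyleExpression {
        Object getValue();
    }

    // The "engine" side: interpret the value, and ignore it if it looks like garbage.
    static int applyFontSize(StyleExpression expression, int currentSize) {
        Object value;
        try {
            value = expression.getValue();
        } catch (RuntimeException e) {
            return currentSize; // misbehaving user code cannot corrupt engine state
        }
        if (value instanceof Number) {
            int size = ((Number) value).intValue();
            if (size >= 1 && size <= 1000) { // assumed sanity range for this sketch
                return size;
            }
        }
        return currentSize; // wrong type or out of range: keep the old value
    }

    public static void main(String[] args) {
        System.out.println(applyFontSize(() -> 12, 10));
        System.out.println(applyFontSize(() -> "garbage", 10));
        System.out.println(applyFontSize(() -> { throw new IllegalStateException("boom"); }, 10));
    }
}
```

The expression never sees the element it styles, so there is nothing internal it could break.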

The declarative approach is easy for simple property changes, but can evolve into your private version of the 8th circle of hell for large scale tasks. The whole declarative model breaks down as soon as the task cannot be contained in simple data-structures or if the task is somewhat irregular in its nature.

In the very beginning, I used the plugin-approach in the reporting engine. Adding a plugin-interface is easy, and if you make it generic enough, it is a powerful thing. The Functions-API and the old LayoutManager-API (in 0.8.7 and earlier) were examples of a plugin-extension.

The Functions-API is one of the more successful ones. In the Classic-Engine, functions can do pretty much anything they want. They have access to the report-states, the report-definition and all the data-sources. By declaration, functions are allowed to change the report-definition on the fly (which gave us unchallenged flexibility in the report processing) and can do highly complex (and sometimes dangerous) computations.

The LayoutManager-API along with the old output-target-API was an example of an API where the plugin-approach backfired. The idea behind that part was to make layouting and content-generation more extensible. For this part, the initial design process was the equivalent of dropping code on a white canvas to see what sticks. Over time, the implementations became insane in themselves – with workarounds for workarounds for bugs.

As the initial API was very generic, there was no controlled way to fix the issues we had with the underlying objects. Any fix to the (formerly) private objects now threatened to break the client code that used that API. There is only one way to fix such an API: To burn it down, salt the earth on which it grew and start from scratch (which we did in version 0.8.9).

The Functions-API had similar problems, but due to the way that API was received by its users, these problems never evolved to the same level of pain as the layouting-API. What might have saved us here was the fact that Functions and Expressions were always seen as tied to a certain purpose – compute values or style report-elements.

During the last few years, I learned to value the declarative approach. With a declarative API, you invest time to protect the user from shooting himself in the foot. Although foot-shooting can be a healthy experience (as it can reinforce values like “think-before-you-code”, careful planning or clean and maintainable implementations after you shot yourself hard enough), most of the time the costs still do not justify the outcome. As architect of an open plugin API, I have to spend hours and hours on protecting the precious parts of the system against evil code. I have to write large documents to spell out in English words what you should do and, more often, what you should not do with the API. With every change of the internal API I spend hours thinking about where and how this change now breaks code in some remote plugin. In the end, some code always breaks.

With the declarative approach, there is only one simple (and hopefully unbreakable) interface. The foreign code is perfectly shielded from the inner details of the engine. The code itself cannot mess with internals of the engine and cannot cause invalid states. Even if the code returns garbage, the engine has its chance to detect this and either fix or reject the request.

There will be only a few cases in the life of a software engineer where a purely declarative approach is feasible. I also would not go so far as to abandon the plugin-approach altogether, but as with the use of nuclear weapons: if you use a pure plugin-API-approach, you better have good reasons. Your actions may cause more trouble than you initially bargained for…

Waiter, there’s a Kettle in my soup!

After a long time, the Classic-engine version 0.8.10 will finally ship with a native Kettle-DataSource. This is what happens if you arm me with a keyboard and let me loose on the Report-Design Wizard’s source code with an explicit mission to simplify the beast.
The Kettle-DataSource ships as an extra project, as Kettle requires JDK 1.5 and its libraries alone are larger than all reporting projects together. But vast powers need sacrifices from time to time ..

The (already working) initial version can be found in our SVN repository at
http://source.pentaho.org/pentaho-reporting/engines/classic/kettle/trunk

MQL-Editor – civilized this time

Are you tired of having to work with crappy UI-toolkits like the Sucker’s Widget Toolkit? Do you prefer a working out-of-the-box experience that runs the same way no matter what Operating System or hardware you use? Tired of having to cope with compatibility and native library issues?

Me too.

That’s why I spent the day rewriting the MQL-Editor in Swing. The original editor was written using the abomination called SWT. But getting SWT code to run cross-platform is a nightmare and usually ends up in a lot of screaming and throwing stacktraces at each other. I wasn’t able to get the thing running on my Linux-Box without exceptions. After once more tracing it down to the class of “SWT does not work with Swing” problems we faced over the last years, I cannot bear it anymore.

NO MORE SWT!

As the MQL-Editor is a thin layer on top of the Pentaho-MetaData system, rewriting this functionality is faster than trying to cope with the bad design of SWT. And above all: It is guaranteed to work within 24 hours, and it solves all the SWT-vs.-Swing integration problems by killing SWT. That’s a clear message, isn’t it?

So here we are. The Swing version covers all the functionality of the original dialog and adds some extra validation of the input (so that it is impossible to continue with invalid queries).

Grab it at: svn://source.pentaho.org/pentaho-reporting/libraries/mqleditor

XML is not text – and Notepad is no XML editor

When the W3C and the big software vendors introduced XML, their marketing departments came up with a lot of bold claims. XML would be a human friendly format, heck, they even claimed it to be human editable.

On a strictly technical side, all of these claims are true.

On a real life side, these claims are simply wrong.

Only very few XML formats are really human readable, most of them describing configuration files. These XML structures usually have been designed by the developer to be human friendly. But merely using XML does not automatically make everything friendly. The only group that considers XML files friendly seems to be software developers.

Test it: open your Open-Office or XML-enabled MS-Office and look at the documents these beasts create. Or listen to a Web-Service conversation, which is just friendly XML. Do you still think XML is human readable and friendly?

But how about the human-editable aspect? Well, the term alone is sneaky, as ‘human editable’ does not mean that much. When I was a kid, I edited savegame files on a regular basis, so technically they were ‘human editable’. But I would never have claimed that savegame files are a human-editable format.

XML files are human hackable, nothing more.

Without deep knowledge about the binary level of these files, it is quite easy to transform a valid XML file into a huge chunk of garbage.

XML is in fact a binary format. There are special cases, where a plain text-file editor can be used to alter the contents of an XML file without breaking it – but these cases are an exception, not the rule.

Do not use a text editor for editing XML files.

Like so many file formats, XML is a layered fileformat.

  1. Binary data level: All content on your disks is just bits and bytes. XML is no exception here. When working with XML content, applications translate the binary content into an internal textual representation (usually Unicode).

    XML files have the option to declare how an application should translate between the binary and character-data level by specifying the encoding in the XML header.

    For example: <?xml version="1.0" encoding="UTF-8"?>

    Common text editors do not interpret this definition and therefore are likely to break the file on a binary level. A file that was stored in UTF-8 encoding and which was later edited in an 8-bit encoding like ISO-8859-1 will not be readable afterwards.

    Unless you use a text editor that is aware of the XML header, chances are high that you render your files unusable or introduce subtle errors like corrupted text contents.

  2. XML-Content level: Once the XML-parser has translated the binary content into its own internal text-representation, it will start to interpret the XML syntax and will produce a high-level view on the XML-elements and attributes described by the character-data stream.

    When not using an editor that is aware of the XML-syntax, you have to validate the syntactical correctness of the file yourself. Luckily XML parsers are really good at spotting these errors as well, so if the file is syntactically incorrect, your parser will likely tell you so by rejecting the file.

  3. Application level: XML only describes the grammar, that is, how content is organized on a physical level. It tells you that there are Elements and Attributes – but the XML standard itself does not tell you anything about their meaning or which elements and attributes are allowed in which position.

    The application-level document structure is commonly described in DTD files (DTD = Document Type Definition) or XML-Schema files. A good XML editor can understand these files and can validate the document to some point. Although this validation cannot guarantee that your XML content will make sense for the application, at least it can tell you whether your content fits the application’s declared expectations.

    Of course, plain text editors will not provide such validation services.
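
The damage on the binary level is easy to reproduce. The following sketch uses only standard JDK classes; the class and method names and the sample text are made up for this illustration. It decodes the same bytes with the wrong charset in both directions, which is what an encoding-unaware editor does when it opens a file.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: what happens when bytes are decoded with the wrong charset.
class EncodingMismatchDemo {

    // File was written as UTF-8, editor assumes ISO-8859-1.
    static String utf8ReadAsLatin1(String text) {
        byte[] onDisk = text.getBytes(StandardCharsets.UTF_8);
        return new String(onDisk, StandardCharsets.ISO_8859_1);
    }

    // File was written as ISO-8859-1, reader assumes UTF-8.
    static String latin1ReadAsUtf8(String text) {
        byte[] onDisk = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(onDisk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "M\u00fcller"; // "Mueller" written with u-umlaut (U+00FC)

        // One character on screen, two bytes on disk, two wrong characters after misreading.
        System.out.println(utf8ReadAsLatin1(name));

        // The single byte 0xFC is not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(latin1ReadAsUtf8(name));

        // Plain ASCII survives both ways, which is why the problem often goes unnoticed at first.
        System.out.println(utf8ReadAsLatin1("plain"));
    }
}
```

If the editor then saves the misread text again, the corruption becomes permanent.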

Well, telling everyone not to use Notepad for editing XML files is one thing. But people use text editors because they obviously have a need to edit XML files.

To be suitable for my working style, an XML editor has to fulfill a couple of prerequisites.

  • The XML editor must be able to understand the XML-encoding hint in the header (thus solving the binary translation in a reliable way).
  • Of course, it should be a real editor that allows me to write the XML file like source-code (notepad mode, instead of restricting me to a structural tree-view only).
  • If the editor does not come with syntax-highlighting, then I don’t want to use it either.
  • The editor must be able to check the XML file for syntactical correctness (i.e., all elements closed etc).
  • Being able to validate the XML file against at least DTDs and XML-Schemas and providing auto-complete capabilities is a huge plus point.
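
The syntax check from this list is the cheapest one to get, because any XML parser already does it. As a minimal sketch using the JDK's built-in JAXP parser (the class and method names are my own for this example), the following only tests well-formedness; it does not validate against a DTD or schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustration only: a well-formedness check, not a DTD or schema validation.
class WellFormedCheck {

    static boolean isWellFormed(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Parsing from bytes lets the parser honor the encoding declared in the header.
            builder.parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException | IOException e) {
            return false; // syntax error or undecodable byte sequence
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser available", e);
        }
    }

    public static void main(String[] args) {
        byte[] good = "<report><group/></report>".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "<report><group></report>".getBytes(StandardCharsets.UTF_8);
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```

Passing raw bytes instead of a String matters here: it leaves the binary-to-text translation to the parser, which reads the encoding hint itself.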

Finding commercial editors that meet these requirements is relatively easy. But open source? Hmm, that’s difficult. In the open-source world, everyone seems to work with plain text-editors (which do a great job of messing up your encodings if you work with files edited on other systems).

Eclipse seems to have an XML editor as a sample plugin, but you have to compile that yourself. (Don’t ask me why that IDE does not come with an XML editor by default. Maybe they did not want to spoil the business for the commercial vendors. Or maybe this is just honesty of the Eclipse developers, to show you that Eclipse should not be used in Enterprise settings.)

In the open-source world I have never found an XML editor that is easy to use – the few I stumbled across were so horrible that I did not even bother to work with them long enough to want to save a file.

On Windows, there are a couple of editors that seem to fulfill at least the basic requirements:

As I can’t believe that Windows is more than just a gaming platform, I barely use XML editors on Windows at all.

Personally, I work with IntelliJ IDEA, whose great XML-editing capabilities cover everything I need on a daily basis. (I don’t believe in stuff like XSLT or XPath/XQuery.) It costs less than some of the commercial XML-only editors (like XMLSpy, for instance) and has a nice Java-IDE included as well.