Pentaho, the Platform: How about not being a server anymore?

Out there in the wild, when people talk about Pentaho, the first impression is: It’s large, it’s big, it’s heavyweight, it’s THE SERVER. It’s a beast that eats CPUs at night, and the next day administrators barbecue on the remaining CPUs.

You can really stun people by revealing the well-hidden secret that Pentaho the Platform is not heavyweight at all. Quite the contrary, if used correctly.

My journey started a couple of years ago, when I had to debug the reporting integration in the platform. This was the first time in my life that I actually contemplated retiring as a farmer somewhere in the North American desert (not many plants, true, but one or two interesting ones exist). With every compile, redeploy and restart of the heavyweight JBoss server, I wondered: Have I sinned that much to deserve this hell? I’m sure Dante Alighieri had a 10th circle in his Divine Comedy, which involved lots and lots of J2EE and JBoss debugging. But obviously he considered that too cruel to be believable, and therefore cut it out. Even Satan would not be so low …

Obviously, as I’m sitting here writing this article, I found salvation beyond Peyote or eternal torture. And salvation is simply not to use JBoss* (or any other J2EE system).

Revelation One: The Pentaho Preconfigured Installation is Not the Pentaho Platform

The official documentation and whitepapers make a fine (and really not obvious) distinction between the “Pentaho Platform” and the “Pentaho Server”. The server is the big, heavyweight thing, and it relies on a working J2EE infrastructure to run. So from now on, we will ignore this big pink elephant. The interesting pearl is the Pentaho Platform buried in the Server.

The Platform is a couple of JARs, which are almost entirely made of infrastructure and glue code. The Platform has only one purpose: to orchestrate the various components into a BI-related symphony. If I had to describe the platform in one single sentence, it would be: “The platform is a runtime environment for an XML-based process language that additionally provides auditing, logging, configuration and other infrastructure needed to run BI jobs.”

Revelation Two: The Platform does not require J2EE at all.

Now we are leaving the heavyweight area and diving deep into the sacred land of resource efficiency! Although the platform provides several implementations of its interfaces to integrate seamlessly into J2EE environments, it also ships with implementations that are not tainted by any J2EE-related code. These clean implementations make it possible to integrate the platform into all kinds of Java applications.

For me, running the platform outside of a J2EE server allows me to debug the components I write from inside my IDE. I do not have to deal with a heavyweight server that takes five or more minutes to start up and shut down. I do not have to dive through layer upon layer of application server code before I come to the parts of the application that interest me. I do not have to deal with HTTP requests. I do not have to deal with configuring a server before I can work. I can start my work immediately.

When I have to deal with XActions and have to find out why the $%&&$ the thing is not working, I also tend to be faster when I simply attach a debugger and see what’s going on under the covers instead of performing a pen-and-paper analysis of the XAction file itself. Run, listen for the crash, jump to the crash, and search the burning ruins for hints on what happened. Fast and simple. And since I started using the platform as an embedded tool, I never had to deal with setting up JNDI datasources in JBoss or any other J2EE system, and I never had to write a single XML deployment descriptor again. This is what heaven must be like.

But having the platform as an embedded toolkit opens a whole new world of opportunities. Maybe you have to provide bursting capabilities, that is: generating lots of reports and sending them out to a predefined list of recipients (much like what spammers do daily, but clean and family friendly). Then the platform can do this for you with minimal effort. Maybe you need to integrate reporting in your application and at the same time you have to ensure (and prove later) that the reports have been generated and distributed correctly. Or, in an extreme case, you need to query a web service to provide parameters for a query on an OLAP server, whose result feeds a Kettle transformation, which in turn runs a sequence of reports that are distributed via email. The embedded platform allows you to run that XAction as easily as a simple report.

Revelation Three: Code!

Up to the Platform 1.2.0, there was a sub-project called the Pentaho-SDK, which contained a couple of examples on how to execute XActions in standalone mode. An SDK for an open source project (where the full sources are always available) was a strange sort of beast, so this project ceased to exist and only the SVN server knows where its spirit went. However, the death of the SDK cut off the audience that just wanted to run the platform and did not want to deal with all of the platform’s code.

So here we start again.

(1) Set up the project

Grab the latest sources and copy all JARs from “thirdparty/lib” and all its subdirectories into your project’s lib directory.

Add all the JARs to your project’s CLASSPATH.

Build the platform and add the generated JARs to your classpath as well.

Grab a configured copy of the solution directory.

Configure JNDI so that the components know how to access the database(s).

(Remove BIRT from the system-listeners, as it does not seem to initialize in standalone mode.)

Download the preconfigured standalone environment 🙂 (scroll down)
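For the JNDI step, one option outside an application server is the simple-jndi library, which resolves JNDI names from plain property files. The following is a minimal sketch under that assumption: the article does not say which JNDI provider the standalone package uses, so check that simple-jndi is on your classpath before relying on it. The class name and the "simple-jndi" directory are made up for illustration.

```java
import java.io.File;

final class StandaloneJndiSketch {

    /**
     * Points JNDI lookups at a directory of simple-jndi property files.
     * Assumes the simple-jndi JAR is on the classpath; this method only
     * sets the system properties that simple-jndi reads.
     */
    static void configureSimpleJndi(File jndiDirectory) {
        System.setProperty("java.naming.factory.initial",
            "org.osjava.sj.SimpleContextFactory");
        System.setProperty("org.osjava.sj.root", jndiDirectory.getAbsolutePath());
        System.setProperty("org.osjava.sj.delimiter", "/");
    }

    public static void main(String[] args) {
        // "simple-jndi" is a hypothetical directory name.
        configureSimpleJndi(new File("simple-jndi"));
        System.out.println("JNDI root: " + System.getProperty("org.osjava.sj.root"));
    }
}
```

With simple-jndi, that directory would typically hold a jdbc.properties file with entries such as "SampleData/type=javax.sql.DataSource", "SampleData/driver=…", "SampleData/url=…", "SampleData/user=…" and "SampleData/password=…", where "SampleData" stands for whatever datasource name your XActions reference.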

(2) Java: Initialize the platform.

Initializing the platform is easy: all you have to do is provide a standalone context and point it to your solution directory.

  public static boolean initialize()
  {
    try
    {
      // We need to be able to locate the solution files.
      // In this example, we are using absolute paths; adjust them
      // to wherever your solution directory lives.
      final File solutionRoot =
          new File("/home/src/pentaho/pentaho-demo/pentaho-solutions/");
      final File applicationRoot = new File("/home/src/pentaho/pentaho-demo/");
      final StandaloneApplicationContext context =
          new StandaloneApplicationContext(solutionRoot.getAbsolutePath(),
              applicationRoot.getAbsolutePath());

      // Initialize the Pentaho system
      return PentahoSystem.init(context);
    }
    catch (Throwable t)
    {
      // of course, you should have some better
      // error handling than I have ;)
      t.printStackTrace();
      return false;
    }
  }
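Since the paths above are hard-coded, it can help to check them before handing them to the platform, so that a wrong path fails with a clear message instead of somewhere deep inside the initialization. A minimal sketch using only the JDK; the class name and the "pentaho.solution.root" property are hypothetical and nothing the platform itself reads:

```java
import java.io.File;

final class SolutionRootSketch {

    /** Returns the absolute solution directory or fails fast if it is missing. */
    static File resolveSolutionRoot(String configuredPath) {
        final File root = new File(configuredPath).getAbsoluteFile();
        if (!root.isDirectory()) {
            throw new IllegalArgumentException(
                "Solution directory not found: " + root);
        }
        return root;
    }

    public static void main(String[] args) {
        // Hypothetical property name; falls back to the working directory.
        final String path = System.getProperty("pentaho.solution.root", ".");
        System.out.println("Using solution root: " + resolveSolutionRoot(path));
    }
}
```

The returned File could then be passed to the StandaloneApplicationContext constructor in place of the hard-coded path.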

(3) Execute your XAction. The XAction path should be relative to the solution repository’s root directory. The parameters must be given in the HashMap and must match the declared parameters of the XAction. By adding more code, it is possible to provide a UI on top of this process that queries the parameters in the same way as the Pentaho Server’s HTML UI does.

    final String xactionPath =
        "samples/steel-wheels/reports/Income Statement.xaction";
    final HashMap parameters = new HashMap();
    parameters.put ("output-type", "pdf");

    final FileOutputStream out = new FileOutputStream ("/tmp/report.pdf");
    try
    {
      ISolutionEngine engine = SolutionHelper.execute
          ("Just a description used for logging ", "User (only for logging)",
              xactionPath, parameters, out);
      List messages = engine.getExecutionContext().getMessages();
      engine.getExecutionContext().dispose();

      // out contains whatever the XAction produced.
    }
    catch (Exception e)
    {
      e.printStackTrace();
    }
    finally
    {
      out.close();
    }
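The bursting scenario mentioned earlier is little more than this call in a loop, once per recipient. The following is a minimal sketch of that loop, not platform code: XActionRunner is a hypothetical interface standing in for the SolutionHelper.execute call above, the "recipient" parameter name is an assumption that would have to match your XAction, and the stub in main only writes the recipient’s name so that the example runs without the platform.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BurstingSketch {

    /** Hypothetical stand-in for the SolutionHelper.execute call. */
    interface XActionRunner {
        void run(String xactionPath, Map<String, Object> parameters,
                 OutputStream out) throws IOException;
    }

    /** Runs the XAction once per recipient and collects each result in memory. */
    static Map<String, byte[]> burst(XActionRunner runner, String xactionPath,
                                     List<String> recipients) {
        final Map<String, byte[]> results = new LinkedHashMap<String, byte[]>();
        for (String recipient : recipients) {
            final Map<String, Object> parameters = new HashMap<String, Object>();
            parameters.put("output-type", "pdf");
            parameters.put("recipient", recipient); // assumed parameter name
            final ByteArrayOutputStream out = new ByteArrayOutputStream();
            try {
                runner.run(xactionPath, parameters, out);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            results.put(recipient, out.toByteArray());
        }
        return results;
    }

    public static void main(String[] args) {
        // Stub runner: writes the recipient's name instead of rendering a report.
        final XActionRunner stub = (path, parameters, out) ->
            out.write(String.valueOf(parameters.get("recipient"))
                .getBytes(StandardCharsets.UTF_8));
        final Map<String, byte[]> reports = burst(stub,
            "samples/steel-wheels/reports/Income Statement.xaction",
            Arrays.asList("alice@example.com", "bob@example.com"));
        for (Map.Entry<String, byte[]> entry : reports.entrySet()) {
            System.out.println(entry.getKey() + ": "
                + entry.getValue().length + " bytes");
        }
    }
}
```

In a real application, the runner would delegate to SolutionHelper.execute and the collected bytes would be handed to whatever does the mailing. Keeping the runner behind an interface also makes the loop testable without starting the platform.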

(4) Clean up. Always shut down the platform before you exit the application. You want to be sure that all data is written into the databases and that all buffers are flushed.

    PentahoSystem.shutdown();
    System.exit(0);
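To make sure the shutdown really happens even when a job throws, the whole lifecycle can be wrapped in try/finally. This is a minimal sketch of that pattern: the Platform interface is a hypothetical stand-in for PentahoSystem.init(context) and PentahoSystem.shutdown(), the exit codes are arbitrary, and skipping the shutdown after a failed init is a choice made for this sketch rather than documented platform behavior.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {

    /** Hypothetical stand-in for PentahoSystem.init(context) and shutdown(). */
    interface Platform {
        boolean init();
        void shutdown();
    }

    /** Runs the job between init and shutdown; returns a process exit code. */
    static int runWithPlatform(Platform platform, Runnable job) {
        if (!platform.init()) {
            return 1; // init failed; this sketch skips shutdown in that case
        }
        try {
            job.run();
            return 0;
        } catch (RuntimeException e) {
            e.printStackTrace();
            return 2;
        } finally {
            platform.shutdown(); // reached whether the job succeeded or threw
        }
    }

    public static void main(String[] args) {
        final List<String> calls = new ArrayList<String>();
        final Platform stub = new Platform() {
            public boolean init() { calls.add("init"); return true; }
            public void shutdown() { calls.add("shutdown"); }
        };
        final int exitCode = runWithPlatform(stub, () -> {
            throw new IllegalStateException("job failed");
        });
        System.out.println(calls + " exit=" + exitCode);
    }
}
```

A real main method would pass the returned code to System.exit after runWithPlatform returns, so the shutdown has already completed by the time the JVM exits.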

So go ahead, download the package and start walking the lightweight path.

Pentaho-Standalone (ZIP-Package)

Pentaho-Standalone (TAR.GZ-Package)

* Nitpicker’s corner: JBoss as used in this article actually represents all the evilness found in all J2EE servers. No matter whether you choose JBoss, WebSphere, BEA or whatever J2EE server you prefer, they are heavyweight machinery and not meant to be used for developing applications. Once you are finished developing, they surely form a superior runtime environment for your J2EE code, but everything that makes them good in production makes them horrible for development. Slow startups, a heavy footprint and lots and lots of XML descriptors: efficient development should look different from that.
