Better Builds (1/3): Speed up the Pentaho Reporting Build

When developing bug-fixes or new features for software I write (in the context of this blog: Pentaho Reporting), I tend to follow a test-supported development model. It’s not the pure “write tests first” approach of the Test-Driven crowd, but a workable approximation of the main idea behind it: Any code written should have some automatic validation in place to make sure it works as intended today and in the future.

The number one prerequisite to make that work is a fast feedback loop between me writing the code and an automated system to tell me I messed up again.

In this series of three posts I am going to demonstrate how to set up a local build environment that greatly speeds up the build process and that automatically validates all builds for you.

Contents of this series

In this first post I will introduce some necessary changes to the build scripts that allow us to manage the build configuration from a central location and then to use this to speed up the process.

In the second post I will show you how to set up a CI server and how to feed it with your changes, so that it builds your work for you the moment you push the changes to a shared (possibly private) git repository.

And finally, in the third post, I will show you how to create a build chain to assemble a BI-Server and how you can create and run integration tests against this assembly. This then provides you with the means to run both Java JUnit tests and JavaScript Jasmine tests against a fully deployed BI-Server to make sure that everything still works.

Why does the build process need improving?

When you start developing today with the infrastructure provided by Pentaho, this development approach is nearly impossible. Pentaho’s CI servers only build the official branches of the various repositories, so any code is only tested automatically after I have pushed it into the main line that may become the next release.

I could – of course – manually test the code. Well, my problem is that normally I am quite god-damn-sure that my code is all correct. Call it trigger happy. So I need a system that validates my work regardless of my opinion.

Let’s agree that this is not safe. I need to be able to automatically validate changes before I start the pull request, or at the latest after the pull request is created but before it is merged. Ideally, I should be able to validate the code automatically in a tight loop, with early feedback within minutes and no more than one hour at worst, instead of the 8+ hours that are the norm today.

And finally, the Maven artefacts produced by the official build process today are near useless, thanks to every dependency being marked as optional. So let’s fix that as well.

The fixes in this post are nearly non-intrusive. Instead of forcing these changes into the build process, I chose to make the overrides optional, to give you the choice to activate them as needed. If you’re doing a one-off build, you won’t see them, and you build with a process as close as possible to the one on the Pentaho build servers. But if you need it, you can now tweak the build without having to maintain a separate fork of the sources.

The only mandatory changes to the build files load additional properties from the home directory and inject a set of new Ant targets into the build, without altering the behaviour of Pentaho’s subfloor build system. Without any user-defined overrides, this system behaves as well or as badly as before – but now you have a choice to change that for your local build whenever and however you want.

How to fix the build process in three easy steps

The changes I introduce to the build can be organized into three stages:

  1. Prepare the build to allow us to override settings and to inject tasks
  2. Make the local publish process work faster by setting up a local Artifactory repository as a proxy.
  3. Actually fix the build by overriding some subfloor targets with better alternatives.

Stage 1: Prepare the build (already done for you in the reporting projects)

Since the day we moved Pentaho Reporting to GitHub for the version 5.0 release, the build files contain the ability to define build properties in a central configuration file in your home directory.

<property file="${user.home}/.pentaho-reporting-build-settings.properties"
          description="Per user override settings-file for all pentaho-reporting projects." />

This already allows you to replace things like the location of the “ivysettings.xml” file, tweak config options or even replace the common build file (also known as subfloor) with your own version.
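
For illustration, a minimal settings file could look like this (the property names are the ones used later in this post; the values are examples, not recommendations):

# ${user.home}/.pentaho-reporting-build-settings.properties
junit.haltonfailure=true
ivy.settingsurl=file:///${user.home}/ivysettings.xml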

However, while overriding configuration options is safe, replacing subfloor is not. If there are updates to the build process, you would have to manually keep in sync with them, effectively maintaining your own fork. Subfloor is complex enough to make this work potentially dangerous.

If replacing subfloor is a sword, then the next change is the equivalent of a scalpel:

<property file="${user.home}/.pentaho-reporting-build-settings.properties"
          description="Per user override settings-file for all pentaho-reporting projects." />

<!-- Define the default location of the shared build override file -->
<property name="reporting.build.file" value="../../build-res/reporting-shared.xml"
          description="This is the location of the standardized build-res/reporting-shared.xml file"/>

<!-- Import the shared build override file which contains all the default tasks -->
<import file="${reporting.build.file}"/>

<!-- Define the default location of the common build file -->
<property name="common.build.file" value="./build-res/subfloor.xml"
          description="This is the location of the standardized build-res/subfloor.xml file"/>

<!-- Import the build-res/subfloor.xml file which contains all the default tasks -->
<import file="${common.build.file}"/>
..

With this change, we inject an Ant include file before we finally load subfloor. When Ant loads build files, it allows multiple build files to define the same target. However, once a target is defined, subsequent build files cannot override that declaration.

When Ant loads a build file with imports, it first fully defines the file’s own targets before it processes the imports. The first file loaded is always your build.xml file, followed by the imports in the order they appear in the file. So by putting the import for the “reporting-shared.xml” file before the subfloor import, we can replace subfloor targets without having to alter subfloor itself.
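
The mechanism is easy to demonstrate outside of subfloor. In the following self-contained sketch (all file and target names are made up), running “ant” prints “hello from override.xml”, because the first import wins:

<!-- build.xml -->
<project name="main" default="hello">
  <import file="override.xml"/> <!-- imported first: its targets win -->
  <import file="base.xml"/>     <!-- its same-named targets are ignored -->
</project>

<!-- override.xml -->
<project name="override">
  <target name="hello">
    <echo message="hello from override.xml"/>
  </target>
</project>

<!-- base.xml -->
<project name="base">
  <target name="hello">
    <echo message="hello from base.xml"/>
  </target>
</project>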

I keep the reporting-shared.xml file in a central directory (build-res) at the root of the repository. This way, if there are changes necessary, only one file has to be changed.

The file is simple and only provides a common set of targets to make it easier to build the whole of Pentaho Reporting in an automated way (or on the command line).

It defines the following targets:

  • continuous-local: Builds the project, runs the normal unit tests with code coverage and publishes the artefacts to the local ivy cache. This takes a hell of a long time, as Cobertura is slow on large projects.
  • continuous-local-junit: Builds the project as above, but only runs standard JUnit tests. This is the normal target for validating changes quickly.
  • continuous-local-testless: Builds the project without running tests. Use this if you just need the binaries and if you have a trustworthy CI server to run the actual tests.
  • continuous-junit: Builds the project with unit tests without code coverage. The artefacts are published to a Maven repository. You will need your own Maven repository server for that and you will have to configure ivy to resolve against this server as well.
  • longrun-test: Runs integration-level tests with code coverage. These tests take an extremely long time to run and require that you have published the artefacts either locally or to a Maven server, as the build will depend on some of these artefacts.
  • longrun-junit: Runs the integration-level tests with normal JUnit. This is considerably faster than the Cobertura runs.
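
All of these targets are thin wrappers around existing subfloor targets. As a rough sketch of the idea (the subfloor target names in the depends list are assumptions, not copied from the real file), such a wrapper can be as simple as:

<target name="continuous-local-junit"
        depends="clean-all,resolve,test,publish-local"
        description="Build, run the plain JUnit tests and publish to the local ivy cache."/>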

In addition to these targets, Pentaho’s subfloor also defines some useful targets for CI and command line environments:

  • continuous: Builds the project with Cobertura test coverage and publishes artefacts to a Maven repository server. As with all Cobertura targets, prepare to wait for a hell of a long time.
  • continuous-testless: Builds the project without running any tests, and publishes the artefacts to a Maven repository. I do not recommend this. If you publish shared artefacts, take the time to at least minimally validate them by running the unit tests. Otherwise you may just introduce a bunch of hard-to-trace bugs into an otherwise trusted source.

With those changes in place, we now have a well-defined build environment on which to build.

If you want to adapt this process to other Pentaho projects, this should be painless as long as they do not stray too far from the standard build process. I successfully patched the BI-Server and the server plugins with this process – but have not tried the same with either Mondrian or Kettle, as neither of those projects needs to be built locally to get a working Pentaho BI-Server assembly on demand.

Just checking out the Pentaho Reporting project and running

ant continuous-local

will now build all the modules and finally create the finished Pentaho Report Designer in the “designer/report-designer-assembly-dist” directory.

On my machine (i7-3537U) this takes 68 minutes from a freshly cleaned cache and 17 minutes from a populated cache that is up to date with Pentaho’s latest builds. Normally the build times are somewhere in between these two extremes, as snapshot artefacts need to be downloaded again whenever they get rebuilt.

Stage 2: Speed up the local builds by installing Artifactory as a local proxy

When you profile the build, either by just looking at it or by using an ant-profiler plugin, you will see that most of the time during the build is spent resolving libraries and copying artefacts around.
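
For example, with the ant-contrib jar available, Ant’s listener mechanism can print per-target timings at the end of a build (a sketch; the path and version of the jar are illustrative):

ant -lib ./lib/ant-contrib-1.0b3.jar \
    -listener net.sf.antcontrib.perf.AntPerformanceListener \
    continuous-local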

Let’s fix that.

When Ivy resolves artefacts, it follows a simple process: For release versions, it tries to find any server that contains the release artefacts. And for snapshot versions it contacts all repository servers to find the latest version of all snapshot releases.

So if you have one server in your resolver list, Ivy contacts one server. If you have 10 servers, Ivy contacts all 10 servers and then chooses one to download the artefact from. It does this for each artefact separately. So if you need to download 100 artefacts, Ivy will possibly make 10 x 100 connections to remote servers and then possibly another 100 connections to download the artefacts. Each connection attempt takes time to answer. And even though the bandwidth available to us has improved, the response time has not. It is not uncommon to wait 500ms for a server to return a response. At 1000 requests, that makes 500 seconds (or roughly 8 minutes) of plain waiting time.

So let’s fix the network problem first, by installing a local proxy. This way, requests for cached artefacts are guaranteed to be answered in less than 10 milliseconds, and with virtually no upper limit on the bandwidth used to download them.

1. Install Artifactory as your local proxy server

First, download and install Artifactory. Download the ZIP version from the JFrog download pages and unzip it into a directory, for instance your HOME directory. Then all you need to do is start it via artifactory.bat (Windows) or artifactory.sh (Unix, Mac).

This assumes that you have JAVA_HOME defined as an environment variable and have it pointing to a JDK 1.7 installation, so that the startup scripts can find a valid Java installation.
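
On a Unix-like system, the whole installation boils down to something like this (the version number in the file names is illustrative):

unzip artifactory-3.4.2.zip -d $HOME
export JAVA_HOME=/opt/jdk1.7.0
$HOME/artifactory-3.4.2/bin/artifactory.sh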

Then access your server via “http://localhost:8081” (note: this is different from the usual port 8080 used by servlet containers). The predefined administrator account uses the username “admin” and the password “password”.
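
If you prefer the command line, Artifactory’s REST ping endpoint offers a quick health check (it answers with “OK” once the server is ready):

curl http://localhost:8081/artifactory/api/system/ping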

Now that Artifactory is running, we need to configure it a bit.

2. Configure Artifactory to know about the Pentaho Repository

Log in as admin and click on the “Admin” tab.

[Screenshot: build-series-1-step-1]

On the side, click on “Repositories” to bring up the repository configuration.

[Screenshot: build-series-1-step-2]

On the repositories configuration page, locate the section labelled “Remote Repositories”. Click the “New” button to add a new remote repository.

[Screenshot: build-series-1-step-3]

Give this repository the name “pentaho-public” and set the repository URL to “http://repository.pentaho.org/artifactory/pentaho”. Make sure the “Handle Releases” and “Handle Snapshots” options are selected.

[Screenshot: build-series-1-step-4]

Pentaho maintains a second repository for artefacts that we need during the build process but that have never been published to a public Maven repository, and for artefacts where the public Maven copy is broken.

[Screenshot: build-series-1-step-5]

As before, configure a new remote repository with the identifier “pentaho-third-party” and the URL “http://repository.pentaho.org/artifactory/third-party”. Make sure the “Handle Releases” and “Handle Snapshots” options are selected here as well.

3. Configure the virtual “remote-repos” repository.

Artifactory acts as a caching proxy. Therefore, for each request we send to the server, it will contact all configured repositories to find the best artefact for us. As with plain Ivy: the more repositories we have to ask, the more time we waste.

Locate the “Virtual Repositories” section and select the repository named “remote-repos”.

[Screenshot: build-series-1-step-6]

Remove all repositories from the list of active repositories until you have only “repo1” left.

Now locate the “pentaho-public” repository you just created and add it to the list of active repositories. Do the same with the “pentaho-third-party” repository.

Finally, reorder the configured remote repositories so that the “repo1” repository is the last element in the list, as shown in the screenshot.

[Screenshot: build-series-1-step-7]

Congratulations: You now have an Artifactory server that can serve as your local proxy. Let’s tell the build process about it!

4. Create an ivysettings.xml file.

Ivy knows about remote servers via its ivysettings.xml file. The Pentaho projects come with a settings file that points to the public Pentaho servers. Now that we have our own proxy, we want to use that one instead.

Take this ivysettings.xml file and place it into your home directory.

The file contains sensible defaults to allow you to access a configurable Maven repository. It is based on the Pentaho ivysettings.xml file, but also updates the caching strategy to be safe when run in a CI environment. And last but not least, it separates this build from the default configuration used by Pentaho’s default settings, so that we minimize any conflicts.
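
Stripped down to its core idea, an ivysettings.xml that resolves everything through the local Artifactory could look like this (a minimal sketch, not the linked file; it assumes Artifactory’s default virtual repository “repo”):

<ivysettings>
  <settings defaultResolver="artifactory-proxy"/>
  <resolvers>
    <!-- m2compatible resolves against a Maven-layout repository;
         changingPattern/checkmodified make Ivy re-check SNAPSHOT artefacts -->
    <ibiblio name="artifactory-proxy" m2compatible="true"
             root="http://localhost:8081/artifactory/repo"
             changingPattern=".*-SNAPSHOT" checkmodified="true"/>
  </resolvers>
</ivysettings>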

Now we need to define the necessary overrides to make the build use this file.

Create a new properties file named “.pentaho-reporting-build-settings.properties” in your home directory. This file will hold all global settings to configure the build process.

# Build override for Pentaho Reporting

# Fail the build if a test fails
# Fail the build if an error occurs
junit.haltonfailure=true
junit.haltonerror=true
ivy.settingsurl=file:///${user.home}/ivysettings.xml
ivy.repository.resolve=http://localhost:8081/artifactory/libs-snapshot

# Used later during publish to a maven server
ivy.repository.id=libs-snapshot
ivy.repository.publish=http://localhost:8081/artifactory/libs-snapshot

This file first changes the default for unit tests to a safer option. Now, if there is any kind of failure during the tests, the build process will fail. This may seem radical, but if failing tests are a bad thing, then failing silently and hoping that a human will notice is even worse. The ivy.settingsurl property then points the build at the ivysettings.xml file we just installed, and the resolve and publish URLs point at the local Artifactory server.

Now build the whole reporting project again, via

ant continuous-local

The first time you run this, your Artifactory server will reach out to the public Maven and Pentaho servers to download all artefacts. Any subsequent access will be served from the cache and will be much faster.

And last but not least: If the Pentaho server goes down, your Artifactory server will continue to serve artefacts and will periodically check whether Pentaho’s server has come back up. As I am writing this, the Pentaho server seems to deliver read-timeouts instead of artefacts, but the local cache holds up against it.

A clean-cache build – same as above – now takes 45 minutes, and the fully cached build, at 15 minutes, comes in slightly under the pure Ivy build. And thanks to the stronger caching promises made by Artifactory, you will now always hover closer to the 15 minutes than to the 45 or 68 minutes of a cold-cache build.

This Artifactory server will see more use later, during the CI builds in part 2 of this series. The integration tests require this server to be running, as Ivy and Maven do not communicate well without a server to translate their requests.


Advanced bonus content

To quickly configure your Artifactory server, use this prepared config descriptor. Go to the Admin tab in Artifactory and locate the “Config Descriptor” option on the side (under the “Advanced” Category). Then copy the contents of this Gist into the text box and hit save. This configures your Artifactory server immediately with all the correct settings. But a big warning: This will overwrite any other configuration you may have made before. Use with care.
