Use code to replace a datasource in a report

I’m not exactly a social media user, so I am rather surprised that every now and then I get a question via these channels.

There are two ways to change a PRPT file. One, the ugly way, is to crack open the ZIP structure and mess with the XML files contained in there. I won’t support this atrocity with documentation. The second way, the good way, is to use the reporting API to make your changes. It is clean, and it ensures that your report stays valid. And best of all – it’s unit-testable.

This will be a code-heavy post, so let’s talk about the setup first, to keep the main body simple.

I assume you have a project set up that has all the reporting libraries ready and that contains all jars for every data-source you are using in your report. Normally that means you are bootstrapping from the SDK’s “sample-use-full” module.

Now let’s create a base harness for our task. A simple Java class with a “public static void main” method will do.

public static void main(String[] args) throws ResourceException, ContentIOException, BundleWriterException, IOException {
  ClassicEngineBoot.getInstance().start(); // boot the reporting engine before parsing anything
  processReport(args[0], args[1]);
}

The processReport method takes two parameters. First, the source file, which should be a valid file name pointing to a PRPT file, and second, a target file name, where the processed report will be written to. I personally do not like overwriting the source, as it makes it rather hard to recover from errors.
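If you want the harness to enforce the "never overwrite the source" rule, a small guard can run before processReport is called. This is only a sketch using plain java.nio; the class name TargetGuard and the method pointsToSameFile are made up for this example and are not part of the reporting API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TargetGuard {
  // Hypothetical helper: true if both file names resolve to the same file.
  static boolean pointsToSameFile(String sourceText, String targetText) {
    Path source = Paths.get(sourceText).toAbsolutePath().normalize();
    Path target = Paths.get(targetText).toAbsolutePath().normalize();
    if (source.equals(target)) {
      return true;
    }
    try {
      // isSameFile also catches cases like symbolic links, but both files must exist.
      return Files.exists(source) && Files.exists(target) && Files.isSameFile(source, target);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("Usage: TargetGuard <source.prpt> <target.prpt>");
      return;
    }
    if (pointsToSameFile(args[0], args[1])) {
      throw new IllegalArgumentException("Refusing to overwrite the source report: " + args[0]);
    }
    System.out.println("OK: " + args[0] + " -> " + args[1]);
  }
}
```

In the real harness, the check would sit in main, right before the call to processReport.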

Let’s add the “processReport” method next. I still consider this boilerplate code, as all it does is parse the report, hand off the MasterReport object to the actual method that does all the work, and then write the modified MasterReport into a PRPT file.

private static void processReport(String sourceText, String targetText)
  throws ResourceException, IOException, BundleWriterException, ContentIOException {

  // Parse the PRPT bundle into a MasterReport.
  File sourceFile = new File(sourceText);
  MasterReport report = (MasterReport) new ResourceManager().createDirectly(sourceFile, MasterReport.class).getResource();

  // Apply the changes, then write the result into a new PRPT file.
  MasterReport processedReport = manipulateReport(report);

  BundleWriter.writeReportToZipFile(processedReport, new File(targetText));
}

Again, the code should be rather self-explanatory. The first two lines parse the report, next we hand it off to some other method to manipulate the report, and whatever we get back will be written out into a new PRPT file. I omitted the proper exception handling to make the code more readable – if it crashes, it will burn, wild but beautiful.

Now finally, the meat: manipulating reports. First, something simple: let’s not do anything at all and just return the report. This effectively copies the report from the source to the target file.

private static MasterReport manipulateReport(MasterReport report) {
  return report;
}

Now, let’s modify some data-sources.
First, we need more boilerplate code.

Data-factories are stored on a report. A master-report can have sub-reports, which themselves can have data-factories defined. Of course, sub-reports can have other sub-reports, which have their own data-factories and so on.

A report can contain multiple data-factories by using a “CompoundDataFactory”. Reports created with PRD always use a compound-factory – it makes the code a lot easier and adds almost no overhead.

Let’s expand our “manipulateReport” method a bit.

  private static MasterReport manipulateReport(MasterReport report) {
    new DataSourceStructureVisitor().inspect(report);
    return report;
  }

  private static class DataSourceStructureVisitor extends AbstractStructureVisitor {
    protected void inspect(AbstractReportDefinition reportDefinition) {
      // Let the base class walk the report so that sub-reports are visited as well.
      super.inspect(reportDefinition);
      processSingleDataSource(reportDefinition, "query");
    }

    private void processSingleDataSource(AbstractReportDefinition reportDefinition, String query) {
      CompoundDataFactory dataFactory = CompoundDataFactory.normalize(reportDefinition.getDataFactory());
      DataFactory dataFactoryForQuery = dataFactory.getDataFactoryForQuery(query);
      if (dataFactoryForQuery != null) {
        // Look up and replace the factory that answers the query, not the compound wrapper.
        int idx = dataFactory.indexOfByReference(dataFactoryForQuery);
        dataFactory.set(idx, handleDataSource(reportDefinition, dataFactoryForQuery));
        // Store the result back; normalize() may have returned a wrapper or a copy.
        reportDefinition.setDataFactory(dataFactory);
      }
    }

    private void processAllDataSources(AbstractReportDefinition reportDefinition) {
      CompoundDataFactory dataFactory = CompoundDataFactory.normalize(reportDefinition.getDataFactory());
      final int size = dataFactory.size();
      for (int i = 0; i < size; i++) {
        dataFactory.set(i, handleDataSource(reportDefinition, dataFactory.getReference(i)));
      }
      // Store the result back; normalize() may have returned a wrapper or a copy.
      reportDefinition.setDataFactory(dataFactory);
    }

    private DataFactory handleDataSource(AbstractReportDefinition reportDefinition, DataFactory dataFactory) {
      return dataFactory;
    }
  }

To deal with the complexities of nested subreports, we use a "StructureVisitor" to traverse the report definition for us. On each report we encounter (the master-report and all sub-reports) we now check for data-factories we are interested in.
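If the traversal idea feels abstract, here is a toy model of it that runs without any reporting libraries. ReportNode and visit are hypothetical stand-ins invented for this sketch, not Pentaho classes; they only show the shape of the recursion: handle the current report, then descend into each sub-report.

```java
import java.util.ArrayList;
import java.util.List;

final class TraversalSketch {
  // Hypothetical stand-in for a report definition: a name plus nested sub-reports.
  static final class ReportNode {
    final String name;
    final List<ReportNode> subReports = new ArrayList<>();

    ReportNode(String name) {
      this.name = name;
    }

    ReportNode add(ReportNode child) {
      subReports.add(child);
      return this;
    }
  }

  // Handles the current report first, then recurses into its sub-reports.
  static void visit(ReportNode report, List<String> visited) {
    visited.add(report.name);
    for (ReportNode sub : report.subReports) {
      visit(sub, visited);
    }
  }

  // Convenience wrapper that returns the names in visiting order.
  static List<String> visitAll(ReportNode master) {
    List<String> visited = new ArrayList<>();
    visit(master, visited);
    return visited;
  }

  public static void main(String[] args) {
    ReportNode master = new ReportNode("master")
        .add(new ReportNode("sub-1").add(new ReportNode("sub-1-1")))
        .add(new ReportNode("sub-2"));
    System.out.println(visitAll(master)); // prints [master, sub-1, sub-1-1, sub-2]
  }
}
```

The point where this sketch records the name is where the real visitor would check the data-factories of the report it is currently looking at.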

There are two ways to retrieve a data-factory shown here:
(1) processAllDataSources - if you want to modify them all or don't know which data-factory is your target. This iterates over all data-factories stored on that particular report and lets you modify each one in the "handleDataSource" method.
(2) processSingleDataSource - this method expects the name of a query and will try to locate the first data-factory that claims to be able to handle that query. If your report has many data-factories but you want to modify only a particular one, this method is yours.
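The difference between the two strategies can be shown with a toy model that needs no reporting libraries. Everything in it is hypothetical: Factory stands in for a data-factory that knows which query names it can answer, and the list stands in for the compound factory. It is a sketch of the idea, not of the real CompoundDataFactory API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

final class LookupSketch {
  // Hypothetical stand-in for a data-factory: a label plus the queries it can answer.
  static final class Factory {
    final String label;
    final Set<String> queries;

    Factory(String label, String... queries) {
      this.label = label;
      this.queries = new HashSet<>(Arrays.asList(queries));
    }
  }

  // Strategy 1: hand every factory to the handler and store whatever comes back.
  static void replaceAll(List<Factory> factories, UnaryOperator<Factory> handler) {
    for (int i = 0; i < factories.size(); i++) {
      factories.set(i, handler.apply(factories.get(i)));
    }
  }

  // Strategy 2: replace only the first factory that claims the given query.
  static boolean replaceFirstFor(List<Factory> factories, String query, UnaryOperator<Factory> handler) {
    for (int i = 0; i < factories.size(); i++) {
      Factory candidate = factories.get(i);
      if (candidate.queries.contains(query)) {
        factories.set(i, handler.apply(candidate));
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Factory> factories = new ArrayList<>(Arrays.asList(
        new Factory("sql", "orders", "customers"),
        new Factory("table", "orders")));
    // Both factories claim "orders", but only the first one is touched.
    replaceFirstFor(factories, "orders",
        f -> new Factory(f.label + "-patched", f.queries.toArray(new String[0])));
    for (Factory f : factories) {
      System.out.println(f.label);
    }
  }
}
```

Note the "first match wins" behaviour in the second strategy: if two factories claim the same query name, only the earlier one is handed to the handler.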

Now, enough of standard code - let's solve a real problem.

I want to replace the JDBC connection definition in reports that use a local file-based HSQL data-source with a proper JNDI data-source. We all know that if you have SQL data-sources and don't want to change your reports whenever your database server changes, you have to use JNDI connections. But what we know is not always what we do, right? 🙂

So let's replace the "handleDataSource" method with one that finds all SQL data-factories. If a data-factory uses the local sample-data, the method replaces its connection with the JNDI reference.

  private DataFactory handleDataSource(AbstractReportDefinition reportDefinition, DataFactory dataFactory) {
    // do whatever you want here.
    if (dataFactory instanceof SimpleSQLReportDataFactory) {
      SimpleSQLReportDataFactory sdf = (SimpleSQLReportDataFactory) dataFactory;
      if (isLocalSampleData(sdf)) {
        // Swap the hard-coded JDBC connection for a JNDI lookup.
        JndiConnectionProvider connectionProvider = new JndiConnectionProvider();
        // "SampleData" is a placeholder; use the JNDI name defined on your server.
        connectionProvider.setConnectionPath("SampleData");
        sdf.setConnectionProvider(connectionProvider);
      }
    }
    return dataFactory;
  }

  private boolean isLocalSampleData(SimpleSQLReportDataFactory sdf) {
    ConnectionProvider cp = sdf.getConnectionProvider();
    if (cp instanceof DriverConnectionProvider) {
      DriverConnectionProvider dcp = (DriverConnectionProvider) cp;
      // Match only the file-based HSQL sample database.
      if ("org.hsqldb.jdbcDriver".equals(dcp.getDriver()) &&
              "jdbc:hsqldb:file:./sql/sampledata".equalsIgnoreCase(dcp.getUrl())) {
        return true;
      }
    }
    return false;
  }

Run this for all your reports in a directory, and the reports will be patched to never ever use local data-sources again.
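Running it over a whole directory only needs a small driver around processReport. The following is a sketch using plain java.nio; BatchPatcher, ReportProcessor, findReports and patchAll are names made up for this example. The callback is where a call to processReport would go, and the patched files land in a separate target directory so the sources stay untouched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class BatchPatcher {
  // Hypothetical callback; in the harness it would delegate to processReport.
  interface ReportProcessor {
    void process(String sourceFile, String targetFile) throws Exception;
  }

  // Lists all *.prpt files directly inside sourceDir (no recursion), sorted by path.
  static List<Path> findReports(Path sourceDir) {
    try (Stream<Path> files = Files.list(sourceDir)) {
      return files
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".prpt"))
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Processes each report into targetDir and returns the number of failures.
  static int patchAll(Path sourceDir, Path targetDir, ReportProcessor processor) {
    try {
      Files.createDirectories(targetDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    int failures = 0;
    for (Path source : findReports(sourceDir)) {
      Path target = targetDir.resolve(source.getFileName());
      try {
        processor.process(source.toString(), target.toString());
      } catch (Exception e) {
        // Keep going; one broken report should not stop the whole batch.
        failures++;
        System.err.println("Failed: " + source + " (" + e + ")");
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("Usage: BatchPatcher <source-dir> <target-dir>");
      return;
    }
    // Stand-in processor that only logs; swap in a call to processReport.
    int failures = patchAll(Paths.get(args[0]), Paths.get(args[1]),
        (source, target) -> System.out.println(source + " -> " + target));
    System.exit(failures == 0 ? 0 : 1);
  }
}
```

Catching the exception per file is a deliberate choice here: a single report that fails to parse gets logged and counted, and the rest of the directory is still patched.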

By overriding some of the other methods of the AbstractStructureVisitor, a report visitor can easily change report-elements, add expressions or simply report on used features.

To see how a visitor can edit elements, have a look at the code for the report-pre-processor from the SDK. A report-pre-processor uses a similar approach to inspect and modify reports at runtime.

For a good set of samples on how to just inspect and report on the use of certain features, including the use of fields, have a look at the report-designer's inspections. These little helpers also use the AbstractStructureVisitor system to check each element and collect data which they then report to the user.


About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.