MySQL with Pentaho Report Designer for those who dont read manuals

Several people in the forums seem to have severe problems setting up MySQL or understanding how MySQL works. They also seem to have severe problems reading the database manuals pointed out in the forum, insisting the problem lays within the Report Designer. So I am going to write up some simple explanations on how to set up MySQL, how to configure the database users for access over the network and how to work around all the little and large weird quirks of that database. And then hopefully I just can point these users to here and never have to deal with it again.*


I do assume that you have MySQL installed and up and running. If not, then go and get PostgreSQL, a database I personally trust.

Connecting to a MySQL database – the basics

All the Pentaho tools and thus the Pentaho Report Designer are Java Applications and therefore use JDBC to access the databases. For other databases, the general rule is: Use the driver that ships with your database and keep the version numbers in sync.

To work with a MySQL database, you will need a suitable MySQL driver. In my experience MySQL drivers have not been the most stable nor most reliable drivers around. At Pentaho, we recommend to use the MySQL JDBC Driver version 5.1.10 as default driver, as this seems to be the most stable driver. Other versions may produce invalid results, and if you want to use any other version, make sure you test it carefully.

If you use the Pentaho Report Designer to create a connection, you will be asked for several properties. To connect to the database directly, select the connection mode “native”. Then enter your hostname or IP-address, port (default port number for MySQL is 3306) and username and password.

For generic JDBC database connections, you need the JDBC-URL, your username and password and optionally the connection properties:

Setting Value

Driver

The driver class to load. The name of the class that implements java.sql.Driver in MySQL Connector/J is com.mysql.jdbc.Driver.
JDBC-URL

The JDBC URL for the MySQL JDBC Driver uses the following format. Items in square brackets ([, ]) are optional.

jdbc:mysql://[host:port]/[database][?propertyName1][=propertyValue1][&propertyName2][=propertyValue2]...

If the host name is not specified, it defaults to 127.0.0.1. If the port is not specified, it defaults to 3306, the default port number for MySQL servers.


Username

Your user name as given to you by your database administrator. See below how MySQL translates that information and matches this against the internal user database.

Password

Your password for the connection.

To get the same connection as shown in the dialog screenshot above, you would have to specify the JDBC-URL:

jdbc:mysql://localhost:3306/sampledata

with username “pentaho_user” and “password” as password.

In most cases your database will be running on a different host than the computer you are editing reports or where the BI-Server is installed. When the database is running on a different computer, you will be able to distribute the work-load of answering requests and generating reports a lot better. If both the reporting server and the database share resources like the CPU, memory or disks, there is always the danger that both systems compete for resources and slow each other down unnecessarily.

User management and access control in MySQL

When you connect to a MySQL server, your identity is determined by the host from which you connect and the user name you specify. A MySQL server side user name is the combination of the user name you specified on the connection (“pentaho_user” in the example above) and the host from where you try to connect to the database server.

MySQL users are defined by their username and hostname and generally written in the format ‘username’@’hostname’.

In the example above we connect to the special host address ‘localhost’ (or by its IP address: 127.0.0.1). Only applications running on the same computer can make connections to this interface. By default a server will listening to these connections will see them as originating from “localhost” itself.

So in the above example, the full identity would be ‘pentaho_user’@’localhost’.

MySQL uses a two-stage authentication system. In the first stage it checks, whether you are allowed to connect to the database and validates your username and password. In the second stage MySQL checks whether you have the access rights to see the contents of the tables stored in the database or whether you are allowed to update or delete data.

When we try to establish a connection to the MySQL server, the server will check its local user database for an entry matching the username and password. If there is no user defined for the given hostname, you will see the error message

Host [...] is not allowed to connect to this MySQL server

The error message

Access denied for user 'pentaho_user'@'localhost' (using password: YES)

indicates that either the password is invalid or that this user has not been defined for connections from this host.

If you see the error message

ERROR 1044 (42000): Access denied for user 'pentaho_user'@'localhost' to database 'sampledata'

then you need to grant additional permissions to the user to allow access to the database.

Listing the defined users

MySQL stores all administrative information in a special system database called ‘mysql’. You can connect to this database like any other defined database. To see which users are defined, invoke the following SQL statement via the MySQL console:

SELECT User, Host FROM mysql.user;

MySQL will check your identity (username and host) against the user table. Both the user and the host must exist. If the table contains a host name called “%” this particular entry matches all hosts. If the table contains a empty user name, it matches any user name.

To allow the user ‘pentaho_user’ to connect from any localhost, you would need to have a matching entry for it in the table.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | localhost | 
+------------------+-----------+

To allow connections for ‘pentaho_user’ from any host, a entry with ‘%’ as hostname must exist.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | %         | 
+------------------+-----------+

Big fat warning: Even though both entries have the same user name (pentaho_user), they define two entirely different and unrelated identities for the database. It is totally possible and legal for ‘pentaho_user’ from localhost to have access to different databases than the same user for the other hosts. For MySQL these are two different entities.

Connections from Pentaho Report Designer via Connections from PhpMyAdmin or other Server Side Software

The MySQL identity system of treating users from different hosts as completely different entities can cause a lot of headaches, especially if you are not familiar with the underlying workings of the internet.

If your MySQL server is set up to only allow connections from specific hosts, you can run into problems where you can access the database from your web-server but not from within the report designer or vice versa.

Lets look at the most common usage scenarios:

Pentaho Reporting, the Web-Server and MySQL run on the same machine

This is a typical development scenario, where the developer installed all software on his own box for convenience.

+---------------+
| PRD-Host      |
+---------------+
| Web-Server    |
+---------------+
| My-SQL Server |
+---------------+

MySQL must be configured to allow access from ‘localhost’ and all connections in the report must use ‘localhost’ as the hostname for the MySQL connections.

A identity for ‘pentaho_user’@’localhost’ must exist.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | localhost | 
+------------------+-----------+

Pentaho Reporting, the Web-Server and MySQL all use different machines

This is the typical production scenario. Both the database server and the web-server run on different machines for maximum performance. Both machines are usually installed in a server room or data centre.

+----------+          +------------+         +---------------+
| PRD-Host |  --+-->  | Web-Server |  ---->  | My-SQL Server |
+----------+    |     +------------+         +---------------+
                |                                 /|\ 
                +----------------------------------+

Both the developer sitting on the Pentaho Report Designer and the Web-Server need access to the database. While the developer creates the reports he needs access to the database (or a test database with a similar structure than the production database). When the report runs, the web-server must connect to the database as well to execute the reports.

Both the developer PC and the web-server must have entries for the user ‘pentaho_user’ in the user table.

+------------------+-----------------------------+
| User             | Host                        |
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of webserver | 
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of PRD-host  | 
+------------------+-----------------------------+

Alternatively, you could allow access from all hosts. Note that if your database server is not shielded by a firewall, then you may easily open your database to the world. Any self-respecting database administrator may try to strangulate you for suggesting such an configuration.

+------------------+-----------+
| User             | Host      |
+------------------+-----------+
| pentaho_user     | %         | 
+------------------+-----------+

The Web-Server and MySQL share the same machine

You find this set up occasionally in smaller hosting companies or from people who own just a single server. The database and web-server share the same machine, which is hosted in the data centre or server room. The developer runs the report designer on his own desktop machine.

+----------+          +---------------+ 
| PRD-Host |  --+-->  | Web-Server    |
+----------+    |     +---------------+
                +-->  | My-SQL server |
                      +---------------+

The web-server can access the MySQL server via ‘localhost’, while the developer needs to access the MySQL server from his own host outside of the server room.

+------------------+-----------------------------+
| User             | Host                        |
+------------------+-----------------------------+
| pentaho_user     | localhost                   | 
+------------------+-----------------------------+
| pentaho_user     | IP or hostname of PRD-host  | 
+------------------+-----------------------------+

So if you ever wondered why you can access your MySQL server from PhpMyAdmin or similar tools, but not from the Pentaho Report Designer, you should now be able to see the answer and to fix the problem.

You can find more information on this topic in the chapter about MySQL user management in the MySQL manual.

Fine grained access permissions

Even if you are able to connect to the database, you may not be able to query the database. The second stage of MySQL’s security system checks your identity against a set of defined permissions. These permissions control whether you are able to query or modify the data in the database or whether you are allowed to change the table structures or perform other administrative tasks.

Pentaho Reporting only queries the database. The Pentaho reporting tool does not modify data and does not perform any administrative tasks. All MySQL users need to have the “SELECT” permission on all objects you want to access from the reports.

To grant a specific user the permission to select data, use the following statement from the MySQL console.

GRANT SELECT ON database.* TO 'pentaho_user'@'localhost';

MySQL connection properties you should know

To make the JDBC driver work properly in MySQL, you will have to specify some additional connection properties. The driver’s default settings can cause severe problems for the reporting system. These defaults have been defined for historic releases of MySQL and are harmful these days.

Property Definition Comment
useCompression Use zlib compression when communicating with the server (true/false)? Defaults to ‘false’. Useful if you access a remote server through a low bandwidth connection. Disable for production systems.
passwordCharacterEncoding What character encoding is used for passwords? Leaving this set to the default value (null), uses the platform character set, which works for ISO8859_1 (i.e. “latin1”) passwords. For passwords in other character encodings, the encoding will have to be specified with this property, as it’s not possible for the driver to auto-detect this. If you are not using an all ASCII password and work outside the area of Western European languages, knowing about this is a life-saver.
characterEncoding If ‘useUnicode’ is set to true (which is the default), what Java character encoding should the driver use when dealing with strings? (defaults is to ‘autodetect’). If the encoding cannot be determined, then an exception will be raised. Sometimes MySQL server do detect the character set correctly and therefore return garbage when querying tables with non-ascii data. In this case this property can be used to override the autodetection of character sets.
functionsNeverReturnBlobs Should the driver always treat data from functions returning BLOBs as Strings – specifically to work around dubious metadata returned by the server for GROUP BY clauses? The MySQL JDBC-Driver returns string results of functions in SQL statements as byte-arrays. If you use any functions in SELECT statements, set this property to “true” or you will see garbage in the results.
emptyStringsConvertToZero Should the driver allow conversions from empty string fields to numeric values of ‘0’? MySQL treats empty strings or values as zero for numeric columns by default. If you need to see the values as they are stored in the database without helpful corrections from the JDBC driver, then set this flag to false.
zeroDateTimeBehavior What should happen when the driver encounters DATETIME values that are composed entirely of zeros (used by MySQL to represent invalid dates)? Valid values are “exception”, “round” and “convertToNull”. Use this if your select statement fails with an exception and you have a zero-date in it. Set it to ‘convertToNull’ to come to a more reasonable result.
serverTimezone Override detection/mapping of timezone. Used when timezone from server doesn’t map to Java timezone Use this if your dates returned by the database seem to be offset by some hours. Also see ‘useGmtMillisForDatetimes’, ‘useJDBCCompliantTimezoneShift’, ‘useLegacyDatetimeCode’ and ‘useTimezone’ for more time zone related functions.
holdResultsOpenOverStatementClose Should the driver leave the result sets open on Statement.close() (enabling violates JDBC specification) This can be a very helpful property for reports with subreports. It allows the Pentaho Reporting engine to process several server-side resultsets in parallel. Without this setting, the reporting engine will fully buffer result-sets in memory instead, which increases the memory consumption of the reporting system.

Further readings

If you have made it this far, then let me tell you one big hint: MySQL comes with a nice administration tool called My SQL Workbench which allows you to administrate the server without having to go through the command line.

* Yeah. At least I can fake hope, even when I don’t feel it.

This entry was posted in Basic Topic, Report Designer & Engine, Tech-Tips on by .
Thomas

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over an decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to a enterprise class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our our steps towards upcoming releases.

11 thoughts on “MySQL with Pentaho Report Designer for those who dont read manuals

  1. mark

    Hello,
    Thanks for the info. I am able to log in to my server and preview the data from PRD. I can also publish the report to the server. I am on MAC OSX desktop and the server is runnning Ubuntu 10.04 server with Mysql on the same server.
    When I wanted to view the report on the server, I get “Report validation failed” error. The log file first says:
    12:08:32,293 ERROR [SolutionEngine] e2f639d1-7e22-11e0-b786-7171a41b431c:SOLUTION-ENGINE:session-region-list.xaction: SolutionEngine.ERROR_0007 – Action sequence execution failed

    Then I get :

    com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

    ** BEGIN NESTED EXCEPTION **

    java.net.ConnectException
    MESSAGE: Connection timed out

    I do not know what to check here. I can connect to the server and create reports using the same connections which is pointing the localhost db.

    Any hints will be greatly appreciated.

  2. mark

    I use pentaho 3.8 server, i believe it is the latest. I was suspecting server address issue that’s why I used server’s real ip address, localhost, 127.0.0.1 as FQDN in the web.xml. None of them resolved this issue.
    Thanks much
    Mark

  3. Kurtis

    This is not the right error… if you notice it is referring to an .xaction, this error is not related to the .prpt failing. You should try turning on the debugging on the bi-server for the reporting engine’s data factory to see more error messages.

    Add this to the log4j.xml in the /tomcat/webapps/pentaho/WEB-INF/classes/

  4. Kurtis

    here is the class you need to add and set to DEBUG…

    org.pentaho.reporting.engine.classic.core.modules.misc.datafactory

  5. mark

    Thanks for the info.
    I use “Pentaho Platform Engine Core 3.8.0-stable.45256” and matching PRD. I added:

    after restarting the server, I get :
    **********
    12:38:41,276 WARN [AxisService] Unable to generate EPR for the transport : http
    Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
    12:38:41,669 ERROR [SolutionEngine] c3032bb7-814b-11e0-bba6-cda6684882e7:SOLUTION-ENGINE:global-department-list.xaction: SolutionEngine.ERROR_0007 – Action sequence execution failed
    ************

    Then after submitting the report, I get the same error:
    *********
    12:43:30,182 ERROR [SolutionEngine] 6f0c7608-814c-11e0-bba6-cda6684882e7:SOLUTION-ENGINE:session-region-list.xaction: SolutionEngine.ERROR_0007 – Action sequence execution failed
    12:44:01,340 WARN [SimpleSQLReportDataFactory] Unable to perform cache preparation
    com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:

    ** BEGIN NESTED EXCEPTION **

    java.net.ConnectException
    MESSAGE: Connection timed out
    **********

    Thanks much
    Mark

  6. mark

    I added the code you suggested under Limit categories. I do not know it would make a difference or not.

  7. mark

    Google is cutting off my comments for some reason. I got the same error message after report submission. The server starts up with following:
    Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
    12:38:41,276 WARN [AxisService] Unable to generate EPR for the transport : http
    Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
    12:38:41,669 ERROR [SolutionEngine] c3032bb7-814b-11e0-bba6-cda6684882e7:SOLUTION-ENGINE:global-department-list.xaction: SolutionEngine.ERROR_0007 – Action sequence execution failed

  8. SREENATH'S

    Hi Thomas,
    Thanks for the information on my request. But, which you said here are completely i did to configure & now i am working with reporting tool on mysql server. Just because of no permissions for the user i can’t able to connect to mysql database.

    Thanks
    sreenath.

  9. Ed

    Hi,

    I’m using PENTAHO BI SERVER 4.5 and PRD 3.9.1 on CENTOS 7.0, Mysql 5.5.37.

    One server has Mysql database, another has Pentaho bi server and reports that programmers published.

    When we run reports under this platform, reports run very slow.
    If we set pentaho bi server on same machine as mysql server, reports run fast.

    What am I missing?

    Thanks in advance.

Comments are closed.