Several people in the forums seem to have severe problems setting up MySQL or understanding how MySQL works. They also seem to have severe problems reading the database manuals pointed out in the forum, insisting the problem lays within the Report Designer. So I am going to write up some simple explanations on how to set up MySQL, how to configure the database users for access over the network and how to work around all the little and large weird quirks of that database. And then hopefully I just can point these users to here and never have to deal with it again.*
I do assume that you have MySQL installed and up and running. If not, then go and get PostgreSQL, a database I personally trust.
Connecting to a MySQL database – the basics
All the Pentaho tools and thus the Pentaho Report Designer are Java Applications and therefore use JDBC to access the databases. For other databases, the general rule is: Use the driver that ships with your database and keep the version numbers in sync.
To work with a MySQL database, you will need a suitable MySQL driver. In my experience MySQL drivers have not been the most stable nor most reliable drivers around. At Pentaho, we recommend to use the MySQL JDBC Driver version 5.1.10 as default driver, as this seems to be the most stable driver. Other versions may produce invalid results, and if you want to use any other version, make sure you test it carefully.
If you use the Pentaho Report Designer to create a connection, you will be asked for several properties. To connect to the database directly, select the connection mode “native”. Then enter your hostname or IP-address, port (default port number for MySQL is 3306) and username and password.
For generic JDBC database connections, you need the JDBC-URL, your username and password and optionally the connection properties:
Setting |
Value |
Driver
|
The driver class to load. The name of the class that implements java.sql.Driver in MySQL Connector/J is com.mysql.jdbc.Driver .
|
JDBC-URL |
The JDBC URL for the MySQL JDBC Driver uses the following format. Items in square brackets ([, ]) are optional.
jdbc:mysql://[host:port]/[database][?propertyName1][=propertyValue1][&propertyName2][=propertyValue2]...
If the host name is not specified, it defaults to 127.0.0.1. If the port is not specified, it defaults to 3306, the default port number for MySQL servers.
|
Username
|
Your user name as given to you by your database administrator. See below how MySQL translates that information and matches this against the internal user database.
|
Password
|
Your password for the connection.
|
To get the same connection as shown in the dialog screenshot above, you would have to specify the JDBC-URL:
jdbc:mysql://localhost:3306/sampledata
with username “pentaho_user” and “password” as password.
In most cases your database will be running on a different host than the computer you are editing reports or where the BI-Server is installed. When the database is running on a different computer, you will be able to distribute the work-load of answering requests and generating reports a lot better. If both the reporting server and the database share resources like the CPU, memory or disks, there is always the danger that both systems compete for resources and slow each other down unnecessarily.
User management and access control in MySQL
When you connect to a MySQL server, your identity is determined by the host from which you connect and the user name you specify. A MySQL server side user name is the combination of the user name you specified on the connection (“pentaho_user” in the example above) and the host from where you try to connect to the database server.
MySQL users are defined by their username and hostname and generally written in the format ‘username’@’hostname’.
In the example above we connect to the special host address ‘localhost’ (or by its IP address: 127.0.0.1). Only applications running on the same computer can make connections to this interface. By default a server will listening to these connections will see them as originating from “localhost” itself.
So in the above example, the full identity would be ‘pentaho_user’@’localhost’.
MySQL uses a two-stage authentication system. In the first stage it checks, whether you are allowed to connect to the database and validates your username and password. In the second stage MySQL checks whether you have the access rights to see the contents of the tables stored in the database or whether you are allowed to update or delete data.
When we try to establish a connection to the MySQL server, the server will check its local user database for an entry matching the username and password. If there is no user defined for the given hostname, you will see the error message
Host [...] is not allowed to connect to this MySQL server
The error message
Access denied for user 'pentaho_user'@'localhost' (using password: YES)
indicates that either the password is invalid or that this user has not been defined for connections from this host.
If you see the error message
ERROR 1044 (42000): Access denied for user 'pentaho_user'@'localhost' to database 'sampledata'
then you need to grant additional permissions to the user to allow access to the database.
Listing the defined users
MySQL stores all administrative information in a special system database called ‘mysql’. You can connect to this database like any other defined database. To see which users are defined, invoke the following SQL statement via the MySQL console:
SELECT User, Host FROM mysql.user;
MySQL will check your identity (username and host) against the user table. Both the user and the host must exist. If the table contains a host name called “%” this particular entry matches all hosts. If the table contains a empty user name, it matches any user name.
To allow the user ‘pentaho_user’ to connect from any localhost, you would need to have a matching entry for it in the table.
+------------------+-----------+
| User | Host |
+------------------+-----------+
| pentaho_user | localhost |
+------------------+-----------+
To allow connections for ‘pentaho_user’ from any host, a entry with ‘%’ as hostname must exist.
+------------------+-----------+
| User | Host |
+------------------+-----------+
| pentaho_user | % |
+------------------+-----------+
Big fat warning: Even though both entries have the same user name (pentaho_user), they define two entirely different and unrelated identities for the database. It is totally possible and legal for ‘pentaho_user’ from localhost to have access to different databases than the same user for the other hosts. For MySQL these are two different entities.
Connections from Pentaho Report Designer via Connections from PhpMyAdmin or other Server Side Software
The MySQL identity system of treating users from different hosts as completely different entities can cause a lot of headaches, especially if you are not familiar with the underlying workings of the internet.
If your MySQL server is set up to only allow connections from specific hosts, you can run into problems where you can access the database from your web-server but not from within the report designer or vice versa.
Lets look at the most common usage scenarios:
Pentaho Reporting, the Web-Server and MySQL run on the same machine
This is a typical development scenario, where the developer installed all software on his own box for convenience.
+---------------+
| PRD-Host |
+---------------+
| Web-Server |
+---------------+
| My-SQL Server |
+---------------+
MySQL must be configured to allow access from ‘localhost’ and all connections in the report must use ‘localhost’ as the hostname for the MySQL connections.
A identity for ‘pentaho_user’@’localhost’ must exist.
+------------------+-----------+
| User | Host |
+------------------+-----------+
| pentaho_user | localhost |
+------------------+-----------+
Pentaho Reporting, the Web-Server and MySQL all use different machines
This is the typical production scenario. Both the database server and the web-server run on different machines for maximum performance. Both machines are usually installed in a server room or data centre.
+----------+ +------------+ +---------------+
| PRD-Host | --+--> | Web-Server | ----> | My-SQL Server |
+----------+ | +------------+ +---------------+
| /|\
+----------------------------------+
Both the developer sitting on the Pentaho Report Designer and the Web-Server need access to the database. While the developer creates the reports he needs access to the database (or a test database with a similar structure than the production database). When the report runs, the web-server must connect to the database as well to execute the reports.
Both the developer PC and the web-server must have entries for the user ‘pentaho_user’ in the user table.
+------------------+-----------------------------+
| User | Host |
+------------------+-----------------------------+
| pentaho_user | IP or hostname of webserver |
+------------------+-----------------------------+
| pentaho_user | IP or hostname of PRD-host |
+------------------+-----------------------------+
Alternatively, you could allow access from all hosts. Note that if your database server is not shielded by a firewall, then you may easily open your database to the world. Any self-respecting database administrator may try to strangulate you for suggesting such an configuration.
+------------------+-----------+
| User | Host |
+------------------+-----------+
| pentaho_user | % |
+------------------+-----------+
The Web-Server and MySQL share the same machine
You find this set up occasionally in smaller hosting companies or from people who own just a single server. The database and web-server share the same machine, which is hosted in the data centre or server room. The developer runs the report designer on his own desktop machine.
+----------+ +---------------+
| PRD-Host | --+--> | Web-Server |
+----------+ | +---------------+
+--> | My-SQL server |
+---------------+
The web-server can access the MySQL server via ‘localhost’, while the developer needs to access the MySQL server from his own host outside of the server room.
+------------------+-----------------------------+
| User | Host |
+------------------+-----------------------------+
| pentaho_user | localhost |
+------------------+-----------------------------+
| pentaho_user | IP or hostname of PRD-host |
+------------------+-----------------------------+
So if you ever wondered why you can access your MySQL server from PhpMyAdmin or similar tools, but not from the Pentaho Report Designer, you should now be able to see the answer and to fix the problem.
You can find more information on this topic in the chapter about MySQL user management in the MySQL manual.
Fine grained access permissions
Even if you are able to connect to the database, you may not be able to query the database. The second stage of MySQL’s security system checks your identity against a set of defined permissions. These permissions control whether you are able to query or modify the data in the database or whether you are allowed to change the table structures or perform other administrative tasks.
Pentaho Reporting only queries the database. The Pentaho reporting tool does not modify data and does not perform any administrative tasks. All MySQL users need to have the “SELECT” permission on all objects you want to access from the reports.
To grant a specific user the permission to select data, use the following statement from the MySQL console.
GRANT SELECT ON database.* TO 'pentaho_user'@'localhost';
MySQL connection properties you should know
To make the JDBC driver work properly in MySQL, you will have to specify some additional connection properties. The driver’s default settings can cause severe problems for the reporting system. These defaults have been defined for historic releases of MySQL and are harmful these days.
Property |
Definition |
Comment |
useCompression |
Use zlib compression when communicating with the server (true/false)? Defaults to ‘false’. |
Useful if you access a remote server through a low bandwidth connection. Disable for production systems. |
passwordCharacterEncoding |
What character encoding is used for passwords? Leaving this set to the default value (null), uses the platform character set, which works for ISO8859_1 (i.e. “latin1”) passwords. For passwords in other character encodings, the encoding will have to be specified with this property, as it’s not possible for the driver to auto-detect this. |
If you are not using an all ASCII password and work outside the area of Western European languages, knowing about this is a life-saver. |
characterEncoding |
If ‘useUnicode’ is set to true (which is the default), what Java character encoding should the driver use when dealing with strings? (defaults is to ‘autodetect’). If the encoding cannot be determined, then an exception will be raised. |
Sometimes MySQL server do detect the character set correctly and therefore return garbage when querying tables with non-ascii data. In this case this property can be used to override the autodetection of character sets. |
functionsNeverReturnBlobs |
Should the driver always treat data from functions returning BLOBs as Strings – specifically to work around dubious metadata returned by the server for GROUP BY clauses? |
The MySQL JDBC-Driver returns string results of functions in SQL statements as byte-arrays. If you use any functions in SELECT statements, set this property to “true” or you will see garbage in the results. |
emptyStringsConvertToZero |
Should the driver allow conversions from empty string fields to numeric values of ‘0’? |
MySQL treats empty strings or values as zero for numeric columns by default. If you need to see the values as they are stored in the database without helpful corrections from the JDBC driver, then set this flag to false. |
zeroDateTimeBehavior |
What should happen when the driver encounters DATETIME values that are composed entirely of zeros (used by MySQL to represent invalid dates)? Valid values are “exception”, “round” and “convertToNull”. |
Use this if your select statement fails with an exception and you have a zero-date in it. Set it to ‘convertToNull’ to come to a more reasonable result. |
serverTimezone |
Override detection/mapping of timezone. Used when timezone from server doesn’t map to Java timezone |
Use this if your dates returned by the database seem to be offset by some hours. Also see ‘useGmtMillisForDatetimes’, ‘useJDBCCompliantTimezoneShift’, ‘useLegacyDatetimeCode’ and ‘useTimezone’ for more time zone related functions. |
holdResultsOpenOverStatementClose |
Should the driver leave the result sets open on Statement.close() (enabling violates JDBC specification) |
This can be a very helpful property for reports with subreports. It allows the Pentaho Reporting engine to process several server-side resultsets in parallel. Without this setting, the reporting engine will fully buffer result-sets in memory instead, which increases the memory consumption of the reporting system. |
Further readings
If you have made it this far, then let me tell you one big hint: MySQL comes with a nice administration tool called My SQL Workbench which allows you to administrate the server without having to go through the command line.
* Yeah. At least I can fake hope, even when I don’t feel it.