The upcoming version 5.1 of Pentaho Reporting will ship with support for printing bidirectional text, which is experimental for this version. Bidirectional text processing enables us to print Arabic, Hebrew and other non-Latin languages.
Support for Arabic text has been on our backlog for a decade. Thanks to the work done by Nortal, and Marian Androne in particular, we now have a new text processing sub-system that relies on the JDK’s TextLayout class to process complex text. The text-layout class handles all the line-breaking calculations, while some additional helper code around it adds stronger rich-text features, including embedded images, to the mix.
The new text layout system is a large chunk of code and came comparatively late in the development process for version 5.1. We have maybe a month of development time left until we are supposed to finalize the next release. The AWT itself is platform dependent and next to impossible to unit-test properly. The results of any font and text processing depend heavily on your JDK version and vendor, on the operating system and its configuration, and on the available fonts.
To properly test the new code, we will need a considerable amount of time, as that testing will be either manual or will require a whole new approach to testing to insulate ourselves from the platform specifics.
Oh, and no one at Pentaho seems to speak Arabic or Hebrew well enough to validate the somewhat critical correctness of the BiDi text processing. As glyphs flow together, and dots and lines that look insignificant to the untrained eye can alter the meaning of the printed text, I personally do not feel confident enough to vouch for the correctness of the print without more tests.
The text might be fine, or a misspelling might start a (national/corporate/marital) war. So let’s play it safe for now.
How to enable Arabic Text support.
You can enable Arabic text processing on a per-report basis by setting the attribute “common::complex-text” to “true”. If you want to enable this globally, add the configuration setting “org.pentaho.reporting.engine.classic.core.layout.fontrenderer.ComplexTextLayout=true” to the “classic-engine.properties” file in the root of your classpath.
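As a rough illustration of the global switch, the sketch below shows the line you would add to “classic-engine.properties” and how such a key can be read with the plain JDK Properties class. The class and method names are made up for this example, the reporting engine reads its configuration through its own API rather than this code, and the fallback to “false” is an assumption based on the feature being opt-in.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of the reporting engine.
public class ComplexTextConfigSketch {
    // Configuration key as given in the text above.
    static final String KEY =
        "org.pentaho.reporting.engine.classic.core.layout.fontrenderer.ComplexTextLayout";

    // Parses properties-file content, e.g. the text of classic-engine.properties.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Assumption: a missing key means the old text processor stays active.
    static boolean isComplexTextEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        // This is the single line to add to the properties file.
        String fileContent = KEY + "=true\n";
        System.out.println(isComplexTextEnabled(parse(fileContent)));
    }
}
```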
Once the complex text processing is enabled, you can use a couple of new styles in your reports. (These styles can be set regardless of the complex-text processing setting – but they will have no effect if the old text processor is used.)
When using Arabic or any other non-Latin text, it is critical to NEVER EVER EVER! use any of the built-in fonts (“Serif”, “SansSerif”, “Monospaced”, “Dialog”), or your export to PDF will produce invalid output. The PDF specification does not support non-Latin text for these fonts, and the export will fail silently instead of reporting an error.
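If you generate reports from code, a small guard can catch this mistake before the export runs. The following is only a sketch under assumptions: the class and method names are invented for this example, the font list is the one named above, and “non-Latin” is approximated as “contains a letter whose Unicode script is not Latin”.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-flight check, not part of the reporting engine.
public class BuiltInFontCheckSketch {
    // The built-in font names listed in the text above (exact, case-sensitive match).
    static final Set<String> BUILT_IN_FONTS =
        new HashSet<>(Arrays.asList("Serif", "SansSerif", "Monospaced", "Dialog"));

    // True if the text contains at least one letter outside the Latin script.
    static boolean containsNonLatinLetters(String text) {
        return text.codePoints()
            .filter(Character::isLetter)
            .anyMatch(cp -> Character.UnicodeScript.of(cp) != Character.UnicodeScript.LATIN);
    }

    // True if this font/text combination is the one the warning above is about.
    static boolean isRiskyForPdf(String fontName, String text) {
        return BUILT_IN_FONTS.contains(fontName) && containsNonLatinLetters(text);
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // an Arabic greeting
        System.out.println(isRiskyForPdf("Serif", arabic));
        System.out.println(isRiskyForPdf("Serif", "Hello"));
    }
}
```

The check says nothing about whether a non-built-in font actually contains the glyphs you need; it only flags the combination described above.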
Apparently, in the early 1990s, when the PDF specification was written, no one could have foreseen that Arabic speakers would want to use computers for printing texts in their native language. 😉
Once complex text processing is enabled, you can control the default flow of the text via a new style property named “text-direction”. This style is inherited, so to define a preference for Right-To-Left text processing for the whole report, it is sufficient to define this style only once on the master-report object.
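To get a feeling for what a default text direction means, here is a small sketch that uses the JDK’s java.text.Bidi class to split the same mixed string into directional runs, once with a Left-To-Right base direction and once with a Right-To-Left one. This illustrates the general concept only; the class name is made up, and how the reporting engine maps “text-direction” onto the JDK internally is not shown here.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the reporting engine.
public class TextDirectionSketch {
    // Analyzes the text with an explicit base direction for the paragraph.
    static Bidi analyze(String text, boolean rightToLeft) {
        int flags = rightToLeft
            ? Bidi.DIRECTION_RIGHT_TO_LEFT
            : Bidi.DIRECTION_LEFT_TO_RIGHT;
        return new Bidi(text, flags);
    }

    // Lists the runs in logical order as "start-limit direction".
    // Even embedding levels are left-to-right, odd levels are right-to-left.
    static List<String> describeRuns(Bidi bidi) {
        List<String> runs = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            String direction = (bidi.getRunLevel(i) % 2 == 0) ? "LTR" : "RTL";
            runs.add(bidi.getRunStart(i) + "-" + bidi.getRunLimit(i) + " " + direction);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Latin letters followed by three Hebrew letters.
        String text = "abc \u05D0\u05D1\u05D2";
        System.out.println(describeRuns(analyze(text, false)));
        System.out.println(describeRuns(analyze(text, true)));
    }
}
```

The base direction decides where neutral characters such as spaces and punctuation end up, which is why the same text can be laid out differently depending on this one setting.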
By using the text-layout class, we have also gained the ability to break text within words. The new inheritable style property “word-break” allows you to control this feature. If not defined, it defaults to “true” (text breaks only at word boundaries), just like in previous versions.
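The difference between the two settings can be sketched with a toy line wrapper. This is not the engine’s layout code: it measures width in characters instead of font metrics, the class and method names are invented, and it uses the JDK’s BreakIterator to find break opportunities. With wordBreak set to true it only breaks at word boundaries; with false it cuts wherever the line is full.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of the reporting engine.
public class WordBreakSketch {
    static List<String> wrap(String text, int maxChars, boolean wordBreak) {
        List<String> lines = new ArrayList<>();
        if (!wordBreak) {
            // Break anywhere: cut every maxChars characters, even inside a word.
            // (A real implementation would avoid splitting surrogate pairs.)
            for (int i = 0; i < text.length(); i += maxChars) {
                lines.add(text.substring(i, Math.min(text.length(), i + maxChars)));
            }
            return lines;
        }
        // Break only at the line-break opportunities reported by the JDK.
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        int lineStart = 0;
        int lastBreak = 0;
        for (int pos = it.next(); pos != BreakIterator.DONE; pos = it.next()) {
            // If the text up to this boundary no longer fits,
            // close the line at the previous boundary.
            if (pos - lineStart > maxChars && lastBreak > lineStart) {
                lines.add(text.substring(lineStart, lastBreak).trim());
                lineStart = lastBreak;
            }
            lastBreak = pos;
        }
        lines.add(text.substring(lineStart).trim());
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(wrap("complex text layout", 8, true));
        System.out.println(wrap("complex text layout", 8, false));
    }
}
```

Note that with word-boundary breaking, a single word longer than the available width still overflows the line, which is exactly the case where breaking within words becomes useful.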
Please help us to squash all remaining bugs in this new feature by giving it a try. And if you happen to be a native or fluent speaker of a Right-to-Left language, we would love to hear whether we print everything correctly.
Please use our JIRA system to report bugs, or our forum for general questions and feedback. Thank you!
Ohhhh, that’s great.
We are waiting for the new release.
And we are ready to test in Farsi and Arabic.
I heavily suspect that Farsi will work fine too – as the normal JDK text components support it as well.
We are currently finishing off the last bits for the release and we should reach the code-freeze within the next two to three weeks. At that point, there is only a finite amount of time allocated for final last-minute tests and then the release gets out. So what you see on http://ci.pentaho.com is feature complete and the only changes that will happen are bug fixes.
I have made the changes as per the blog, but the report still prints Arabic characters as question marks when exporting to PDF. I am using Pentaho 5.3.