Doing JavaDoc is more than just slapping comments on the code

Whenever a project reaches the point where we may be tempted to call it a stable version, we enter the stage of updating the source-code documentation. Our engine has been known for its comprehensive JavaDoc documentation in the past (yeah, the user documentation is a whole different chapter). Even OHLOH noticed that we have a good share of documentation.

But how do you recognize good documentation? First, it must be complete, of course. Second, it must be comprehensible. And third, it must be useful to the reader. That really doesn’t sound complicated, does it? But then again: the documentation of most projects (including my projects from time to time) just sucks. So maybe it is not as easy as it sounds.

Providing complete documentation is not just a matter of slapping comments on each and every method and class. Completeness also means that it covers all the pitfalls and allows the reader to understand how things are supposed to work. Complete documentation should give enough information to allow the reader to reimplement everything from scratch. In my book, the documentation of the JDK itself is a great example of what completeness could mean.
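To make that a bit more concrete, here is a minimal sketch of what a "complete" method comment could look like. The class `PageUnits` and its method are made up for this post and are not part of the engine; the point is only that the comment names the pitfalls (rounding, negative values, NaN) instead of merely restating the method name.

```java
/** Hypothetical helper for illustration; not part of the engine's API. */
final class PageUnits {
    private PageUnits() {
    }

    /**
     * Converts a length from millimetres to points, where one point is
     * 1/72 of an inch and one inch is exactly 25.4 millimetres.
     *
     * <p>Pitfalls worth knowing:
     * <ul>
     *   <li>The result is not rounded. Callers that need whole points must
     *       round themselves and decide on the rounding mode.</li>
     *   <li>Negative lengths are rejected instead of being silently
     *       clamped to zero, so a sign error in the caller surfaces early.</li>
     *   <li>{@code NaN} and infinite values are rejected as well.</li>
     * </ul>
     *
     * @param millimetres the length in millimetres; must be finite and not negative
     * @return the same length expressed in points
     * @throws IllegalArgumentException if the value is negative, NaN or infinite
     */
    static double millimetresToPoints(double millimetres) {
        if (Double.isNaN(millimetres) || Double.isInfinite(millimetres) || millimetres < 0) {
            throw new IllegalArgumentException("Length must be finite and >= 0: " + millimetres);
        }
        // Divide first, so that 25.4 mm becomes exactly 1.0 inch before scaling.
        return millimetres / 25.4 * 72.0;
    }
}
```

With a comment like this, a reader could reimplement the method without ever looking at the body, which is roughly the bar the JDK documentation sets.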

Writing comprehensible documentation requires covering two aspects. First, the documentation should be readable. Grammar and orthography should match the basic level of the language. Errors can always happen – in code and in the documentation. But if the text is so full of errors that a common reader can no longer guess what the text meant, then it might be better to have no documentation at all. I know, for non-native speakers, this can be a challenge at first, but given our modern times, it is not hard to let a native speaker proof-read the texts. The documentation should contain complete sentences and should not try to be cryptic (in the sense of letting a hard-core mathematician explain hard-core mathematics to unsuspecting first-grade students). Second, the documentation should either explain concepts that may be unfamiliar to the target audience or contain links to further reading that explains these new concepts. Especially when the code uses or implements a certain standard, actually referencing that standard can greatly clear things up. As a positive side-effect, the documentation can skip all those parts that have been covered in the standard already. (Isn’t that the best reason to embrace standards in the first place?)
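As a small illustration of referencing a standard, the following sketch documents a CSV quoting helper by pointing at RFC 4180 and then describing only where the code goes beyond it. The class `CsvQuoter` is hypothetical and was written just for this example; the handling of `null` and of bare line breaks are choices of this sketch, not something the RFC prescribes.

```java
/** Hypothetical helper for illustration; not part of the engine's API. */
final class CsvQuoter {
    private CsvQuoter() {
    }

    /**
     * Quotes a single field for CSV output as described in RFC 4180, section 2.
     * The quoting rules themselves are not repeated here; please read the
     * standard for the details.
     *
     * <p>Where this method goes beyond the standard:
     * <ul>
     *   <li>{@code null} is written as an empty field. The RFC has no
     *       notion of null values.</li>
     *   <li>A bare CR or LF triggers quoting as well, although the RFC only
     *       talks about CRLF line breaks.</li>
     * </ul>
     *
     * @param field the raw field value, may be {@code null}
     * @return the field, enclosed in double quotes if necessary
     * @see <a href="https://www.rfc-editor.org/rfc/rfc4180">RFC 4180</a>
     */
    static String quote(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

The comment stays short because the standard carries most of the explanation; only the deviations need words of their own.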

And last but not least: Being useful! We all learned in school that each text should be written with a certain target audience in mind. The source-code documentation, like any other text, therefore should be written for the assumed user of our code.

If we look at our classic-engine, we have two classes of recipients. The frontend part of the code, the part that is used to define reports and to start the report processing, is mainly used by users of the library. Users (for our purposes here) want to utilize the engine to generate reports, but they do not expand the functionality of the engine itself. Once a developer starts to hunt bugs or adds or changes functionality of the engine, the ‘user’ turns into an ‘implementer’.

Both groups have fundamentally different needs. New users tend to be unfamiliar with the reporting terms we use. They do not know the concepts we use, and to be honest, unless they see how these things help them to get their job done faster, they couldn’t care less about these concepts. For these users, our documentation should be swift and should introduce the basic concepts gently. Implementers, on the other hand, need (or even want) all the dirty details. They should be familiar with the basics already and are (so I hope) not scared to dig into details of the technologies and standards involved.

For us this means that we have to reconcile two conflicting aims. We must provide easy access without flooding the users with technical terms (and therefore scaring them away) while providing enough details to the implementers, so that they can get their job done without unnecessary code crawling. How do we solve that? Front-end classes get documented for users, and all the backend classes get geeky developer documentation. The engine already shields the users from the backend classes – there is no direct way to invoke them from outside the report process and there should be no access path left that would let a user get through to that code.
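Roughly, the split could look like the sketch below. All names (`SimpleReport`, `RowLayouter`) are invented for this post and have nothing to do with the engine's real classes. The front-end class speaks in the user's terms, and the backend helper is a private nested class, so user code has no way of reaching it. In a real library the front-end class would of course be `public`; it is left package-private here only so that the snippet compiles as a single file.

```java
/**
 * Describes a simple report: a title followed by rows of text.
 *
 * <p>Typical usage: create the report, add your rows, then call
 * {@link #render()} to get the finished text. You do not need to know
 * anything about how the layout is computed.
 *
 * <p>Hypothetical class for illustration; not part of the engine's API.
 */
final class SimpleReport {
    private final String title;
    private final StringBuilder rows = new StringBuilder();

    /** Creates an empty report. The title is printed as the first line. */
    SimpleReport(String title) {
        this.title = title;
    }

    /** Adds one line of text below the title and returns this report for chaining. */
    SimpleReport addRow(String text) {
        rows.append(text).append('\n');
        return this;
    }

    /** Produces the finished report as plain text. */
    String render() {
        return RowLayouter.layout(title, rows.toString());
    }

    /**
     * Backend helper. Private on purpose: report authors never call this
     * directly, so its comments can assume a reader who works on the code.
     */
    private static final class RowLayouter {
        static String layout(String title, String rows) {
            // The title always gets a line of its own, even without rows.
            return title + "\n" + rows;
        }
    }
}
```

A user would write something like `new SimpleReport("Sales").addRow("Q1: 10").render()` and never see `RowLayouter` at all.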

As we expect that implementers are able to read code, the backend documentation will be an augmentation to the source code. It therefore (contrary to all theories of source-code documentation) does not document the contract, but explains how and why we did what the code says there. The engine’s backend does not form an interface that we expect to be implemented by some independent vendor. There’s just the code that does the work – and documentation that explains how the code is supposed to do the work.
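Here is a minimal sketch of what such "how and why" documentation could look like. The class `PageCounter` is hypothetical, and the rule that an empty report still produces one page is an assumption made up for this example, not a statement about the engine. Note that the comment spends its words on the reasoning behind the arithmetic and not on a formal contract.

```java
/** Hypothetical backend helper for illustration; not part of the engine. */
final class PageCounter {
    private PageCounter() {
    }

    /**
     * Computes how many pages are needed for the given number of rows.
     *
     * <p>How: this is a ceiling division done purely in integer arithmetic,
     * written as {@code (rowCount - 1) / rowsPerPage + 1}.
     *
     * <p>Why this form: the more familiar
     * {@code (rowCount + rowsPerPage - 1) / rowsPerPage} overflows when both
     * values are large, and going through {@code Math.ceil} would drag
     * floating-point rounding into a place where we want exact results.
     *
     * <p>Why the special case: an empty report still gets one page
     * (an assumption of this sketch), and the formula above would
     * return the wrong value for a row count of zero.
     */
    static int pagesFor(int rowCount, int rowsPerPage) {
        if (rowCount < 0 || rowsPerPage <= 0) {
            throw new IllegalArgumentException(
                    "rowCount must be >= 0 and rowsPerPage must be > 0");
        }
        if (rowCount == 0) {
            return 1;
        }
        return (rowCount - 1) / rowsPerPage + 1;
    }
}
```

Someone hunting a pagination bug learns from this comment why the code looks the way it does, which is exactly what the code alone cannot tell them.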

About Thomas

After working as all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layouting and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product to an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.