XML is not text – and Notepad is no XML editor

When the W3C and the big software vendors introduced XML, their marketing departments came up with a lot of bold claims. XML would be a human-friendly format; heck, they even claimed it was human-editable.

On a strictly technical level, all of these claims are true.

In real life, these claims are simply wrong.

Only very few XML formats are really human-readable, and most of them describe configuration files. These XML structures were usually designed by their developers to be human-friendly. But merely using XML does not automatically make anything friendly. The only group that seems to consider XML files friendly is software developers.

Test it: open OpenOffice or an XML-enabled MS Office and look at the documents these beasts create. Or listen in on a web-service conversation, which, after all, is just friendly XML. Do you still think XML is human-readable and friendly?

But what about the human-editable aspect? Well, the term alone is sneaky, as ‘human-editable’ does not mean much. When I was a kid, I edited savegame files on a regular basis, so technically they were ‘human-editable’ for me. But I would never have claimed that savegame files are human-editable at all.

XML files are human hackable, nothing more.

Without deep knowledge of the binary level of these files, it is quite easy to turn a valid XML file into a huge chunk of garbage.

XML is in fact a binary format. There are special cases where a plain text editor can be used to alter the contents of an XML file without breaking it, but these cases are the exception, not the rule.
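To make the "binary format" point concrete, here is a small Python sketch (Python and the sample element are my own choices for illustration). It shows that the same characters become different bytes depending on the encoding, and that reading those bytes with the wrong encoding does not raise any error; it just produces garbage.

```python
# The same (made-up) XML text, stored in three different encodings.
text = "<name>Größe</name>"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")
utf16_bytes = text.encode("utf-16")  # Python prepends a byte order mark

# Identical characters, different bytes on disk.
print(len(utf8_bytes), len(latin1_bytes), len(utf16_bytes))  # 20 18 38

# Reading with the wrong encoding does not fail; it silently yields garbage.
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread))  # '<name>GrÃ¶Ã\x9fe</name>'
```

This is exactly what a text editor sees when it guesses the encoding instead of reading the XML declaration: the characters look broken, and anything you type and save on top of that view makes it worse.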

Do not use a text editor for editing XML files.

Like so many file formats, XML is a layered file format.

  1. Binary data level

    All content on your disks is just bits and bytes. XML is no exception here. When working with XML content, applications translate the binary content into an internal textual representation (usually Unicode).

    XML files can declare how an application should translate between the binary level and the character-data level by specifying the encoding in the XML declaration at the top of the file (for example, <?xml version="1.0" encoding="UTF-8"?>).


    Common text editors do not interpret this declaration and are therefore likely to break the file at the binary level. If a file that was stored in UTF-8 is later edited and saved in an 8-bit encoding like ISO-8859-1, every non-ASCII character written that way typically becomes an invalid byte sequence, and the file will not be readable afterwards.

    Unless you use a text editor that is aware of the XML declaration, chances are high that you will render your files unusable or introduce subtle errors such as corrupted text content.

  2. XML content level

    Once the XML parser has translated the binary content into its own internal text representation, it starts to interpret the XML syntax and produces a high-level view of the XML elements and attributes described by the character-data stream.

    When you are not using an editor that is aware of the XML syntax, you have to check the syntactic correctness (well-formedness) of the file yourself. Luckily, XML parsers are really good at spotting these errors, so if the file is syntactically incorrect, your parser will tell you so by rejecting the file.

  3. Application level

    XML only describes the grammar, that is, how content is organized on a physical level. It tells you that there are elements and attributes, but the XML standard itself does not tell you anything about their meaning or about which elements and attributes are allowed at which position.

    The application-level document structure is commonly described in DTD files (DTD = Document Type Definition) or XML Schema files. A good XML editor can understand these files and can validate the document to some extent. Although this validation cannot guarantee that your XML content will make sense for the application, it can at least tell you whether your content fits the application’s declared expectations.

    Of course, plain text editors will not provide such validation services.
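The three layers fail in different ways, which is why each one needs its own check. The following Python sketch (my own illustration, not taken from any particular editor) runs the checks in order: it decodes the bytes with the declared encoding, lets the standard-library parser check well-formedness, and finally compares the element tree against a tiny content model. The config/entry format and the ALLOWED_CHILDREN table are hypothetical stand-ins for a real DTD or XML Schema. Python’s standard library cannot validate against DTDs or XSDs, so the third step only mimics what a schema-aware editor does, and the encoding detection is deliberately simplified.

```python
import re
import xml.etree.ElementTree as ET

# Simplified detection: reads an ASCII-compatible XML declaration only (no BOM handling).
DECL_ENCODING = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

# Hypothetical content model for a made-up <config> format: which children each
# element may contain. A real application would publish a DTD or XML Schema instead.
ALLOWED_CHILDREN = {"config": {"entry"}, "entry": set()}


def check_layers(data: bytes) -> str:
    """Return the first layer that rejects data: "binary", "syntax" or "application"; else "ok"."""
    # Layer 1: bytes -> characters, using the declared encoding (UTF-8 if none is declared).
    match = DECL_ENCODING.match(data)
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    try:
        data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        return "binary"

    # Layer 2: characters -> elements; the parser rejects markup that is not well-formed.
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "syntax"

    # Layer 3: elements -> meaning; compare the tree with the hypothetical content model.
    for element in root.iter():
        allowed = ALLOWED_CHILDREN.get(element.tag)
        if allowed is None or any(child.tag not in allowed for child in element):
            return "application"
    return "ok"


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="UTF-8"?><config><entry>Größe</entry></config>'
    samples = {
        "saved as UTF-8": doc.encode("utf-8"),                # ok
        "saved as ISO-8859-1": doc.encode("iso-8859-1"),      # binary: declaration still says UTF-8
        "unclosed element": b"<config><entry>oops</config>",  # syntax
        "unknown element": b"<config><item/></config>",       # application
    }
    for label, data in samples.items():
        print(f"{label}: {check_layers(data)}")
```

In practice the first two checks both happen inside the same parser; the sketch separates them only to show which layer a given failure belongs to. Note that the ISO-8859-1 bytes become perfectly valid as soon as the declaration says encoding="ISO-8859-1", which is the whole point: the bytes only mean something together with the declared encoding.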

Well, telling everyone not to use Notepad for editing XML files is one thing. But people use text editors because they obviously need to edit XML files, so they need a better alternative.

To fit my working style, an XML editor has to fulfill a couple of prerequisites.

  • The XML editor must be able to understand the encoding declaration in the XML header (thus handling the binary translation reliably).
  • Of course, it should be a real editor that lets me write the XML file like source code (a Notepad-style text mode, instead of restricting me to a structural tree view).
  • If the editor does not come with syntax highlighting, I don’t want to use it.
  • The editor must be able to check the XML file for syntactic correctness (i.e., all elements closed, etc.).
  • Being able to validate the XML file against at least DTDs and XML Schemas, and providing auto-complete capabilities, is a huge plus.
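The first requirement is the one plain text editors fail most often. As a rough illustration of what "understanding the encoding hint" means, here is a simplified Python sketch; the function names sniff_encoding, load and save are made up for this example. It takes the encoding from a byte order mark or from the XML declaration, falls back to UTF-8 as the XML specification prescribes when neither is present, and writes the file back in the same encoding. It ignores the rarer cases the specification’s autodetection appendix covers, such as UTF-32 and EBCDIC-family encodings, and any encoding information supplied by a transport protocol like HTTP.

```python
import codecs
import re

# Simplified: a UTF-8/UTF-16 byte order mark, or an ASCII-compatible XML declaration.
DECL_ENCODING = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_encoding(data: bytes) -> str:
    """Pick the encoding for an XML file the way the file itself asks for it."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes without the BOM and writes it back on save
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the codec reads the BOM to detect the byte order
    match = DECL_ENCODING.match(data)
    # No BOM and no declaration: the XML specification falls back to UTF-8.
    return match.group(1).decode("ascii") if match else "utf-8"


def load(data: bytes) -> tuple[str, str]:
    """Decode with the file's own encoding, never with the platform default."""
    encoding = sniff_encoding(data)
    return data.decode(encoding), encoding


def save(text: str, encoding: str) -> bytes:
    """Write back in the same encoding; raise instead of silently corrupting characters."""
    return text.encode(encoding)  # errors="strict" is the default


if __name__ == "__main__":
    original = '<?xml version="1.0" encoding="ISO-8859-1"?><label>Größe</label>'.encode("iso-8859-1")
    text, encoding = load(original)
    print(encoding)                          # ISO-8859-1
    print(save(text, encoding) == original)  # True: untouched content round-trips byte for byte
    try:
        save(text.replace("</label>", " €</label>"), encoding)
    except UnicodeEncodeError:
        print("refusing to save: '€' cannot be stored in", encoding)
```

Refusing to save is a deliberate choice for this sketch. An editor could instead offer to write such characters as numeric character references (Python’s errors="xmlcharrefreplace"), which works in text content and attribute values but not in element names, comments or CDATA sections.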

Finding commercial editors that meet these requirements is relatively easy. But open source? Hmm, that’s difficult. In the open-source world, everyone seems to work with plain text editors (which do a great job of messing up your encodings if you work with files edited on other systems).

Eclipse seems to have an XML editor as a plugin, but you have to compile it yourself. (Don’t ask me why that IDE does not ship with an XML editor by default. Maybe they did not want to spoil the business of the commercial vendors. Or maybe it is just the Eclipse developers’ honest way of showing you that Eclipse should not be used in enterprise settings.)

In the open-source world, I have never found an XML editor that is easy to use; the few I stumbled across were so horrible that I did not even work with them long enough to want to save a file.

On Windows, there are a couple of editors that seem to fulfill at least the basic requirements:

As I can’t believe that Windows is more than just a gaming platform, I barely use XML editors on Windows at all.

Personally, I work with IntelliJ IDEA, whose XML-editing capabilities cover everything I need on a daily basis. (I don’t believe in stuff like XSLT or XPath/XQuery.) It costs less than some of the commercial XML-only editors (XMLSpy, for instance) and includes a nice Java IDE as well.

This entry was posted in Development, Rants by Thomas.

About Thomas

After working as the all-hands guy and lead developer on Pentaho Reporting for over a decade, I have learned a thing or two about report generation, layout and general BI practices. I have witnessed the remarkable growth of Pentaho Reporting from a small niche product into an enterprise-class Business Intelligence product. This blog documents my own perspective on Pentaho Reporting's development process and our steps towards upcoming releases.