Category Archives: Rants

XML is not text – and Notepad is no XML editor

When the W3C and the big software vendors introduced XML, their marketing departments came up with a lot of bold claims. XML would be a human friendly format, heck, they even claimed it to be human editable.

On a strictly technical side, all of these claims are true.

On a real life side, these claims are simply wrong.

Only very few XML formats are really human readable, most of them describing configuration files. These XML structures have usually been designed by the developer to be human friendly. But merely using XML does not automatically make everything friendly. The only group that considers XML files friendly seems to be software developers.

Test it: open your Open-Office or XML-enabled MS-Office and look at the documents these beasts create. Or listen to a Web-Service conversation, which is just friendly XML. Do you still think XML is human readable and friendly?

But how about the human-editable aspect? Well, the term alone is sneaky, as "human editable" does not mean that much. When I was a kid, savegame files were technically ‘human editable’ for me on a regular basis. But I would never have claimed that savegame files are human editable at all.

XML files are human hackable, nothing more.

Without deep knowledge about the binary level of these files, it is quite easy to transform a valid XML file into a huge chunk of garbage.

XML is in fact a binary format. There are special cases, where a plain text-file editor can be used to alter the contents of an XML file without breaking it – but these cases are an exception, not the rule.

Do not use a text editor for editing XML files.

Like so many file formats, XML is a layered file format.

  1. Binary data level: All content on your disks is just bits and bytes. XML is no exception here. When working with XML content, applications translate the binary content into an internal textual representation (usually Unicode).

    XML files have the option to declare how an application should translate between the binary and character-data level by specifying the encoding in the XML header.

    Common text editors do not interpret this declaration and are therefore likely to break the file on a binary level. A file that was stored in UTF-8 encoding and later edited in an 8-bit encoding like ISO-8859-1 may no longer be readable afterwards.

    Unless you use a text editor that is aware of the XML header, chances are high that you will render your files unusable or introduce subtle errors like corrupted text contents.

  2. XML content level: Once the XML parser has translated the binary content into its own internal text representation, it will start to interpret the XML syntax and will produce a high-level view of the XML elements and attributes described by the character-data stream.

    When not using an editor that is aware of the XML-syntax, you have to validate the syntactical correctness of the file by yourself. Luckily XML parsers are really good at spotting these errors as well, so if the file is syntactically incorrect, your parser will likely tell you so by rejecting the file.

  3. Application level: XML only describes the grammar, that is, how content is organized on a physical level. It tells you that there are elements and attributes, but the XML standard itself does not tell you anything about their meaning or about which elements and attributes are allowed in which position.

    The application-level document structure is commonly described in DTD files (DTD = Document Type Definition) or XML Schema files. A good XML editor can understand these files and can validate the document up to a point. Although this validation cannot guarantee that your XML content will make sense to the application, at least it can tell you whether your content fits the application’s declared expectations.

    Of course, plain text editors will not provide such validation services.
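To make the first level a bit more concrete, here is a minimal Python sketch of what can go wrong. It is only an illustration under stated assumptions: the sample document, the function names and the "editor" that always loads files as ISO-8859-1 are all hypothetical, and Python's standard `xml.etree.ElementTree` parser stands in for "an XML parser".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, stored as UTF-8 exactly as its header declares.
ORIGINAL = '<?xml version="1.0" encoding="UTF-8"?>\n<city>München</city>'.encode("utf-8")


def edit_ignoring_declaration(data: bytes, save_as: str) -> bytes:
    """Simulate an editor that ignores the XML header: it always loads the
    file as ISO-8859-1, the user appends some text, and the file is saved
    in the encoding given by save_as."""
    # The two UTF-8 bytes of the umlaut are read as two separate
    # characters (U+00C3 and U+00BC).
    text = data.decode("iso-8859-1")
    text = text.replace("</city>", " grüßt</city>")  # the user's edit
    return text.encode(save_as)


def parse_city(data: bytes) -> str:
    """Parse raw bytes; the parser picks the decoder from the XML header."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"rejected: {err}"


# Untouched file: the parser honours the declared encoding.
print(parse_city(ORIGINAL))
# Saved as ISO-8859-1: the new bytes are not valid UTF-8, the parser rejects the file.
print(parse_city(edit_ignoring_declaration(ORIGINAL, "iso-8859-1")))
# Saved as UTF-8: the file still parses, but the old umlaut is silently corrupted.
print(parse_city(edit_ignoring_declaration(ORIGINAL, "utf-8")))
```

The second case is the "unusable file" outcome and the third is the subtle one: nothing complains, yet the text content is no longer what was originally stored.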

Well, telling everyone not to use Notepad for editing XML files is one thing. But people use text editors because they obviously have a need to edit XML files.

To be suitable for my working style, an XML editor has to fulfill a couple of prerequisites.

  • The XML editor must be able to understand the XML-encoding hint in the header (thus solving the binary translation in a reliable way).
  • Of course, it should be a real editor that allows me to write the XML file like source-code (notepad mode, instead of restricting me to a structural tree-view only).
  • If the editor does not come with syntax highlighting, then I don't want to use it either.
  • The editor must be able to check the XML file for syntactical correctness (i.e., all elements closed, etc.).
  • Being able to validate the XML file against at least DTDs and XML Schemas and providing auto-complete capabilities is a huge plus point.
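The syntax check from the list above is roughly what any XML parser already does. As a rough sketch (the function name `check_well_formed` is made up for this example), Python's standard `xml.etree.ElementTree` can report where a file stops being well-formed. Note that this covers only syntactical correctness; that module does not validate against DTDs or XML Schemas.

```python
import xml.etree.ElementTree as ET


def check_well_formed(data: bytes):
    """Return None if data is well-formed XML,
    otherwise a (line, column, message) tuple describing the first error."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # location reported by the parser
        return line, column, str(err)
    return None


print(check_well_formed(b"<a><b></b></a>"))  # well-formed, so None
print(check_well_formed(b"<a><b></a>"))      # <b> is never closed
```

An editor with this kind of check built in would typically run it on every change and jump to the reported line and column.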

Finding commercial editors for these requirements is relatively easy. But open source? Hmm, that's difficult. In the open-source world, everyone seems to work with plain text editors (which do a great job of messing up your encodings if you work with files edited on other systems).

Eclipse seems to have an XML editor as a sample plugin, but you have to compile that yourself. (Don't ask me why that IDE does not come with an XML editor by default. Maybe they did not want to spoil the business for the commercial vendors. Or maybe this is just honesty on the part of the Eclipse developers, to show you that Eclipse should not be used in enterprise settings.)

In the open-source world I have never found an XML editor that is easy to use. The few I stumbled across were so horrible that I did not even bother to work with them long enough to want to save a file.

On Windows, there are a couple of editors that seem to fulfill at least the basic requirements:

As I can't believe that Windows is more than just a gaming platform, I barely use XML editors on Windows at all.

Personally, I work with IntelliJ IDEA, whose great XML-editing capabilities cover everything I need on a daily basis. (I don’t believe in stuff like XSLT or XPath/XQuery.) It costs less than some of the commercial XML-only editors (like XMLSpy, for instance) and has a nice Java IDE included as well.

Comparing Barcelona with Orlando

Same:

– The airport area looks the same. (Airports themselves probably always look the same everywhere.) The same mix of swamps and large free areas that will eventually turn into storage buildings or other low-density industrial areas in a couple of years. The same large ads.
– Traffic jams. What use is a highway system, if everyone (except the motorcycles) is crawling?
– Climate. Well, not exactly the same. Barcelona is located directly by the sea, so it's not *that* humid compared to Orlando.
– Air conditioning. Everywhere. And even large buildings don't have central ones; every window has its own small air conditioner attached.
– Food*. Not sure how Columbus made it to America with such food*. Well, I’m not sure how the Americans survive with their food* either.
– Being offended by naked skin. You can't enter the Cathedral of Barcelona if you are an unscrupulous woman who walks around freely with naked shoulders. Terrible, such behaviour, isn’t it? And next, women will want to vote. Seriously, even in Bavaria (which is strict** in religious matters) one would not be able to survive (politically, not physically) with such an idea (banning naked skin, not voting***).
– The sun is the same. I made some scientific experiments and can confirm: it produces the same kind of sunburn in both cities.

Different:

– It's older. The whole inner city of Barcelona smells of history.
– It's slower (especially in the rush hour). They take their time to build proper things. One cathedral has been under construction since 1882. They may finish it in 2020. Or later.
– It has a city center. Orlando is a large assortment of villages; the city center is so small that you could miss it when you walk through it. (I missed it twice. I was surprised once I found it. Now I don't miss it anymore.)

Tomorrow the OpenOffice Conference will start in the old university. The building itself feels more like a museum, with old paintings and beautiful gardens. Being there makes you feel as if you had traveled back in time 600 years (except for the existence of ringtones and the non-existence of the Spanish Inquisition).

* The term food is used loosely here. In some countries chefs may be burned (or worse: fed such things) for coming up with such creations. But visiting either city is always a good opportunity to lose some extra pounds of body weight.

** Strict, in comparison to the other German states. Not as strict as Vatican City, of course.

*** In Bavaria, there is no sense in inventing voting. There is just one party that counts, so it more or less comes down to a Yes or No thing. And Nos don't count either.