A pitfall of using XML for data storage.

XML is one of the those useful technologies that was designed well, and has a solid foundation for the future. XML encompassing XML, XSD, XSL, XPath etc. XML is a relatively easy technology for a web application developer to get into because most web developers already know CSS and HTML principles, so XSL makes sense. Most web developers also know javascript and DOM, so XPath comes naturally. I must say I am a huge fan of XML and its many implementations.

Every .net developer should know a little bit about XML, because .net applications are configured using XML. .Net has a lot of built in support for working with XML, and serialization is extremely easy and efficient. If you’re a .net developer lately, you’ll find a use for some more advanced XML usage sooner or later.

With all this XML staring at you in the face, one day you’re going to wonder if you even need a database, or will XML file storage suffice, here are some things to watch out for.

  1. XML Storage and Retrieval is great for small operations and small file sizes, so its a great alternative to a database for simple solutions, just ensure at the start of your project, that there won’t be a phase 2 or 3, where suddenly the project grows out of proportion and then WHAM! You’re going to have to sit and figure out how to fix the mess of having large XML files. You might be safer just starting your projects using an express database, if you’re trying to cut down on licensing costs, and you can always scale it later to a full DB product, without rewriting huge sections of your code!
  2. You might think “I will cleverly break my file size down, so that I don’t have this problem”. Yes you can, and you will eventually have to, put in a lot of manual work to get the storage professional enough for Enterprise production usage. If you had gone the DB route, you would not even be thinking at this level.
  3. You’re going to have to write objects to insert/update/delete/select from your XML files. Including Serialization and De serializing from file. – SQL you’re going to write SP’s or SQL queries, and you’re going to need a DAL (Data Access Layer) But you can save yourself a lot of time if you use nHibernate for .net, its free and it works.
  4. You can’t store an unlimited amount of xml files in a directory. Sure technically you can, but you’re in for a few bad surprises.
    1. How will administrators be able to manually browse a directory containing 500k files?
    2. Your C# code surely has to itterate through the whole collection of xml files, are you really intending on having a foreach file in files loop? I wonder just how stable your app is going to be.
  5. Files Lock, yes they do, and working with XML means overcoming all that locking yourself. SQL is much better at handling transaction based access to data. You get all the benefits of decades of research that has gone into our modern RDBMS systems. If you’re going back to a file model, you better hope your app does not scale.
  6. Files Corrupt : The question is “When?” and the answer is “During Serialization when the server (a) suffers from a fatal error (b) total power loss. You might think the odds of this happening are slim, but in my personal experience it happens more often than you might think. You see if you’re using a database, you might essentially loose 1 insert statement but the rest of your database will remain intact, its also highly unlikely you’ll get 1/2 an insert statement, mostly it will fail at once. With XML and your .net serialization techniques, you’re saving an XML file all at once. This means that should anything go wrong, you’re going to corrupt that XML file (Yes the whole schema!) and that my friend is critical data loss. Prepare to (a) Either invest large amounts of time making your app robust enough to recover from corrupted xml schemas and expect some data loss (b) Deal with the full brunt of the problem when things go wrong, and critical data is lost.

You might think “Microsoft use XML, so why can’t I?”. You can, and I suggest you do, just use it for the right reasons, use it to store information that will not scale, like configuration files. Use it for data exchange, possibly logging, if you don’t mind the extra size. Just be very careful you don’t use it as data storage to cut out a database, and end up inducing huge extra costs, when you need to revert back to a conventional database, or write costly workarounds to the pitfalls of XML as a data storage.

    One thought on “A pitfall of using XML for data storage.

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out / Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out / Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out / Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out / Change )

    Connecting to %s