A pitfall of using XML for data storage.

XML is one of those useful technologies that was designed well and has a solid foundation for the future. By XML I mean the whole family: XML itself, XSD, XSL, XPath etc. XML is a relatively easy technology for a web application developer to get into, because most web developers already know CSS and HTML principles, so XSL makes sense. Most web developers also know JavaScript and the DOM, so XPath comes naturally. I must say I am a huge fan of XML and its many implementations.

Every .NET developer should know a little bit about XML, because .NET applications are configured using XML. .NET has a lot of built-in support for working with XML, and serialization is extremely easy and efficient. If you’re a .NET developer these days, you’ll find a use for more advanced XML sooner or later.

With all this XML staring you in the face, one day you’re going to wonder whether you even need a database, or whether XML file storage will suffice. Here are some things to watch out for.

  1. XML storage and retrieval is great for small operations and small file sizes, so it’s a great alternative to a database for simple solutions. Just make sure at the start of your project that there won’t be a phase 2 or 3 where the project suddenly grows out of proportion and then WHAM! You’re going to have to sit and figure out how to fix the mess of having large XML files. If you’re trying to cut down on licensing costs, you might be safer starting your projects on an express edition of a database; you can always scale it up to a full DB product later, without rewriting huge sections of your code!
  2. You might think “I will cleverly break my file size down, so that I don’t have this problem”. Yes, you can, but you will eventually have to put in a lot of manual work to make the storage robust enough for enterprise production use. If you had gone the DB route, you would not even be thinking at this level.
  3. You’re going to have to write objects to insert/update/delete/select from your XML files, including serializing to and deserializing from file. With SQL you’re going to write stored procedures or SQL queries, and you’re going to need a DAL (Data Access Layer), but you can save yourself a lot of time if you use NHibernate for .NET; it’s free and it works.
  4. You can’t store an unlimited number of XML files in a directory. Sure, technically you can, but you’re in for a few bad surprises.
    1. How will administrators be able to manually browse a directory containing 500k files?
    2. Your C# code will surely have to iterate through the whole collection of XML files. Are you really intending to have a “foreach file in files” loop? I wonder just how stable your app is going to be.
  5. Files lock. Yes, they do, and working with XML means overcoming all that locking yourself. A SQL database is much better at handling transaction-based access to data. You get all the benefits of the decades of research that have gone into our modern RDBMSs. If you’re going back to a file model, you’d better hope your app never has to scale.
  6. Files corrupt. The question is “When?” and the answer is “During serialization, when the server (a) suffers a fatal error or (b) loses power completely.” You might think the odds of this happening are slim, but in my personal experience it happens more often than you might think. You see, if you’re using a database, you might lose one insert statement, but the rest of your database will remain intact. It’s also highly unlikely you’ll get half an insert statement; mostly it will fail as a whole. With XML and your .NET serialization techniques, you’re saving an XML file all at once. This means that should anything go wrong mid-write, you’re going to corrupt that XML file (yes, the whole document!), and that, my friend, is critical data loss. Prepare to either (a) invest large amounts of time making your app robust enough to recover from corrupted XML files, and still expect some data loss, or (b) deal with the full brunt of the problem when things go wrong and critical data is lost.
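
The corruption risk in point 6 can be reduced, though not removed, by never writing over the live file directly. Here is a minimal sketch of that idea, shown in Python rather than .NET purely for brevity: write the new XML to a temporary file in the same directory, flush it to disk, then swap it over the original in one step. The function name `save_xml_atomically` and the file names are hypothetical, and the sketch does nothing about locking or concurrent writers.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_xml_atomically(root, path):
    """Write the XML tree to `path` without leaving a half-written file there."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final swap
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            ET.ElementTree(root).write(tmp, encoding="utf-8", xml_declaration=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before swapping
        os.replace(tmp_path, path)  # swap the finished file in as one step
    except BaseException:
        # Something failed before the swap: discard the partial temp file
        # and leave the previous version of `path` untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        orders = ET.Element("orders")
        ET.SubElement(orders, "order", id="1").text = "widget"
        target = os.path.join(folder, "orders.xml")
        save_xml_atomically(orders, target)
        print(ET.parse(target).getroot().find("order").text)  # widget
```

If the process dies while the temporary file is being written, the old file is still whole; the cost is that you now own this logic, plus cleanup of stray `.tmp` files, which is exactly the kind of work a database already does for you.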

You might think “Microsoft use XML, so why can’t I?”. You can, and I suggest you do; just use it for the right reasons. Use it to store information that will not grow, like configuration files. Use it for data exchange, and possibly logging, if you don’t mind the extra size. Just be very careful you don’t use it as data storage to cut out a database, and end up incurring huge extra costs when you need to fall back to a conventional database or write costly workarounds for the pitfalls of XML as data storage.
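
To make the contrast from point 6 concrete, here is a small sketch of what a conventional database gives you for free, using SQLite from Python's standard library. It is only an illustration: the `orders` table is hypothetical, and a raised exception stands in for a failure partway through a write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute("INSERT INTO orders (item) VALUES ('widget')")
conn.commit()

try:
    # Used as a context manager, the connection commits on success
    # and rolls back if the block raises.
    with conn:
        conn.execute("INSERT INTO orders (item) VALUES ('gadget')")
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass

rows = conn.execute("SELECT item FROM orders").fetchall()
print(rows)  # [('widget',)]
```

The interrupted insert is gone, but the row that was already committed is still there. With a single serialized XML file there is no such boundary unless you build one yourself.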


    Why Windows Live SkyDrive Sucks

    Windows Live SkyDrive is another perfect example of an almost useless tool that could be so much better if only some more planning had gone into the product. Don’t hold your breath for Microsoft to actually “enhance this offering”.

    1. SkyDrive limits uploaded file sizes to 50MB
      This seems daft, and I am not sure why Microsoft decided to do this. I guess it is mainly because it allows more clients to connect to the service at once at less cost. But this means you can’t just upload stuff; you gotta come up with all sorts of workarounds for files larger than 50MB.
    2. SkyDrive uploads using standard HTTP
      This might be fine if you’re uploading 4-5 files, but since Microsoft actually bother to give you 25GB, hopefully you’re going to upload more… Good luck doing all of this “by hand”, file by file… This is perhaps the make-or-break issue, and ultimately the reason SkyDrive sucks.
    3. No FTP access, and no mapping SkyDrive to a drive in Explorer.
      What were the developers thinking? Why did they bother calling it SkyDrive? It should have been called LiveSimpleStore or SkyFileHold. There is nothing about this product that resembles a drive in any way. Please grow brains, guys! If you’re offering free online file storage, and 25GB of it, at least make it usable so that people will actually use that free space.

    I really wish Microsoft would retract SkyDrive from their product range, or enhance it so that it’s actually usable.