                           Why XML Security is Broken
                           ==========================
                     Peter Gutmann, pgut001@cs.auckland.ac.nz
                                   October 2004

Introduction
------------

This writeup was motivated by the following exchange on a mailing list:

  >>I have some questions related to XML-Dsig:
  >
  >Argghh!! Run away!

  A near-universal reaction.

So why is "Run away!" a near-universal reaction to XML-Dsig (and XML security
in general)?  Because it doesn't work, that's why.  The problem with XML
security can be traced back to two fundamental causes:

  1. XML is an inherently unstable and therefore unsignable data format.
  XML-Dsig attempts to fix this via canonicalisation rules, but they don't
  really work.

  2. The use of an "If it isn't XML, it's crap" design approach that led to
  the rejection of conventional, proven designs in an attempt to prove that
  XML was more flexible than existing stuff.

These problems are covered in more detail below, along with a simple solution
to the problem that's already in use by some XML users.

XML is an inherently unsignable data format
-------------------------------------------

XML signatures are an attempt to hammer signatures onto inherently unsignable
data.  Even at the most basic syntax level (ignoring for now the equally
problematic XML semantic features), you need to handle text-canonicalisation
for whitespace, line endings, character-set encoding, word wrapping, escape
sequences, and so on.  Even this relatively straightforward process was so
incredibly difficult that the X.509 world abandoned it years ago by mutual
unspoken agreement because it was just Too Hard to do.  So X.509 spent about
the first ten of its twenty-odd years trying to get this right and failed, and
yet the X.509 canonicalisation rules are vastly simpler than the XML ones.
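
To make the syntax-level problem concrete, here's a minimal Python sketch.  The
rule set in naive_canonicalise() is a hypothetical illustration, not the X.509
or XML canonicalisation rules.  It hashes four renderings of what a reader sees
as the same line of text, then applies three plausible rules (unify line
endings, strip trailing blanks, Unicode NFC normalisation).  Three of the four
variants then hash identically, but the fourth, which uses an XML character
reference, still slips through, and every rule added to catch a case like that
opens the door to the next one:

```python
import hashlib
import unicodedata

def digest(text: str) -> str:
    # Hash the exact UTF-8 bytes, the way "sign the blob as is" would.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def naive_canonicalise(text: str) -> str:
    # Hypothetical rule set: unify line endings, drop trailing blanks, apply NFC.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip(" \t") for line in text.split("\n")]
    return unicodedata.normalize("NFC", "\n".join(lines))

# Four renderings of what a human reads as the same line of text.
variants = [
    "Pay Ren\u00e9 100\r\n",   # precomposed e-acute, CRLF line ending
    "Pay Rene\u0301 100\n",    # 'e' plus combining acute accent, LF
    "Pay Ren\u00e9 100 \n",    # stray trailing space
    "Pay Ren&#233; 100\n",     # XML character reference for the same letter
]

raw = {digest(v) for v in variants}
canon = {digest(naive_canonicalise(v)) for v in variants}
print(len(raw), len(canon))  # 4 2
```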

Much more worrying though is the fact that at the semantic level XML, like MS
Word, consists of highly dynamic content, but is about two orders of magnitude
more complex than Word.  With XML you have to deal with XSLT (transformations
that handle tree construction, format control, pattern selection, and other
issues), XPath selection, the fact that the data can be affected (often
drastically) by external forces such as style sheets, schemas, and DTDs, XML
namespace declarations and namespace attributes, and about a million other
things, none of which anyone can quite agree on how to handle, mostly because
there is no way to handle them.  "Secure XML", the definitive reference on the
topic, spends fully half of its 500-odd pages trying to come to grips with XML
and its canonicalisation problems, without really ever resolving things.  In
fact it reads more like a 250-page essay on how not to do things than a
solution.  The PGP and S/MIME canonicalisation rules in contrast mostly fit
into a single sentence: "Grab the input blob and sign it as is".
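
Canonicalisation does close some of these gaps, just not all of them.  The
Python sketch below uses the standard library's C14N 2.0 implementation
(xml.etree.ElementTree.canonicalize, available from Python 3.8), which isn't
necessarily the canonicalisation variant any given XML-Dsig implementation
uses; the namespace URI urn:example:u is made up for illustration.  Attribute
order and quoting style are normalised away, but two documents that a
namespace-aware parser reports as the very same element still canonicalise to
different bytes, because one uses a default namespace and the other a prefix:

```python
import hashlib
import xml.etree.ElementTree as ET

def c14n_digest(xml_text: str) -> str:
    # Canonicalise with C14N 2.0 (Python 3.8+), then hash the result.
    return hashlib.sha256(ET.canonicalize(xml_text).encode("utf-8")).hexdigest()

# Attribute order and quoting style are normalised away...
a1 = "<r b='2' a=\"1\"/>"
a2 = '<r a="1" b="2"></r>'

# ...but a default namespace and a prefixed one are not, even though both
# documents describe the element {urn:example:u}x.
b1 = '<x xmlns="urn:example:u"/>'
b2 = '<p:x xmlns:p="urn:example:u"/>'

print(c14n_digest(a1) == c14n_digest(a2))              # True
print(ET.fromstring(b1).tag == ET.fromstring(b2).tag)  # True
print(c14n_digest(b1) == c14n_digest(b2))              # False
```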

Because of these problems with XML and XML canonicalisation, you have an
inherently unstable medium that you're supposed to base your business
transactions on.  Imagine how this would end up in court: "Your honour,
although the plaintiff claims we signed this, we have 39 differently-
canonicalised forms that show we didn't, 18 different namespace types that
prove the plaintiff is in fact at fault and not us, 7 applications of DTDs
that show beyond a doubt that they owe us the amount they're claiming, and
four schemas whose use will clearly show that we have rights to their house
and car as well".

The plaintiff then gets to explain XML DTDs, and why their particular one
should be accepted and the 17 the defendant is presenting shouldn't, to a 60-
year-old judge with an arts degree and a jury of people whose VCRs blink
12:00.

(For an especially fun abuse of the inability to canonicalise XML, make your
 product the first one to market in a given area, advertise it widely in the
 appropriate trade journals as being fully standards-compliant, get the
 canonicalisation wrong, and threaten all of your competitors with prosecution
 under the DMCA if they as much as download your software to figure out what
 on earth you're doing with your XML.  With a little effort this can be even
 more lucrative than a USPTO-assisted patent shakedown).

(Incidentally, both S/MIME and PGP can be coerced into a mode of operation
 where they also have a subset of the problems of XML-Dsig.  If you use a
 detached signature rather than combining signature and data into a single PGP
 or S/MIME entity then external applications are free to mangle the data in
 any way they want when it's in transit, with the result that the signature
 check fails.  Trying to canonicalise the content into an unmangled (or
 mangle-resistant) form has been an ongoing battle, particularly with PGP
 signed email using detached signatures, where intermediate MTAs and (more
 usually) MUAs can do things like stripping trailing whitespace which have no
 user-visible effect on the data but cause the signature check to fail.  The
 simplest solution to this is "don't do that, then" - send your signed data as
 a single S/MIME or PGP entity rather than breaking it up into two parts, one
 of which can be modified in transit).
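
The difference between the two modes is easy to simulate.  In the Python
sketch below, mangling_mua() plays the part of a mail agent that strips
trailing whitespace, an HMAC stands in for the PGP or S/MIME signature, and the
single-entity format (signature followed by data, base64-encoded) is a
hypothetical simplification rather than the real S/MIME or OpenPGP encoding:

```python
import base64
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical; HMAC stands in for a real PGP or S/MIME signature

def sign(data: bytes) -> bytes:
    return hmac.new(KEY, data, hashlib.sha256).digest()

def verify(data: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(data), sig)

def mangling_mua(text: bytes) -> bytes:
    # Simulates a mail agent that strips trailing whitespace from every line.
    return b"\n".join(line.rstrip(b" \t") for line in text.split(b"\n"))

message = b"Meet at noon.  \nBring the contract. \n"

# Detached: the data travels as readable text, the signature separately.
sig = sign(message)
print(verify(mangling_mua(message), sig))  # False

# Single entity: signature and data travel together as one base64 blob,
# which contains no trailing whitespace for the mail agent to strip.
entity = base64.encodebytes(sig + message)
blob = base64.decodebytes(mangling_mua(entity))
print(verify(blob[32:], blob[:32]))  # True
```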

Welcome to all things XML, where if it's not XML, it's crap
-----------------------------------------------------------

There are a number of well-established existing ways for securing content, the
two best-known being S/MIME and PGP.  If you go beyond the bit-bagging methods
used, PGP and S/MIME are practically identical at the structural level.  In
fact, if it weren't for the fact that some of the bit-bagging fields were
cryptographically secured, you could rewrite PGP into S/MIME and vice versa
just by changing the packaging format.  More details on this, and on the
following message-format discussion, can be found in "Performance
Characteristics of Application-level Security Protocols" available from my
home page.

The similarity between the two isn't because they tried to copy each other.
In fact it's precisely the opposite: there was some animosity between the two
camps at the time the standards were being created.  They're structurally
identical because there's really only one (sensible) way to
encapsulate data cryptographically.  For signed data this is:

  signature hash algorithm indicator;
  data;
  signature;

and for encrypted data it's:

  recipient/key-exchange information;
  encrypted data;

This is necessary to allow straightforward one-pass processing.  Consider for
example signed data arranged as:

  data;
  signature;

In this case it's not known which hash algorithm is required to compute the
signature, so it's necessary to process and buffer all of the (arbitrarily
large) data to locate the signature, extract the hash algorithm identifier
from that, and go back and re-process all of the data.  Only then is it
possible to actually check the signature.  Similarly, the encrypted data has
the recipient/key exchange information at the start so that decryption keys
can be set up before processing the encrypted data.  Putting it anywhere else
produces the same problem as rearranging the signed data fields.
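
The difference that ordering makes is easiest to see in code.  The following
Python sketch uses a hypothetical header-first layout (algorithm name, data
length, data, then the tag), with an HMAC standing in for a real signature.
Because the algorithm identifier arrives first, the verifier can hash the data
as it streams past and never holds more than one chunk in memory:

```python
import hmac
import io
import struct

KEY = b"demo-key"   # hypothetical; HMAC stands in for a real signature
CHUNK = 64 * 1024   # memory used for the data is bounded by this, not by message size

def encode(alg: str, data: bytes) -> bytes:
    # Hypothetical header-first layout: [alg-len][alg][data-len][data][tag]
    name = alg.encode("ascii")
    tag = hmac.new(KEY, data, alg).digest()
    return bytes([len(name)]) + name + struct.pack(">Q", len(data)) + data + tag

def verify_stream(stream) -> bool:
    # The algorithm arrives before the data, so hashing can start immediately.
    name_len = stream.read(1)[0]
    alg = stream.read(name_len).decode("ascii")
    if alg not in ("sha256", "sha512"):   # never trust an arbitrary algorithm name
        raise ValueError("unexpected algorithm")
    (remaining,) = struct.unpack(">Q", stream.read(8))
    mac = hmac.new(KEY, digestmod=alg)
    while remaining:
        chunk = stream.read(min(CHUNK, remaining))
        if not chunk:
            raise ValueError("truncated message")
        mac.update(chunk)          # each chunk is hashed once and then discarded
        remaining -= len(chunk)
    return hmac.compare_digest(mac.digest(), stream.read(mac.digest_size))

message = encode("sha256", b"x" * 1_000_000)
print(verify_stream(io.BytesIO(message)))  # True
```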

Since there's really only one logical way to do these things, and since there
are a large number of well-established, field-proven toolkits out there to do
PGP and S/MIME, the obvious approach to XML security would be to define <PGP>
</PGP> and <SMIME></SMIME> tags, and break for tea and biscuits.  Anyone who
needed to secure XML could grab their favourite security toolkit (including
all manner of open source/free ones if they felt the need) and be done with
it.

Unfortunately, this approach was heresy to the XML security folks because,
well, PGP and S/MIME aren't XML.  So they had to reinvent the wheel in XML.
This led to a second problem: Since there's only one logical way to structure
secured data, it'd be obvious to anyone that all they'd done was reinvent the
wheel in XML.  To avoid this problem as well, they reinvented the wheel in
XML, but made it square to avoid accusations that they'd just reinvented the
wheel.  So with XML security it is indeed possible to do things like:

  data;
  signature hash algorithm indicator;
  signature;

and:

  encrypted data;
  recipient/key-exchange information;

and all manner of other horrible things.  Consider a case where 'data' is a
4GB message being streamed through a system.  Without the one-pass processing
capability, you have to buffer the entire message somewhere until you get to
the trailer which tells you what to do with it (since XML applications tend to
consume CPU and memory like Homer Simpson consumes doughnuts, this frequently
isn't noticed beyond the general complaints that XML is very slow to work
with).  Even worse, if you need to process these messages on devices without
the storage to buffer the entire message in memory, there's no way to do it.
A real-world example of this is medical equipment that secures/checks large
medical images on the fly as they're streamed over a hospital network.
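
The cost of the trailer-last arrangement shows up as soon as you try to verify
such a message.  The Python sketch below uses a hypothetical layout (data, then
algorithm name and tag, with two length bytes at the very end), again with an
HMAC standing in for the signature.  Since nothing about the hash is known until
the end of the stream, it has to spool every byte to a SpooledTemporaryFile and
then make a second pass over the data; on a device with no room to store the
whole message there's no way to make that work:

```python
import hmac
import io
import shutil
import tempfile

KEY = b"demo-key"   # hypothetical; HMAC stands in for a real signature
CHUNK = 64 * 1024

def encode_trailer_last(alg: str, data: bytes) -> bytes:
    # Hypothetical trailer-last layout: [data][alg][tag][alg-len][tag-len]
    name = alg.encode("ascii")
    tag = hmac.new(KEY, data, alg).digest()
    return data + name + tag + bytes([len(name), len(tag)])

def verify_trailer_last(stream) -> bool:
    with tempfile.SpooledTemporaryFile(max_size=8 * 1024 * 1024) as spool:
        # Pass 1: nothing can be hashed until the trailer is seen, so every byte
        # has to be kept, in memory up to max_size and on disk beyond that.
        shutil.copyfileobj(stream, spool, CHUNK)
        total = spool.tell()
        if total < 2:
            raise ValueError("truncated message")
        spool.seek(total - 2)
        name_len, tag_len = spool.read(2)
        data_len = total - 2 - tag_len - name_len
        if data_len < 0:
            raise ValueError("malformed trailer")
        spool.seek(data_len)
        alg = spool.read(name_len).decode("ascii")
        if alg not in ("sha256", "sha512"):   # never trust an arbitrary algorithm name
            raise ValueError("unexpected algorithm")
        tag = spool.read(tag_len)
        # Pass 2: go back to the start and re-read all of the data to hash it.
        spool.seek(0)
        mac = hmac.new(KEY, digestmod=alg)
        remaining = data_len
        while remaining:
            chunk = spool.read(min(CHUNK, remaining))
            mac.update(chunk)
            remaining -= len(chunk)
        return hmac.compare_digest(mac.digest(), tag)

message = encode_trailer_last("sha256", b"y" * 1_000_000)
print(verify_trailer_last(io.BytesIO(message)))  # True
```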

In addition to the processing problem, XML security gives you the flexibility
to shoot yourself in the foot in a dozen different ways without even knowing
it.  For example there are applications that sign the document header (rather
than the document itself), because XML gives you the flexibility to do that.
There's at least one application that signs an empty string, because XML gives
you the flexibility to do that.  I don't even want to count the number of
homebrew (and broken) key exchange mechanisms I've seen where messages contain
embedded keys before or after the secured payload, because it's so much more
convenient to do it that way.  The PGP and S/MIME approach is "Take a blob,
sign it/encrypt it".  The XML security approach is to hand the user a large
pile of toothpicks and a tube of glue and hope they'll get it right, while
loudly proclaiming how much more flexible and powerful XML is than other
approaches.  The crypto operations are performed and the signatures verify,
but nothing's actually being secured.  Brad Hill (see the reference further
down) has a great example of this where he takes a signed XML purchase order
and, using XML tricks, swaps a $1.50 box of pencils for a $2,500 laptop
without invalidating the signature, something that the
sign-everything-as-a-blob approach of S/MIME and PGP would never allow.
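
This isn't Brad Hill's actual attack, just a minimal Python sketch of the
general class of trick usually called signature wrapping, under heavily
simplified assumptions: a hypothetical <PO> format, an HMAC standing in for the
XML-Dsig signature, and a verifier that resolves the signed reference by Id
while the business logic reads whatever <Order> sits at a fixed position.  The
signed element is moved rather than modified, so the signature still verifies,
but the application now acts on an unsigned order:

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

KEY = b"demo-key"  # hypothetical key; HMAC stands in for a real XML-Dsig signature

def digest_of(elem) -> str:
    # Canonicalise just the referenced element, then MAC it.
    c14n = ET.canonicalize(ET.tostring(elem, encoding="unicode"))
    return hmac.new(KEY, c14n.encode("utf-8"), hashlib.sha256).hexdigest()

def sign(doc) -> None:
    # Sign "by reference": the signature names the Id of the element it covers.
    ref = doc.find(".//*[@Id='po']")
    ET.SubElement(doc, "Signature", {"URI": "#po"}).text = digest_of(ref)

def verify(doc) -> bool:
    # Finds the first element anywhere in the tree carrying the referenced Id.
    sig = doc.find("Signature")
    ref = doc.find(".//*[@Id='%s']" % sig.get("URI").lstrip("#"))
    return hmac.compare_digest(sig.text, digest_of(ref))

def item_ordered(doc) -> str:
    # The business logic reads the first direct <Order> child, not the referenced one.
    return doc.find("Order/Item").text

def wrapping_attack(doc) -> None:
    signed_order = doc.find("Order")
    doc.remove(signed_order)
    wrapper = ET.SubElement(doc, "Extra")   # hypothetical element the app ignores
    wrapper.append(signed_order)           # signed content still present, just moved
    doc.insert(0, ET.fromstring(
        '<Order Id="x"><Item>laptop</Item><Price>2500.00</Price></Order>'))

doc = ET.fromstring('<PO><Order Id="po"><Item>pencils</Item><Price>1.50</Price></Order></PO>')
sign(doc)
print(verify(doc), item_ordered(doc))   # True pencils
wrapping_attack(doc)
print(verify(doc), item_ordered(doc))   # True laptop
```

A MAC computed over the entire serialised document as a single blob would have
failed as soon as the document was rearranged.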

What's even worse is that the XML-ueber-alles approach makes it impossible to
separate the security component from XML.  That is, in order to implement a
security toolkit for XML, you need to implement a complete XML processing
system.  This is akin to requiring anyone creating a (non-XML) security
toolkit to implement a complete MTA/MUA and web server capable of handling
SMTP, MIME, HTTP, and HTML, as part of the toolkit.  There are reasons why no
standard security toolkit does things this way.  These are the same reasons
why no standard security toolkit can support XML security, requiring
expensive, usually proprietary XML security solutions that force users into
whatever XML-processing system the toolkit vendor has chosen.

As an example of this inflexibility, if you want to use a standard security
toolkit (and I'll use my own cryptlib as an example because I'm most familiar
with that, insert your favourite alternative here), you can use it as a static
library, a shared library, a Windows DLL, a COM object, from scripting
languages like Python, to implement a web server, a raw SSL or SSH tunnel,
S/MIME, raw encrypted data, encrypted files via uucp or FTP, and so on ad
infinitum.  In contrast, with XML, if you don't like the fact that the toolkit
vendor has chosen to use (say) the SAX way of looking at the world when you're
working with Xerces or DOM XML or LibXML or XML .NET or AElfred or Electric
XML or Xparse or MSXML..., well... tough.  It's impossible to create something
that's simply a security component that you can plug in wherever you need it,
because XML security is inseparable from the underlying XML processing system.
This breaks the basic principle of modularity, and ensures that XML security
toolkits will be created either by XML vendors with little knowledge of
security or security vendors with little knowledge of XML, a recipe for
disaster.

In contrast a non-XML security toolkit can be plugged in wherever you need it
to do whatever you want to do with it.  This is why a large number of
protocols that need security simply defer to an existing mechanism/toolkit:
SCEP uses S/MIME, SFTP uses SSH, SIP again uses S/MIME, and so on.  Since the
application isn't tied to the underlying security mechanism, it's possible to
use any standard security toolkit from any established security vendor to do
the job.

(There exists a subset of XML folks who appear to agree with this view.  For
 example RFC 3923, "End-to-End Signing and Object Encryption for the
 Extensible Messaging and Presence Protocol (XMPP)" uses S/MIME rather than
 XML security to provide its security).

The solution
------------

The solution to the problem is to do what was rejected by the XML security
folks at the beginning: Define <PGP></PGP> and <SMIME></SMIME> tags, tell
anyone who wants to do XML security to grab any existing, well-established,
field-proven security toolkit, and leave it at that.  This avoids all of the
problems mentioned above, at the (rather slight) cost of having to admit that
XML may not actually be the solution to all the world's problems.
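
Here's a minimal Python sketch of that division of labour.  The SecuredBlob
element name and the HMAC-based toolkit_sign()/toolkit_verify() pair are
hypothetical stand-ins for a <PGP> or <SMIME> element carrying a blob produced
by a real toolkit.  The point is only that the XML layer treats the secured
content as opaque text, so re-parsing and pretty-printing the surrounding
document can't break it:

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

KEY = b"demo-key"  # hypothetical; a real deployment would call a PGP or S/MIME toolkit

def toolkit_sign(data: bytes) -> bytes:
    # Stand-in for the security toolkit: one opaque blob, tag followed by data.
    return hmac.new(KEY, data, hashlib.sha256).digest() + data

def toolkit_verify(blob: bytes):
    # Returns the original bytes if the tag checks out, otherwise None.
    tag, data = blob[:32], blob[32:]
    expected = hmac.new(KEY, data, hashlib.sha256).digest()
    return data if hmac.compare_digest(tag, expected) else None

def wrap(parent, data: bytes) -> None:
    # The XML side only ever sees base64 text; it never needs to understand it.
    blob = toolkit_sign(data)
    ET.SubElement(parent, "SecuredBlob").text = base64.b64encode(blob).decode("ascii")

def unwrap(elem):
    return toolkit_verify(base64.b64decode(elem.text.strip()))

payload = b'<Order><Item>pencils</Item><Price>1.50</Price></Order>'
envelope = ET.Element("Envelope")
wrap(envelope, payload)

# Let the XML layer do whatever it likes to the outer document: re-parse,
# pretty-print (ET.indent needs Python 3.9+) and re-serialise.
outer = ET.fromstring(ET.tostring(envelope, encoding="unicode"))
ET.indent(outer)
outer = ET.fromstring(ET.tostring(outer, encoding="unicode"))

print(unwrap(outer.find("SecuredBlob")) == payload)  # True
```

The security code never imports an XML library and the XML code never touches
a key, so either side can be swapped out independently.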

This was exactly the approach taken by the Jabber folks in the Jabber security
mechanisms "End-to-End Signing and Object Encryption for the Extensible
Messaging and Presence Protocol (XMPP)", RFC 3923.  This mechanism relies on
MIME body-parts to handle the signing and encryption, which has the advantage
that it's easily implementable using existing off-the-shelf software, and
works with anything that talks MIME.

The OASIS folks have come to a similar conclusion, using a basic "sign-the-
blob" approach in their SAML signing.  The reference for this is still subject
to change, but a Google search for the title "SAMLv2.0 HTTP POST 'SimpleSign'
Binding" should find the current version of the document.

A slightly different approach has been proposed by Johannes Ernst in his
XML-RSig design, where RSig stands for Really Simple Signatures (a bit of a
tautology really, since *anything* is simpler than XML-DSig).  You can read
about it at
http://netmesh.info/jernst/Technical/really-simple-xml-signatures.html.  In
XML-RSig you pick a node to sign, take everything from the first character of
the start tag to the last character of the end tag, and sign this as a blob
using your favourite technique (OpenPGP, S/MIME, whatever).  Finally, you
insert a new node <rsig:signature> as a child of the node whose signature it
is.  The same thing applies for encryption.  Simple, easy to implement, and
exactly what XML-DSig should have been in the first place.
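
The steps described above are simple enough to sketch in a few dozen lines of
Python.  This is a rough reading of the description, not Johannes Ernst's code:
the id attribute used to pick the node, the namespace URI urn:example:rsig, the
HMAC standing in for OpenPGP or S/MIME, and the verifier's rule of simply
cutting the inserted child back out before checking are all assumptions made
for illustration.  The sketch also assumes the signed node has an explicit end
tag and contains no other rsig:signature elements.  Note that there's no
canonicalisation step anywhere, just byte offsets:

```python
import base64
import hashlib
import hmac
import xml.parsers.expat

KEY = b"demo-key"  # hypothetical; stands in for OpenPGP, S/MIME, or similar
SIG_OPEN = b'<rsig:signature xmlns:rsig="urn:example:rsig">'  # made-up namespace URI
SIG_CLOSE = b"</rsig:signature>"

def element_span(doc: bytes, match) -> tuple:
    """Byte offsets of the first element for which match(name, attrs) is true,
    from the '<' of its start tag to just past the '>' of its end tag.
    Assumes the element is written with an explicit end tag."""
    parser = xml.parsers.expat.ParserCreate()
    state = {"depth": 0, "target": None}
    span = {}

    def on_start(name, attrs):
        state["depth"] += 1
        if state["target"] is None and match(name, attrs):
            state["target"] = state["depth"]
            span["start"] = parser.CurrentByteIndex   # position of the '<'

    def on_end(name):
        if state["depth"] == state["target"] and "end" not in span:
            span["end"] = doc.index(b">", parser.CurrentByteIndex) + 1
        state["depth"] -= 1

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(doc, True)
    return span["start"], span["end"]

def rsig_sign(doc: bytes, node_id: str) -> bytes:
    start, end = element_span(doc, lambda name, attrs: attrs.get("id") == node_id)
    tag = hmac.new(KEY, doc[start:end], hashlib.sha256).digest()  # the raw text, as is
    close = doc.rindex(b"</", start, end)       # the node's own end tag
    return doc[:close] + SIG_OPEN + base64.b64encode(tag) + SIG_CLOSE + doc[close:]

def rsig_verify(doc: bytes, node_id: str) -> bool:
    start, end = element_span(doc, lambda name, attrs: attrs.get("id") == node_id)
    node = doc[start:end]
    s_start, s_end = element_span(node, lambda name, attrs: name == "rsig:signature")
    tag = base64.b64decode(node[s_start + len(SIG_OPEN):s_end - len(SIG_CLOSE)])
    original = node[:s_start] + node[s_end:]    # take the inserted child back out
    expected = hmac.new(KEY, original, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)
```

Changing anything inside the signed node, even whitespace, breaks the
signature, while the rest of the document is free to change around it.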

Other comments on this issue
----------------------------

James Clark has some interesting thoughts on the same thing from an XML-
centric point of view (rather than the security-centric one presented here)
at http://blog.jclark.com/2007/10/bytes-not-infosets.html.  The W3C is also
aware of some of these issues and is working to address them, although the
approach seems to be to apply a series of patches rather than a
reconsideration of the overall approach:
http://www.w3.org/2007/xmlsec/ws/report.

Brad Hill from iSEC Partners has done a lot of work in the area of XML (in-)
security and found all sorts of problems.  You can get slides and associated
writeups for some of his talks at
http://www.isecpartners.com/files/iSEC_HILL_AttackingXMLSecurity_bh07.pdf,
http://www.isecpartners.com/files/XMLDSIG_Command_Injection.pdf, and
http://www.isecpartners.com/files/iSEC_HILL_AttackingXMLSecurity_Handout.pdf.

A footnote from a non-XML security toolkit author
-------------------------------------------------

I'm the author of a security toolkit (cryptlib, mentioned above) that
implements pretty much every Internet security protocol there is except XML
security.  I've tried to support XML security, I really have, but after
repeated attempts to figure out how to do this I just can't do it without
incorporating a complete XML processing interface into cryptlib.  I can do PGP
standalone, I can do S/MIME standalone, I can do SSH standalone, I can do
SSL/TLS standalone, I can do <insert long further list of protocols>
standalone, but there's simply no way to support XML security in a general-
purpose toolkit.  Even if there were, as a security person I don't know whether
I could ship a toolkit that would allow developers to shoot themselves in the
foot a dozen different ways while thinking that they're securing their data.

(If anyone has further comments or other war stories that I can add here
 (there were some that I couldn't add because they would have identified the
 original source), please get in touch).