REST and Data Encoding

1 Why Does REST Lack a Data Encoding?
2 Web Services Encoding Chaos
3 And the Winner Is!
3.1 Domain Specific XML
3.2 Pre-packaged Data Encodings
3.3 Alternatives to XML
4 The Best Choice For You

People often tell me that REST would be much more popular if only it had a standard data encoding. They mean that it should have a standard way to represent simple types like integers, floating point numbers and strings, or complex types like structures, arrays and graphs.

REST is an architectural style, not a standard or protocol. So it could not be updated to insist on an encoding any more than "object oriented programming" could be updated to insist on a particular syntax. I don't have the authority to define standards for REST any more than Kent Beck can define standards for "object oriented programming". I'm just the messenger and the message is just a way of building applications.

1 Why Does REST Lack a Data Encoding?

The core instantiation of REST in the world of written standards is HTTP. But it should already be obvious to you that it would be both impossible and harmful for HTTP to have a predefined object encoding. In 1990, HTTP was only used to carry plain text files and HTML files. By 2000 it had burgeoned to carrying formats that had not even been invented in 1990: RSS, PNG, X3D, etc. Today it carries formats that were not even invented only a few years ago. I don't know when you are reading this but I am confident that statement is true nevertheless.

How did this come about? Roy Fielding has some slides on the design of protocols and the Web Architecture. There is some amazing stuff in those slides but the part that is relevant to us is:


Orthogonal protocols deserve orthogonal specifications: If one protocol uses another as data, it must not restrict the content of that data other than as defined by that protocol, including future compatible revisions of that protocol. A specification that defines two orthogonal protocols (including data formats) must be split into two specifications, since otherwise the independent evolution of those protocols will be hindered. The result is simpler protocols; able to evolve independently over time; enables system to continue operation through gradual and fragmented change.

Before XML-RPC and SOAP became popular, it was common to define XML encodings with the idea being that you would just pass them around over native HTTP. Consider LDO and WDDX. But the idea of binding the protocol to the encoding is very seductive because it gives you one clear marketing message. You don't say: "I implement RSS over HTTP". You say: "I implement XML-RPC" and that implies everything from method calling syntax to marshalling. Nobody cares if this limits the longevity of the protocol because people are trying to get their jobs done today, not (in general) building systems that will last for years.

Every protocol chooses its target audience and makes its choices. The XML-RPC and SOAP choices may have been appropriate for those protocols. But REST and HTTP were designed to maximize the extensibility, longevity, scalability and growth of the Web. If some pundit had forced a markup encoding on HTTP in 1990 then we would not even be able to talk about using it for Web Services-like stuff in the twenty-first century.

It is easy to see this playing out in real life. The REST-based HTTP protocol is more than a decade old and continues to grow in scope of applicability. Meanwhile, SOAP is just a few years old and has already had to be paired with a variety of mutually incompatible packaging schemes to work around the fact that it is only tuned for XML payloads. Others have abandoned XML-RPC for some projects for the same reason (which is not to say that there are no circumstances where XML-RPC has all you need). And XML-RPC has no way to directly deliver arbitrary XML data, so even fans of XML-RPC use HTTP to deliver stuff like RSS and RDF. Those protocols are optimized for features other than extensibility, flexibility and longevity. Primarily, they choose simplicity.

The Web is very analogous to a file system. This is especially true if you use WebDAV. What file system have you ever heard of that has a "standard" file format (beyond text/plain)? REST is a mechanism for manipulating data. There is no "right" object encoding for data in general.

2 Web Services Encoding Chaos

It might seem that we could narrow our scope and ask about a good encoding for Web Services-like things. Sure, the human-facing Web will use a variety of random multimedia stuff, but the machine-to-machine Web will be more disciplined. After all, if there is no standard, there will be chaos.

I still don't think it would be wise or possible to canonize an encoding. If I had said that the SOAP Encoding was the One True Way in 1999, wouldn't I have looked foolish when REST Web Services became extremely popular for shipping RSS and WSDL in 2002? And perhaps it will be Notation-3 or YAML in 2004. And probably something I've never heard of by 2006.

But more important: the right encoding for REST is whatever the best encoding is for the kind of data you need to work with. In 2002 that probably means RSS for web change notifications (but perhaps ICE). SVG for vector graphics (but don't forget VML). Java class files for mobile code (but don't forget JavaScript and E). PNG for bitmap graphics (but we still need GIF and JPEG). RDF for semantically defined information (but RDF has a variety of syntaxes). WSDL for web service definitions (but what about DAML-S?). It is a complicated world out there. Canonizing encodings would imply that I understood your domain well enough to know the right data model and data representation for you.

3 And the Winner Is!

Okay, you might argue, but the human-facing Web at least has HTML which is a "default data encoding." But HTML is only a default for one kind of data: hyperlinked textual documents. The Web delivers much more: graphics, sound, movies, channels etc. Nevertheless, if I had to nominate an "HTML for Web Services" I would be pretty safe nominating XML for most tasks for roughly the next ten years.

The problem is: XML is so generic! Choosing XML gets you only a little bit of the way down the path towards figuring out what actually goes on the wire. This leaves you with two major categories of choices.

3.1 Domain Specific XML

The first approach to XML is to start from scratch and define a vocabulary specific to your problem domain. This is frightening at first but more and more people are learning how to do it.

Some will complain that domain-specific XML vocabularies are a pain to parse and manipulate. Perhaps. Perhaps the pain comes from XML's very generality and to a certain extent from its human-readability. In that case, any solution would tend to reduce these benefits. On the other hand, perhaps XML's tree model is just gratuitously inconvenient for simple processing.

If it is, that problem should and must be solved separately from any kind of standardization around REST or HTTP. It would be totally illogical if it were really easy to work with structured data streaming in across the network and very difficult to work with data on the file system because somehow we had fixed XML's (perceived) problems in a protocol rather than at the source.

Perhaps generic XML is only a pain to work with in your environment because of the APIs you are using. Some of the newer data binding APIs make working with highly structured, schema-defined XML pretty seamless and invisible. There are probably extensibility issues in working with these APIs and there is the issue of defining schemas to be considered. And Perl programmers have good reports about XML::Simple. There may also be issues here, however. I won't go into the strengths and weaknesses of these in detail. In my opinion, this is still an area for research.
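As a rough illustration of the domain-specific approach, here is a minimal sketch in Python that parses a small vocabulary and binds it to plain objects by hand. The `<order>` vocabulary, the `Order` and `Item` classes and the `parse_order` function are all hypothetical, invented for this example; a real vocabulary would be worked out with the people you exchange data with, and a real data binding API would generate this mapping from a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical vocabulary, for illustration only; not a published standard.
ORDER_XML = """\
<order id="A-1001">
  <customer>Jane Doe</customer>
  <item sku="W-7" quantity="2">Widget</item>
  <item sku="G-3" quantity="1">Gadget</item>
</order>
"""


@dataclass
class Item:
    sku: str
    quantity: int
    name: str


@dataclass
class Order:
    order_id: str
    customer: str
    items: list


def parse_order(text: str) -> Order:
    """Bind the hypothetical <order> vocabulary to plain Python objects."""
    root = ET.fromstring(text)
    items = [
        Item(sku=el.get("sku"), quantity=int(el.get("quantity")), name=el.text)
        for el in root.findall("item")
    ]
    return Order(
        order_id=root.get("id"),
        customer=root.findtext("customer"),
        items=items,
    )


order = parse_order(ORDER_XML)
total_quantity = sum(item.quantity for item in order.items)
print(order.customer, total_quantity)
```

Nothing in this sketch depends on how the document arrived. The same function works on a file read from disk or on the body of an HTTP response, which is the point of fixing the convenience problem at the source rather than in a protocol.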

3.2 Pre-packaged Data Encodings

Another choice for handling XML: use a pre-packaged "data encoding." These have the advantage of having some kind of well-defined mapping into a programmatic API that is more convenient than a DOM.

You could take a step back in time and look at some encodings that were not defined as part of protocols: WDDX, LDO, Apple's PLISTs etc.
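To give a feel for what a pre-packaged encoding buys you, here is a minimal sketch using the `plistlib` module from the Python standard library, which reads and writes Apple's property list format. The `record` dictionary is made-up sample data. The encoding maps directly onto native dictionaries, lists, strings, numbers and booleans, so no tree walking is needed.

```python
import plistlib

# Made-up sample data, for illustration only.
record = {
    "title": "REST and Data Encoding",
    "sections": 4,
    "draft": False,
    "tags": ["rest", "xml"],
}

# dumps() produces the XML property list format by default (as bytes).
encoded = plistlib.dumps(record)

# loads() detects the format and rebuilds native Python objects.
decoded = plistlib.loads(encoded)

print(encoded.decode("utf-8"))
print(decoded == record)
```

The resulting bytes are an ordinary XML document that could be stored in a file or carried as the body of an HTTP message; the encoding itself says nothing about transport.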

You could use one of those somewhat odd XML vocabularies that are not quite a hard-coded list of element types nor are they as generic as free-form XML. That means that they try to strike a balance between flexibility/extensibility and ease of processing in a non-tree model. Typically the underlying model is a graph and they just establish conventions on the representation of graphs as XML. The two main ones are the "SOAP encoding" and RDF. The former is simpler, the latter is more powerful. Neither feels like "normal" XML and so they feel kind of syntactically clumsy but if you can get past that you'll probably be okay. But make sure that your data consumers can also get past it.
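The general idea of "conventions for representing graphs as XML" can be sketched in a few lines of Python. The convention below (`<node id="...">` elements containing `<link ref="...">` elements) is a hypothetical one made up for this example. It is neither the SOAP encoding nor RDF/XML, but both rely on a similar trick: give nodes identifiers and refer to them by identifier instead of nesting them.

```python
import xml.etree.ElementTree as ET

# A small directed graph with a shared node ("c") and a cycle (a -> c -> a).
# Keys are node names; values are the names each node points to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}


def graph_to_xml(g: dict) -> str:
    """Serialize using a hypothetical id/ref convention (illustration only)."""
    root = ET.Element("graph")
    for name, targets in g.items():
        node = ET.SubElement(root, "node", id=name)
        for target in targets:
            # Point at the target by identifier instead of nesting it,
            # so shared nodes and cycles fit into a tree-shaped document.
            ET.SubElement(node, "link", ref=target)
    return ET.tostring(root, encoding="unicode")


def xml_to_graph(text: str) -> dict:
    """Rebuild the adjacency mapping from the id/ref document."""
    root = ET.fromstring(text)
    return {
        node.get("id"): [link.get("ref") for link in node.findall("link")]
        for node in root.findall("node")
    }


xml_text = graph_to_xml(graph)
round_tripped = xml_to_graph(xml_text)
print(xml_text)
print(round_tripped == graph)
```

The document is valid XML, but a generic XML tool sees only a flat list of nodes. The graph exists only for consumers that know the convention, which is why such vocabularies do not feel like "normal" XML.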

RDF has the advantage that it was never defined as part of a protocol, so you won't have to fight your language's API, which you might for SOAP encoding (depends on the language). Once again we come back to Fielding's principle of separating orthogonal specifications. To see how simple an RDF API can be, read about Jena's. SOAP Encoding has the advantage that you can tell your boss you're using SOAP. Plus there are some people who fear and loathe RDF.

You could also decide to extract the XML-RPC encoding out of the XML-RPC protocol. You will likely have the same API issues as with the SOAP encoding. And you may have to fight the fact that it is not extensible or Unicode compatible. Plus it doesn't support the encoding of graphs. But if none of those are a problem for you, then use it. It's simple.
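In some languages the extraction is already possible with the standard tools. As a minimal sketch, Python's `xmlrpc.client` module exposes its marshalling functions `dumps` and `loads` separately from any transport, so the encoding can be used on its own. The `payload` value is made-up sample data, and the behavior shown for cyclic data is that of Python's marshaller specifically.

```python
import xmlrpc.client

# Made-up sample data, for illustration only.
payload = {"name": "widget", "price": 9.5, "sizes": [1, 2, 3]}

# dumps() takes a tuple of parameters. With methodresponse=True it wraps a
# single value in a <methodResponse> envelope; no network call is involved.
xml_text = xmlrpc.client.dumps((payload,), methodresponse=True)

# loads() returns (params, method_name); method_name is None for a response.
params, method_name = xmlrpc.client.loads(xml_text)
decoded = params[0]

# A list that contains itself is the simplest cyclic graph.
cyclic = []
cyclic.append(cyclic)
try:
    xmlrpc.client.dumps((cyclic,), methodresponse=True)
    cycle_encoded = True
except TypeError:
    # Python's marshaller refuses recursive sequences and dictionaries.
    cycle_encoded = False

print(decoded == payload, method_name, cycle_encoded)
```

The resulting `xml_text` is just a string of XML that could be sent with a plain HTTP PUT or POST, or written to a file. The failure on the cyclic list illustrates the limitation with graphs mentioned above.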

I can't predict which of these techniques will be winners or losers in the long term and I certainly won't have much success if I try to anoint one or the other. Embrace the chaos. It is the engine of progress. This is the chaos that gave rise to the Web.

3.3 Alternatives to XML

You could also abandon XML and use something like YAML or s-expressions or n-triples or MIME-RPC. The downside here is you lose compatibility with XML parsers, schema languages, XSLT etc. But if you want all of the benefits of XML, you will probably have to accept all of the costs too. There is no free lunch.

I agree that between the XML and non-XML options there are too many choices overall. This will be worked out over time, probably in the XML or semantic web worlds. For now, a generic XML encoding is probably the most straightforward choice.

4 The Best Choice For You

Whichever of these techniques you choose, you have a choice about whether to embed the format/vocabulary/encoding in the definition of your REST-based app or separate it out as its own "thing" and whether to define it unilaterally or with others in your industry or community. I would strongly suggest that you separate it out and that you work with others to define a standard if that is at all possible.

REST is about standardization. A very important kind of standardization is vertical, around a specific problem domain. You can build upon the horizontal technologies I describe above, but in the end, if you want interoperability, you have to sit down with other people who have an incentive to share information with you and work out a vertical vocabulary, whether that is an XML Schema, RDF Schema, a YAML Schema, BNF or prose document describing some syntax. It doesn't matter (for the sake of this discussion) what you use as long as it is a documented standard.

Developing and documenting these standards is expensive, difficult and annoyingly touchy-feely. It is also inevitable and necessary. Blogging could not be as interoperably syndicated as it is if everyone agreed to use XML-RPC but nobody invented RSS. It would not be possible to exchange calendar information if everyone agreed to use XML-RPC or SOAP but nobody invented iCal. Other important examples include XHTML, FOAF, WSDL, SVG and DAML etc. etc. etc.

I think you should think carefully before trying to do an end-run around standards by merely publishing a convenient "API" through an RPC mechanism. In the blog world, there are a variety of incompatible APIs such that working with them all is difficult. The best interoperability occurred when people sat down and defined an XML vocabulary: RSS. Even with the drama around it, RSS is a much more powerful force for interoperability than the sum of the APIs to the services.

Once you've done this standardization, REST and HTTP will not get in your way because of the principle of separation of concerns. In fact, it will become somewhat of a no-brainer to use REST. GET an RSS to learn about what is happening on another site. POST an RSS to tell another site what is happening on yours. GET a vCard to learn about someone else's contact information. POST a vCard to inform someone else about yours. GET a WSDL to learn about the SOAP or HTTP interface to a web service. POST a WSDL to inform a registry about your service's web service interface. A new killer app is born for REST every time someone sits down to learn about XML (or RDF or YAML or S-Expressions or ...) and invent a ground-breaking new vocabulary.
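The GET and POST pattern above can be sketched with Python's standard `urllib.request` module. The URLs and the helper names `get_request` and `post_request` are hypothetical, and the media type strings are the ones conventionally used for RSS and vCard; the requests are only constructed here, not sent. What the sketch tries to show is that the HTTP side is identical for every format: only the media type label and the body change.

```python
from urllib.request import Request

# Hypothetical URLs, for illustration only; nothing is sent over the network.
FEED_URL = "http://example.org/feed.rss"
CARDS_URL = "http://example.org/contacts"


def get_request(url: str, media_type: str) -> Request:
    """Build a GET that asks for a representation in the given format."""
    return Request(url, headers={"Accept": media_type}, method="GET")


def post_request(url: str, media_type: str, body: bytes) -> Request:
    """Build a POST that submits a document in the given format."""
    return Request(
        url, data=body, headers={"Content-Type": media_type}, method="POST"
    )


vcard = b"BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Jane Doe\r\nEND:VCARD\r\n"

fetch_feed = get_request(FEED_URL, "application/rss+xml")
send_card = post_request(CARDS_URL, "text/vcard", vcard)

# Request stores header names in capitalized form, e.g. "Content-type".
print(fetch_feed.get_method(), fetch_feed.get_header("Accept"))
print(send_card.get_method(), send_card.get_header("Content-type"))
```

Passing either object to `urllib.request.urlopen` would perform the exchange. Supporting a new vocabulary later would mean a new media type string and a new body, with no change to the two helpers.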

This is why HTTP is the leading protocol for manipulating pre-existing formats like RSS and vCards and will continue to be for the foreseeable future. As XML, RDF, YAML and other formats evolve and grow, HTTP will always support them naturally and directly. A new killer app for REST is born every day.


HTML rendition created using stylesheets by Wendell Piez of Mulberry Technologies.
