Alternatives to QUERY

Many people (including me) have independently come up with the idea of an HTTP method which would be used for safe, idempotent operations that involved the input and output of large amounts of data. This has been called QUERY. Now Tim Berners-Lee has taken up the cause and would like to see the method standardized. I've since reconsidered my support for QUERY and I think that adding it to the Web would be a mistake.

Let's examine the case for QUERY: The argument is that caches, servers and other bits of software do not expect URIs to be arbitrarily long. This is a path to exposing buffer overflows and truncating user data accidentally. Also, the body of an HTTP request is usually handled in a streaming fashion whereas the URI is often buffered into a single string. These are in fact important issues and must be handled. POST is not considered an ideal method because POST results are not cached by default. The first part of this document will discuss the current partial solutions. Then I will describe why QUERY is not an acceptable full solution. Finally I will describe a proposal for a more architecturally sound equivalent to QUERY.

Current partial solutions

There are three partial solutions to these problems. The first is merely to use POST. A POST result may state that the entity is cacheable. So if caching is the main worry then the problem is already solved, although this is obviously somewhat of an abuse of POST because it violates the "only-GET axiom." Nevertheless, how is it better to invent a new method that always violates the axiom than to re-use an existing method that sometimes violates it? (Mark Nottingham disagrees with my interpretation of POST, so this solution may not be strictly available in HTTP/1.1. More research needed.)

A second partial solution is just to use GET and try your luck. There are no documented limits for URI lengths, though implementations are allowed to reject long ones. Maybe your URIs will never get big enough to break anything. There's an obvious reason that I said this is only a "partial" solution!
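To make the "try your luck" approach concrete, here is a small sketch of how a form value turns into a GET URI and how quickly that URI grows. The 2000-character threshold and the helper names are my own assumptions for illustration; no specification defines such a limit.

```python
from urllib.parse import urlencode

# Illustrative threshold only; real limits vary by implementation.
ASSUMED_LIMIT = 2000


def get_uri(base, params):
    """Build the URI a GET form submission would produce."""
    return f"{base}?{urlencode(params)}"


def fits(uri, limit=ASSUMED_LIMIT):
    """Guess whether a URI is short enough to survive intermediaries."""
    return len(uri) <= limit
```

A short phrase fits comfortably, but a few thousand characters of user input do not, and the service cannot know in advance which one it will receive.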

A third solution is described by Mark Baker:

I'm not a big fan of a new method for submitting complex queries, nor
even extending GET to support a body if that body is in any way used
to change the meaning of the URI(*).  A query really should be
identifiable by a URI.  IMHO, if urlencoded syntax is insufficiently
expressive for some particular query then you could do this;

POST /query HTTP/1.1
Host: example.org
Content-Type: application/x-complex-query

[query goes here]

which would return;

HTTP/1.1 201 Created
Location: http://example.org/queries/12371232

A GET on that URI would return the query results.

Of course this requires two messages and it generates stuff on the server that needs to be removed later (timeout or DELETE or whatever). On the other hand, it creates a short URI that can be used to refer to this query result for as long as it survives which can save the client from sending the large input information over and over again.
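The server side of this two-message pattern can be sketched with an in-memory model. This is only an illustration under my own assumptions: the class name QueryStore, the numbering of query URIs and the evaluate callback are hypothetical, and a real server would sit behind actual HTTP handlers.

```python
import itertools


class QueryStore:
    """In-memory stand-in for a server offering the POST-then-GET pattern."""

    def __init__(self, base="http://example.org/queries/"):
        self.base = base
        self._ids = itertools.count(1)
        self._queries = {}

    def post(self, query_body):
        """POST /query: store the query and answer 201 with a short URI."""
        uri = f"{self.base}{next(self._ids)}"
        self._queries[uri] = query_body
        return 201, {"Location": uri}

    def get(self, uri, evaluate):
        """GET on the short URI: evaluate the stored query, return the result."""
        if uri not in self._queries:
            return 404, None
        return 200, evaluate(self._queries[uri])

    def delete(self, uri):
        """DELETE (or a timeout) cleans up the stored query."""
        return 204 if self._queries.pop(uri, None) is not None else 404
```

The large input crosses the wire once, in post; every later get uses only the short URI, which can be bookmarked or linked to until the query is deleted.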

The point is that this problem is not urgent and we should not rush into a solution. The Web has survived so far without it and it is clear that HTTP was always supposed to support cacheable POSTs. Thanks to these partial solutions, there is no fire.

The only-GET Web Axiom

Tim Berners-Lee states "In HTTP, anything which does not have side-effects must use GET". "This means that for people implementing systems in which users request information and execute operations using forms, when the form simply requests information it must result in a GET operation. Indeed this is very much to be favored over a post operation because the result of a GET operation has a URI and may be linked to, for example, may be put into a bookmark. This violates the axiom of universality above."

The "axiom of universality" says: "Any resource of significance should be given a URI: This means that no information which has any significance and persistence should be made available in a way that one cannot refer to it with a URI."

The reasons for this are as rock-solid as they have ever been. The fundamental basis of the Web as a universal information resource is the idea that anything can be linked to anything. This requires "anything" to have a URI. The idea of QUERY flies in the face of the "only-GET" axiom.

Let's examine the case for QUERY architecturally (as opposed to pragmatically). The idea is that sometimes you have a "lot of data" and encoding it as a URI is inconvenient. As you can see, this is not really an architectural issue at all. It is just as valid to have a resource like: "http://www.somesite.com translated into French" (which would unquestionably use GET) and to have one like "<the complete works of William Shakespeare literally included> translated into French". And in fact there is no good way when you are setting up a service to know whether the end-user is going to enter a little bit of data, which should go into a GET or a lot which should go into a QUERY. So QUERY is not architecturally necessary nor useful.

QUERY is actually architecturally very dangerous. People already want to use it as a short-cut to avoid defining URI-friendly forms for their query languages. For instance, some people in the XForms and SOAP worlds are excited about the idea of a QUERY because it would allow them to get away without defining URI-friendly encodings for forms data and SOAP parameters respectively. This trend could drastically reduce the availability of addressable query information on the Web. The Web consists primarily of documents, links and URIs. When people choose to use other strategies for querying they are reducing the size, scope and usefulness of the Web.

QUERY has another problem: if the boundary between short URIs and "long" QUERY URIs is unclear then what about this same boundary for DELETE, POST and PUT? The fact is that it is poor architecture that HTTP has no generalized solution for really long URIs.

A Future Solution

There is a real problem with GET and URIs. Today's solutions are not entirely adequate. QUERY is not a good idea because it violates a fundamental Web architectural principle and makes some forms of information no longer addressable.

What we need is a method that addresses the same namespace as GET. In other words it must be semantically just like GET except that it supports really long URIs. You could use this new method with short URIs and have the Web treat it as if you had just used GET. But you could use it with long URIs and nothing would break. In fact, it would be an upgraded GET and the other one could just fade away over the next several centuries (or not, it doesn't matter).

This NEW-GET would have a header that would say: "the URI-line overflowed, get the rest of it from the body." Let's call this the URI-Overflow header. This same header makes perfect sense for DELETE. For PUT and POST we will have to think a little bit harder because those methods already have bodies. We may just have to have a two-part body where the first part is the rest of the URI.
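Here is a rough sketch of how a receiver might reassemble the URI. The wire format is purely my assumption for illustration: I let the URI-Overflow value be the number of leading body bytes that belong to the URI, which also covers the two-part body needed for PUT and POST. The function name effective_uri is hypothetical.

```python
def effective_uri(request_line_uri, headers, body):
    """Rebuild the full request URI when a URI-Overflow header is present.

    Assumes `headers` is a dict keyed by the exact name "URI-Overflow" and
    that its value counts the leading body bytes that continue the URI.
    Returns (full_uri, remaining_body).
    """
    overflow = headers.get("URI-Overflow")
    if overflow is None:
        # No overflow: the request line already holds the whole URI.
        return request_line_uri, body
    count = int(overflow)
    if count < 0 or count > len(body):
        raise ValueError("URI-Overflow does not match the body length")
    # First part of the body completes the URI; the rest is the real entity.
    return request_line_uri + body[:count].decode("ascii"), body[count:]
```

For NEW-GET and DELETE the remaining body would be empty; for PUT and POST it is the entity that the method already carried.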

Still, NEW-GET is first and most important. Remember that the defining characteristics of NEW-GET are that it has exactly the same semantics as GET but it can syntactically handle really long URIs. The former characteristic is what distinguishes it from QUERY.

The wonderful thing about NEW-GET is that it already exists. It is called M-GET and it is defined as part of the HTTP Extension Framework (RFC 2774). This framework allows new headers to be added to HTTP methods in such a way that the receiver knows it must understand the new header rather than silently ignore it. All that remains to be done is the specification of an appropriate header.
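As a sketch of what a sender might emit, the function below builds an M-GET message using the Man header and numeric header prefix from the HTTP Extension Framework as I understand it. The extension identifier, the prefix 16, the URI-Overflow header name, the byte-count value and the 2000-character request-line limit are all hypothetical choices of mine, since no such header has been specified.

```python
# Hypothetical extension identifier; nothing is registered under this URI.
EXTENSION_URI = "http://example.org/uri-overflow"
# Assumed safe length for the request-line URI, not a standard value.
MAX_REQUEST_LINE_URI = 2000


def build_m_get(host, uri, limit=MAX_REQUEST_LINE_URI):
    """Serialize an M-GET whose URI spills from the request line into the body."""
    head, tail = uri[:limit], uri[limit:].encode("ascii")
    lines = [
        f"M-GET {head} HTTP/1.1",
        f"Host: {host}",
        # Mandatory extension declaration; ns=16 prefixes its headers.
        f'Man: "{EXTENSION_URI}"; ns=16',
        # Assumed meaning: number of body bytes that continue the URI.
        f"16-URI-Overflow: {len(tail)}",
        f"Content-Length: {len(tail)}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii") + tail
```

A URI shorter than the limit simply produces an overflow of zero and an empty body, which matches the idea that the new method behaves like GET for short URIs.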

Alternatively, the header could be added to standard GET in a backwards-incompatible future version of HTTP.