Consider two strings. H is an HTTP URI:
http://www.prescod.net/foo
U is a UUID URI:
uuid:6A21402E-9641-11D8-A237-0003939B6294
Poor H is abused by the world. It commits the sin of being location-dependent, which supposedly means that it is not appropriate for peer-to-peer applications, offline applications and so on. We are also told that H will cease functioning in the face of a network outage. H, the story goes, is not as reliable or scalable as U.
Wrong. All wrong. H has been defended by luminaries like Tim Berners-Lee and Roy Fielding but its reputation remains sullied. Undaunted, I wish to add my voice to the defense. H is no more or less location-dependent than U. Perhaps by using some actual code I can prove that the distinctions exist only in people's minds and do not affect the code we write.
In any reasonable programming language, computers do not look inside of strings to extract semantics until they are told to. For instance, in Python I can create string objects for both H and U and the string objects will have exactly the same type and properties:
>>> h = "http://www.prescod.net/foo"
>>> u = "uuid:6A21402E-9641-11D8-A237-0003939B6294"
>>> type(h) == type(u) # same types
True
>>> dir(h) == dir(u) # same methods and attributes
True
The string does not know whether it is a name or an address. If there is a language for which this is not true, that language is terribly, horribly broken and I would appreciate it if you could bring it to my attention so I could denigrate it and denounce its heretical inventor.
Given a string, in Python or other languages, I can choose to treat it as either a name or an address. As a programmer, I can essentially assert that it is a name by restricting my usage of it to name-like things, or assert that it is an address by using it as one.
We tend to do exactly two things with names: compare them and look them up in information repositories (which can be thought of as an optimization for merely comparing to each name in the repository until you hit a match).
>>> def is_the_URI_I_am_looking_for(given_uri, target_uri):
...     return given_uri == target_uri
...
>>> def find_info_that_goes_with_uri(given_uri, info_repository):
...     return info_repository[given_uri]
...
Let's see if we can apply these two functions to H and U:
>>> is_the_URI_I_am_looking_for(h, "http://www.prescod.net/foo")
True
>>> is_the_URI_I_am_looking_for(u, "http://www.prescod.net/foo")
False
>>> is_the_URI_I_am_looking_for(h, "uuid:6A21402E-9641-11D8-A237-0003939B6294")
False
>>> is_the_URI_I_am_looking_for(u, "uuid:6A21402E-9641-11D8-A237-0003939B6294")
True
>>> info_repository = {"http://www.prescod.net/foo": "Must be H!",
... "uuid:6A21402E-9641-11D8-A237-0003939B6294": "Must be U!"}
>>> find_info_that_goes_with_uri(h, info_repository)
'Must be H!'
>>> find_info_that_goes_with_uri(u, info_repository)
'Must be U!'
H and U behave in exactly the same way under these two operations. So they are equally powerful as names. Now let's look at them as addresses. Addresses are dereferenced using networking protocols. Let's try Python's urllib:
>>> import urllib
>>> urllib.urlopen(h)
<addinfourl at 4312720 whose fp = <socket._fileobject object at 0x55490>>
That's Python's way of saying that it is ready to read from the object. Now let's try U:
>>> urllib.urlopen(u)
IOError: [Errno url error] unknown url type: 'uuid'
Hmmm. urllib doesn't like the "uuid" URI type. We could blame this on Python's urllib, but then we would have to tell Python's developers what we want them to do with UUID URIs. What protocol should they use to dereference them? There are protocols out there but they are deployed on a minority of all desktops. On the other 95% of the desktops, the UUID URI would fail to do anything whatsoever.
There are deep reasons that these protocols are so hard to deploy: the UUID namespace is totally flat, which makes it very hard to know whom to ask for information about a particular UUID. Frankly, I cannot think of a mechanism for reliably and efficiently dereferencing them that does not depend on One Big Centralized Server. This raises a bunch of questions: who maintains this server? Who pays for it? What are their motivations? Why would we trust them? Are they accountable to ICANN? If so, is that a good or bad thing?
Consider that even Google has never volunteered to keep up to date with the coming and going of Web resources on a minute-by-minute basis. If they had the capacity, it might be great if they could be informed of every new URI the instant it was minted. But I do not think that they do have that capacity, so they prefer to use a spider and have a repository that is often weeks out of date.
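To make the "who to ask" problem concrete, here is a minimal sketch in present-day Python 3, using only the standard library. The function name authority_to_ask is my own invention for illustration. It is not a resolver; it only shows that an HTTP URI carries the name of a host that could be asked about it, while the UUID URI carries no such hint.

```python
from urllib.parse import urlsplit


def authority_to_ask(uri):
    """Return the host named inside the URI, or None if it names no host.

    Hypothetical helper: it only inspects the string, it never touches
    the network.
    """
    return urlsplit(uri).hostname


h = "http://www.prescod.net/foo"
u = "uuid:6A21402E-9641-11D8-A237-0003939B6294"

print(authority_to_ask(h))  # www.prescod.net
print(authority_to_ask(u))  # None
```

With H, a client at least knows where to start. With U, every client needs some out-of-band agreement about where lookups go.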
So, in real-world practice, the HTTP URI has the capacity to be both a name and an address, whereas the UUID namespace is not properly constructed to be an address.
We've demonstrated that in code we can treat a string as either a name or an address, but some strings will fail to perform when we try to use them as addresses. But code does not run randomly. It is invoked based upon context. On the Web, we use markup context to decide which code to run. For instance, if we put a URI in an XML namespace declaration (an xmlns attribute), code should use it as a name.
<mydoc xmlns:h="http://www.prescod.net/foo"
xmlns:u="uuid:6A21402E-9641-11D8-A237-0003939B6294">
<h:bar>BAR!</h:bar>
<u:baz>BAZ!</u:baz>
</mydoc>
You see this kind of code in XML processing all of the time:
ns1 = "http://www.prescod.net/foo"
ns2 = "uuid:6A21402E-9641-11D8-A237-0003939B6294"

def startElementNS(self, name, qname, attributes):
    # name is a (namespace URI, local name) pair
    (namespace, localname) = name
    if (namespace == ns1 and localname == "bar"):
        do_something()
    elif (namespace == ns2 and localname == "baz"):
        do_something_else()
On the other hand, if an XML vocabulary has something like this:
<a href="http://www.prescod.net/foo">Blah</a>
...then it is typically going to be processed with code that looks like this:
data = urllib.urlopen(attributes["href"])
An obvious application is a spider or a browser.
Even in the case of a spider or a browser, where dereferencing is the whole point, you can benefit from using things labelled as addresses as names.
Consider the following code, which downloads each URI once and only once:
href = attributes["href"]
# the URI is used as a name (a dictionary key) and as an address (fetched)
if href not in info_repository:
    info_repository[href] = urllib.urlopen(href).read()
Or consider software that uses a local fallback when dereferencing fails:
href = attributes["href"]
try:
    data = urllib.urlopen(href).read()
    info_repository[href] = data
except IOError:
    data = info_repository.get(href) # look up from cache
if data:
    do_something_with(data)
else:
    print "We're really out of luck: no network and no cache!", href
Addresses can be viewed as names, just as particles of light can be viewed as waves. But as we saw before, "pure names" cannot be viewed as addresses without deploying a significant resolution infrastructure.
Even in a context where identifiers are primarily names, it can be extremely powerful to have the ability to occasionally use them as addresses.
Consider the case where a program stumbles upon an unknown HTTP URI representing some object type that it needs to deal with. Perhaps the program knows how to deal with files of type application/XAC associated with XML namespace attributes. It might deal with the situation like this:
uri, localname = name  # the namespace name is the URI
if info_repository.get(uri):
    do_something()
else:
    print "I do not know anything about the namespace", uri
    print "Would you like me to try and find a XAC file for it?"
    if raw_input("yes or no") == "yes":
        XAC = try_to_find_reference_to_XAC_in(
            urllib.urlopen(uri))
        # look for link tags with appropriate "rel" and
        # "type" attributes.
    else:
        print "I could not find a XAC file."
        XAC = None
    if not XAC:
        print "Would you like to browse the site yourself?"
        print "You can download a XAC file and install it in"
        print "the application directory."
        if raw_input("yes or no") == "yes":
            print "Starting your web browser!"
            webbrowser.open(uri)
As you can see, the URI could have served as information to help the application figure out how to deal with the vocabulary, or it could have served as documentation for a human reader. If you wish, you can think of HTTP URIs as merely UUID URIs that happen to point to their own documentation.
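The helper try_to_find_reference_to_XAC_in is left undefined above. Here is one way the "look for link tags" step might be sketched in present-day Python 3. Everything specific in it is an assumption of mine: the names XACLinkFinder and find_xac_links, the rel value "alternate", and the choice to work on an HTML string rather than on an open connection. Only the media type application/XAC comes from the scenario itself.

```python
from html.parser import HTMLParser


class XACLinkFinder(HTMLParser):
    """Collect href values of <link> tags that advertise a XAC file.

    Assumption: the documentation page marks the file with
    rel="alternate" and type="application/XAC".
    """

    def __init__(self, rel="alternate", mime_type="application/xac"):
        super().__init__()
        self.rel = rel
        self.mime_type = mime_type
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively
        rels = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if self.rel in rels and mime == self.mime_type and href:
            self.hrefs.append(href)


def find_xac_links(html_text):
    """Return every candidate XAC link found in the given HTML text."""
    finder = XACLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hrefs
```

A caller would read the page found at the namespace URI, pass its text to find_xac_links, and try each returned href in turn. An empty list means the application falls back to asking the human.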
Given that we all agree that documentation is an important part of any well-engineered system, the argument in favour of H should be clear. But the story would not be complete if I did not address some quasi-technical and sociological issues.
Yes, this is a legitimate benefit of UUIDs, but it is independent of UUID URIs. You can embed UUIDs in HTTP URIs just as easily:
http://www.prescod.net/uuid/6A21402E-9641-11D8-A237-0003939B6294
Once you've done this, the URI is dereferenceable just as any other URI is. For instance, you could ask the "repository" for information about the node that generated the URI. Even if you have no such repository in your system design today, you may want one in the future.
I suppose there may be some application where this is more important than the extra functionality you get from HTTP URIs, but I find it hard to imagine it.
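As a minimal sketch of that embedding (present-day Python 3, standard library only), the following wraps a UUID in an HTTP URI and recovers it again. The helper names http_uri_for and uuid_from_http_uri are hypothetical, and the base prefix is simply the one from the example URI above.

```python
import uuid
from urllib.parse import urlsplit

# Prefix taken from the example URI above; any HTTP prefix you control would do.
BASE = "http://www.prescod.net/uuid/"


def http_uri_for(identifier, base=BASE):
    """Embed a UUID in an HTTP URI (upper-case, to match the example)."""
    return base + str(identifier).upper()


def uuid_from_http_uri(uri):
    """Recover the UUID from the last path segment of such a URI."""
    last_segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    return uuid.UUID(last_segment)  # raises ValueError if it is not a UUID


original = uuid.UUID("6A21402E-9641-11D8-A237-0003939B6294")
uri = http_uri_for(original)
print(uri)
print(uuid_from_http_uri(uri) == original)  # True
```

Nothing about the UUID is lost in the round trip. Decentralized generation still works because http_uri_for accepts any freshly minted value, for example uuid.uuid4().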
It depends: if you were only using the URI as a name, and not an address, then the question is meaningless. It is like asking how to move a resource from one UUID to another.
On the other hand, if you were taking advantage of the fact that the URI, unlike the UUID, is both a name and an address, then you should use an HTTP redirect.
By the way, why do you want to move the resource? From a technical point of view, "prescod.net" and "prescod.com" are equivalent: just strings. But I acknowledge that there is a sociological reality so that's why HTTP has a redirect feature.
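For readers who have never set up a redirect, here is a rough sketch in present-day Python 3 using the standard library's http.server. The names MOVED, redirect_for, RedirectHandler and serve, the port number, and the choice of a permanent (301) redirect are all assumptions made for illustration. A real site would more likely configure this in its web server than write code for it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table of moved resources: old path -> new absolute URI.
MOVED = {"/foo": "http://www.prescod.com/foo"}


def redirect_for(path, moved=MOVED):
    """Return (status, headers) for a request path on the old host."""
    target = moved.get(path)
    if target is None:
        return 404, {}
    return 301, {"Location": target}  # 301 = Moved Permanently


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET with a redirect if the resource has moved."""

    def do_GET(self):
        status, headers = redirect_for(self.path)
        self.send_response(status)
        for key, value in headers.items():
            self.send_header(key, value)
        self.end_headers()


def serve(port=8000):
    """Run the redirecting server until interrupted (not called here)."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Clients that only compare the old URI as a string never notice any of this. Clients that dereference it are sent on to the new location.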
First, it won't hurt anybody using the identifier as just a name. They'll just keep doing their string comparisons and lookups happily. So once again this is not a weakness of H in comparison to U, but only compared to the platonic ideal of a combined name and address.
Second, within the domain name system there are a variety of ways for me to get a long-lived URI. One is to own the domain myself and just keep paying for it. If there is some reason that I think it might be embarrassing or otherwise improper for my resource to be identified with "prescod.net" in the future, then right off the bat I could register the 6A21402E domain and use that as the basis for my identifiers.
Another strategy is to generate a TinyURL ("http://tinyurl.com/ywz2g") or similar HTTP-based handle which does not embed organizational information. Proper URI dereferencing implementations will automatically dereference these to whatever they happen to be associated with at the time. Today, TinyURL does not seem to allow registered URIs to be updated but this would be a simple addition to their service.
Yes. Just as it is confusing to view light as both particles and waves. But it is powerful to do so. There is a tradeoff there. If the reader strongly values syntactic obviousness over power, and feels that educational articles like this one are inadequate then the reader may legitimately remain enamoured of UUID URIs. I feel otherwise.