This document is a PRIVATE DRAFT, not for external publication.
Web sites such as blogs have an easy way to publish both data and metadata (often RSS or some other RDF variant). They do not have an easy way to inform each other that relevant information has been published. In other words, the standard mechanisms are all "pull-based" and have no notion of "pushing" information. This means that web sites can only build links to other web pages that are "found" (e.g. by spiders or human readers), so some web pages relevant to a topic may never be found without human intervention. This specification describes an automated way of pushing notifications of information availability.
This specification is written according to the terminology used in the HTTP, URI, RDF and other web specifications. In particular, rather than speaking about "web sites" or "web logs" it speaks about resources. A resource is any URI-addressable entity (which includes web sites, web logs, web log categories and individual permalinks on web logs). Although the specification is intended to be general, the first and most important requirement is that it be appropriate for web logs. A secondary goal is that it be appropriate for semantic web technologies and web services based on Web architectural principles ("REST web services").
The quick summary of TrackBack for Web log implementors follows:
Your web log software embeds some simple markup in a Web log post announcing that TrackBack notifications for the post can be posted to CGI or Servlet-based URI B (the notification resource).
The end-user creates some resource (usually another Web log post) and POSTs to the notification resource at URI B.
Your web log software adds a link to the other web log into the HTML representation of the post.
A resource that accepts TrackBack notifications is known simply as a "Notification Resource." This will typically be implemented as some kind of CGI, Servlet or other Web server extension. The role of TrackBack is to let the Notification Resource learn about other resources that are related. In the Web Log case, these Related Resources are described in RSS.
Each Notification Resource is interested in accepting notifications about some particular topic like "the Weather in Vancouver" or "responses to my weblog entry on such and such a date about such and such a topic". This topic SHOULD also have a URI (a "permalink" in Web log terminology) and is known as the Topic Resource. Examples of Topic Resources include weblog entries, weblog categories or open directory project categories.
Finally, there may be one or more Description Resources which make the link between the Notification and Topic resources so that the relationship can be discovered by humans or software programs. It is possible to combine any two or even all three resources into a single resource. For example, a CGI-based resource with a particular URI could have some markup (e.g. HTML) discussing a topic (e.g. the weather), some other markup (e.g. RDF) declaring that it supports TrackBack notifications, and could accept POSTs of RSS data on the same URI.
It is possible to have multiple Notification Resources for a single Topic Resource but then the list of notifications received by each Notification resource is likely to be incomplete. This may be acceptable in some cases.
There may be as many Description Resources as is convenient. After all, they just assert the same relationship over and over again.
It is not possible to have multiple Topic Resources for a single Notification Resource because this would complicate client implementations by requiring clients to declare which Topic they are notifying about.
A Notification Resource may be notified of a Related resource merely by using an HTTP POST. The syntax of the POST is not constrained but it is strongly suggested that the body be RSS or another RDF variant to encourage interoperability. Another highly interoperable encoding is "application/x-www-form-urlencoded".
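As a minimal sketch of such a notification, assuming the form-encoded variant, a user agent could build the request as shown here. The function name `build_notification` and the example.org URI are hypothetical, and the field names `title` and `url` are borrowed from the HTML form example later in this document rather than mandated by it. Nothing is sent until the request is handed to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_notification(notification_uri, title, url):
    """Build (but do not send) a form-encoded TrackBack notification."""
    # Field names are an assumption taken from the sample HTML form.
    body = urlencode({"title": title, "url": url}).encode("ascii")
    return Request(
        notification_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_notification(
    "http://www.foo.com/tb.cgi/5",
    "Why I agree about Foo",
    "http://example.org/blog/42",
)
```

An RSS body would be sent the same way, with a different payload and Content-Type.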
User agents SHOULD include the Content-Type and other recommended HTTP headers. In general, all features of HTTP may be used. For example, authentication can be used to prevent "spam" notifications (see Section 5: Preventing Spam for more information).
The result of the notification is currently unconstrained. The HTTP response code should be appropriate. The most common ones are:
200 OK : The request sent by the client was successful.
201 Created : The request was successful and a new resource was created that represents the TrackBack. Use the "Location" header to return the URI of the new resource. A notifying Weblog might want to link forward to the TrackBack URI.
3XX: Appropriate redirects when a resource has moved.
4XX: Various client errors, especially 401 for Authorization failures and 415 for Unsupported Media Type.
5XX: Various server errors, especially 500 for a broken implementation.
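The response codes listed above could be handled by a notifying agent roughly as follows. This is only an illustrative sketch: the function name and the outcome labels are hypothetical, and the headers are assumed to arrive as a plain dictionary with a `Location` key spelled exactly that way.

```python
def classify_response(status, headers=None):
    """Map an HTTP status code to a coarse outcome for the notifying agent."""
    headers = headers or {}
    if status == 201:
        # Location names the new resource that represents the TrackBack.
        return ("created", headers.get("Location"))
    if 200 <= status < 300:
        return ("ok", None)
    if 300 <= status < 400:
        # The resource has moved; Location says where to retry.
        return ("redirect", headers.get("Location"))
    if status == 401:
        return ("unauthorized", None)
    if status == 415:
        return ("unsupported-media-type", None)
    if 400 <= status < 500:
        return ("client-error", None)
    if 500 <= status < 600:
        return ("server-error", None)
    return ("unknown", None)
```

A notifying Weblog could use the URI returned with "created" to link forward to the TrackBack.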
Notification resources SHOULD respond to an HTTP GET request with markup that asserts that the URI is a Notification resource. See Section 3: Notification Resource Discovery for declaration syntax and Section 4: Cross-site Notifications for security-related motivation for this practice. In addition, the HTTP GET response could include information about what notifications have already been delivered. Finally, the GET response could also describe relevance: for example it could be the Weblog post itself.
A resource can be asserted to be a TrackBack notification resource in any convenient and interoperable way. The simplest way would be to use prose text: "POST to http://... if you have a Web site about Brad Pitt." But this is not machine readable. One step closer to machine readability would be an HTML form with a "POST" method and notification resource as "Action". Web logs are encouraged to provide these forms. But this syntax is not machine discoverable because it looks like any other HTML.
Implementors SHOULD use a discoverable syntax that can be unambiguously recognized as an assertion of the Notification/Topic relationship. The discoverable syntax MAY be associated with other syntaxes. Here is an example of a machine-discoverable syntax embedded in HTML:
<a name="foo"></a><h1>About Foo</h1>
<p>I think Foo rocks. If you agree, please trackback.</p>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
rdf:about="#foo"
dc:title="I love FOO!"
trackback:ping="http://www.foo.com/tb.cgi/5" />
</rdf:RDF>
<form method="POST" action="http://www.foo.com/tb.cgi/5">
Title of your comment: <input name="title" type="text"/>
URI of your comment: <input name="url" type="text"/>
</form>
The trackback:ping specification is defined by TrackBack Module for RSS 1.0/2.0. The rest is standard HTML or RDF.
Note that the Description resource (the RDF) is embedded in the Topic resource (HTML). As described above, the Topic ("rdf:about") URI and Notification URI ("trackback:ping") could be the same because the data (e.g. Weblog post) is delivered through HTTP's GET method but notifications are delivered by POST.
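To illustrate how a software agent might recognize this embedded syntax, here is a rough sketch that pulls `rdf:RDF` blocks out of a page and reads the `rdf:about` and `trackback:ping` attributes. The function name is hypothetical, and extracting the blocks with a regular expression is a shortcut that assumes each block is well-formed XML carrying its own namespace declarations. Relative `rdf:about` values such as "#foo" are returned unresolved.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TB_NS = "http://madskills.com/public/xml/rss/module/trackback/"


def discover_trackbacks(html):
    """Return (topic, ping_uri) pairs from rdf:RDF blocks embedded in HTML."""
    pairs = []
    for block in re.findall(r"<rdf:RDF\b.*?</rdf:RDF>", html, flags=re.DOTALL):
        try:
            root = ET.fromstring(block)
        except ET.ParseError:
            continue  # skip blocks that are not well-formed XML
        for desc in root.iter("{%s}Description" % RDF_NS):
            ping = desc.get("{%s}ping" % TB_NS)
            if ping:
                pairs.append((desc.get("{%s}about" % RDF_NS), ping))
    return pairs


SAMPLE = """<h1>About Foo</h1>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
    rdf:about="#foo"
    dc:title="I love FOO!"
    trackback:ping="http://www.foo.com/tb.cgi/5" />
</rdf:RDF>"""

found = discover_trackbacks(SAMPLE)
```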
This specification defines two headers to allow an alternate discovery mechanism for resources that cannot embed assertions. If a resource supplies an X-Trackback header in a response to an HTTP method (typically GET or HEAD) then the header should include a URI and that URI should be treated as a Notification resource with the current resource as Topic. Conversely, if a resource supplies an X-TrackBackTopic header then the URI specified should be treated as a Topic resource with the current resource as Notification.
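The two header rules can be sketched as a small function that turns response headers into (Topic, Notification) pairs. The function name is hypothetical, and the sketch assumes each header value is a bare URI. Header names are compared case-insensitively, as HTTP requires.

```python
def relationship_from_headers(current_uri, headers):
    """Return (topic_uri, notification_uri) pairs implied by response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    pairs = []
    if "x-trackback" in lowered:
        # Current resource is the Topic; the header names the Notification resource.
        pairs.append((current_uri, lowered["x-trackback"]))
    if "x-trackbacktopic" in lowered:
        # Current resource is the Notification resource; the header names the Topic.
        pairs.append((lowered["x-trackbacktopic"], current_uri))
    return pairs
```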
This specification does not prevent a resource on one host from declaring a notification resource on another. But it does suggest some best practices that will reduce the likelihood of "spoofing" or "denial of service attacks".
If the host of a Description is different from the host of the Notification Resource, it is possible that it is asserting a nonexistent relationship between the Topic resource and the Notification Resource. For instance, a pernicious web page could claim that your corporate intranet weblog is interested in resources about "Your Favorite Porn Sites".
Whenever a human or software agent has reason not to trust a Description, it should do a GET on the resource. If the resource responds with another appropriate Description (perhaps RDF embedded in HTML or an X-TrackBackTopic header) then that description can of course be trusted. If it does not, then extreme caution is warranted and software agents should probably ask the end-user to make the final decision.
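One possible reading of this best practice is sketched here. It assumes that a Description served from the same host as the Notification Resource needs no further check, which is an interpretation rather than something this specification states. The function name and the "trust" and "ask-user" labels are hypothetical. The `declared_topic` argument stands for whatever Topic URI the Notification Resource itself declared in response to a GET.

```python
from urllib.parse import urlsplit


def assess_description(description_uri, topic_uri, notification_uri, declared_topic=None):
    """Return "trust" or "ask-user" for an asserted Topic/Notification link.

    declared_topic is the Topic URI that the Notification resource itself
    declared when fetched with GET, or None if it declared nothing.
    """
    same_host = urlsplit(description_uri).hostname == urlsplit(notification_uri).hostname
    if same_host:
        # Assumption: a Description served from the Notification host is trusted.
        return "trust"
    if declared_topic is not None and declared_topic == topic_uri:
        # The Notification resource confirms the relationship itself.
        return "trust"
    return "ask-user"
```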
A robot could go around asserting that a for-profit web site is relevant to anything and everything merely to drive traffic to the site ("make money fast"). In order to prevent this, a trackback provider might want to embed some kind of challenge question that can only be answered by a human reader. The user agent should present the challenge to the user so that they may supply a result. The user's answer should be supplied as an HTTP Authorization header.
<rdf:Description
rdf:about="#foo"
dc:title="I love FOO!"
trackback:ping="http://www.foo.com/tb.cgi/5"
trackback:challenge="a three-letter word for feline." />
</rdf:RDF>
The user agent should ask the user: "Please supply 'a three-letter word for feline' to authorize this TrackBack." The user can respond "cat". The user agent can then send this answer along as authorization.
POST http://www.foo.com/tb.cgi/5 HTTP/1.1
Authorization: Basic Y2F0
Note that "Y2F0" is the Base64 encoding of "cat", which is the encoding used by HTTP Basic authentication.
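The header value in this example can be reproduced with a few lines of code. This sketch follows the draft in encoding the bare answer. Standard Basic authentication normally encodes a "user-id:password" pair, so treating the answer alone as the credential is an assumption carried over from the example above. The function name is hypothetical.

```python
import base64


def challenge_authorization(answer):
    """Build an Authorization header value from a challenge answer."""
    # Base64-encode the bare answer, as in the draft's example.
    token = base64.b64encode(answer.encode("utf-8")).decode("ascii")
    return "Basic " + token


header_value = challenge_authorization("cat")
```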
Implementation reports are requested relating to this feature. Flexible authentication is known to be difficult in many web server implementations. In particular, web servers that strip out the authorization information will make implementation tricky or impossible.
Weblogs SHOULD embed a single RDF element per Weblog post with at least the "rdf:about", "dc:title" and "trackback:ping" attributes. The syntax of the "rdf:about" attribute should be "#foo" to notify about an HTML-anchored weblog entry and "." to notify about the whole current page. Agents that help users to submit TrackBacks should do so in RSS syntax. They MAY attempt RSS 1.0 but must be able to fall back to RSS 0.9x if they receive an HTTP 415 Unsupported Media Type error message.
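The fallback rule might be implemented along these lines. This is a sketch only: `send` is a hypothetical callable that performs the POST and returns the HTTP status code, and the media types paired with RSS 1.0 and RSS 0.9x are assumptions, since this specification does not name them.

```python
def submit_with_fallback(send, payloads):
    """Try each (media_type, body) in order, moving on only after HTTP 415."""
    status = None
    for media_type, body in payloads:
        status = send(media_type, body)
        if status != 415:
            # Any answer other than Unsupported Media Type ends the attempts.
            return media_type, status
    return None, status


def fake_send(media_type, body):
    # Stand-in for a server that only understands the older format.
    return 200 if media_type == "text/xml" else 415


chosen, status = submit_with_fallback(
    fake_send,
    [
        ("application/rdf+xml", "<rdf:RDF/>"),  # RSS 1.0 attempt (assumed type)
        ("text/xml", "<rss/>"),                 # RSS 0.9x fallback (assumed type)
    ],
)
```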