XBind 0.7 Tutorial

1 XBind Features
2 Status
3 XBind Examples
4 Overview
5 In Memory Representation
6 XBind Language Specification
7 Implementation
8 Future Directions

XBind is a language for describing the binding between XML documents and language data types, data structures and objects. It is still under development but the basic model and goals are defined.

XML's native data structure is the infoset (roughly speaking, a DOM). The infoset is not a very convenient data structure for programmers to work with. For instance, consider this element:


<orders><order id="932142">
    <customer>...</customer>
    <item>...</item>
    <item>...</item>
    <item>...</item>
</order><order id="243123">
    <customer>...</customer>
    <item>...</item>
    <item>...</item>
    <item>...</item>
</order>
</orders>

Logically, orders is a mapping. The order of elements probably does not matter (although only the language designer knows for sure), but the ability to look orders up by "id" is very important. The "id" field is the key of the mapping.

Logically, each individual "order" element is a structure with three fields: "id", "customer" and "items". The "items" field does not appear in the XML itself; it is implied by the list of "item" elements and would be an array.


mapping("932142" =>
	struct(id => 932142, customer => struct(...),
		items => array( struct(...), struct(...), struct(...))),
	"243123" =>
	struct(id => 243123, customer => struct(...),
		items => array( struct(...), struct(...), struct(...))))
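
As a rough illustration of the work such a binding has to do, the following Python sketch builds that mapping by hand from the infoset with the standard library's xml.etree.ElementTree. It is not XBind's API: the function name orders_to_mapping is hypothetical, plain dicts and lists stand in for "mapping", "struct" and "array", and the customer and item values are placeholders because the element above elides them.

```python
import xml.etree.ElementTree as ET

# Placeholder document: the customer and item contents are invented.
ORDERS_XML = """
<orders><order id="932142">
    <customer>Alice</customer>
    <item>pen</item>
    <item>ink</item>
</order><order id="243123">
    <customer>Bob</customer>
    <item>paper</item>
</order>
</orders>
"""

def orders_to_mapping(xml_text):
    """Turn an <orders> element into a dict keyed by each order's id."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for order in root.findall("order"):
        order_id = order.get("id")
        mapping[order_id] = {
            "id": order_id,
            "customer": order.findtext("customer"),
            # the implied "items" field collects every <item> child
            "items": [item.text for item in order.findall("item")],
        }
    return mapping

orders = orders_to_mapping(ORDERS_XML)
print(orders["243123"]["items"])
```

Every vocabulary needs its own version of this loop, which is the repetition a declarative binding is meant to remove.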

XBind allows the declaration of the mapping between XML elements and natural in-memory programming language data structures and data types.

XBind is not a schema language: it is not intended to describe constraints on documents although it will do some of that as a side effect. XBind is much better at data binding than are existing schema languages. For instance, XML Schema has no notion of "array" or "mapping". In fact, it has no first class structure binding features at all. Tools that use XML Schemas for data binding guess the best language structures from the schema. The XBind philosophy is that you will often want to control binding separately from validation.

XBind is similar to data binding tools like Castor, but XBind is designed to be completely language independent. The idea is that XBind specifications will typically be created not by the programmers that need data binding but by the inventors of vocabularies or XBind-aware third parties. Using one binding language for all programming languages greatly improves the likelihood that you will find an applicable binding alongside the documentation and schema for the original vocabulary or at a third party site.

1 XBind Features

  1. XBind allows the mapping of XML data to common programming language data structures: arrays, structures, mappings and ordered mappings.
  2. XBind is language independent (unlike JAXB and Castor). Bindings should be defined once per XML vocabulary, not once per vocabulary per programming language.
  3. XBind is bidirectional: language objects can be serialized as XML and XML can be deserialized into language objects.
  4. XBind allows languages to hook in more specific implementations of types, for instance specific classes rather than generic structures.
  5. XBind should allow streaming operation but this is not implemented yet.
  6. XBind should support both dynamically and statically typed languages, but only Python is implemented so far.

2 Status

A prototype version of XBind in Python is available. Development of the language will take place on the low-volume, no-spam XBind mailing list.

3 XBind Examples

Here are some examples to get a flavour of the language.

Here is an example of some code that you would use to construct an RSS document:


import xbind

def create_rss_doc():
    # "teach" the module about the RSS vocabulary
    # (_objecttype_libraries is assumed to be defined elsewhere in the module)
    deserializer = xbind.Deserializer("test/rss.xbn",
                        objecttypes = _objecttype_libraries)

    # create a top-level RSS element/object thingee
    obj = deserializer.createByName("Rss", 
                           {"version": "1.0"})

    # create the channel list
    obj.channels = deserializer.createByName("Channels", [])
    
    # create a channel
    channel = deserializer.createByName( "Channel",
           {"title":"A title", "link":"http://.....", 
            "description": "Description", "language": "English"})
    obj.channels.append(channel)
    return obj

4 Overview

XBind allows mapping between elements and the following data types and structures:

struct

A struct is a type with named fields and values. Fields are always named by strings. Values are of other data types and structures. Structs can be given a more precise type which corresponds to a class in a programming language.

map

A map is a type that maps keys to values. Keys are often strings but are not constrained to be strings. For instance a map keyed by xsd:decimal values makes perfect sense.

array

An array is a type that holds values in an ordered list.

ordered mapping

An ordered mapping is essentially a merger of a mapping and an array. It supports indexing either numerically based on order in the XML document or by key. Languages without such a data structure can synthesize it through a class.
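
A minimal sketch of how a language might synthesize such a type through a class is shown here in Python. The class name OrderedMapping and its methods are hypothetical, not part of XBind. Lookup by position and lookup by key are kept as separate methods because keys are not constrained to be strings, so an integer key could otherwise be confused with a position.

```python
class OrderedMapping:
    """Hypothetical ordered mapping: lookup by position or by key."""

    def __init__(self):
        self._keys = []      # keys in document order
        self._values = {}    # key -> value

    def add(self, key, value):
        if key in self._values:
            raise KeyError(f"duplicate key: {key!r}")
        self._keys.append(key)
        self._values[key] = value

    def by_index(self, index):
        """Return the value at the given position in document order."""
        return self._values[self._keys[index]]

    def by_key(self, key):
        """Return the value stored under the given key."""
        return self._values[key]

    def __len__(self):
        return len(self._keys)

    def __iter__(self):
        # iterate over values in document order
        return (self._values[k] for k in self._keys)

orders = OrderedMapping()
orders.add("932142", {"customer": "first"})
orders.add("243123", {"customer": "second"})
print(orders.by_index(1) == orders.by_key("243123"))
```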

simple types

A simple type is a type with no exposed sub-structure. It is treated as a coercion of a simple value to a type like integer, float, string, date, etc. Simple types are grouped into "datatype libraries".

The set of libraries can vary from implementation to implementation but the XML-RPC library is required and the XML Schema library is strongly encouraged. Others (e.g. SQL types, CORBA types) are optional.
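
One way to picture a datatype library is as a table from type names to coercion functions. The Python sketch below assumes that shape; the names XSD_LIBRARY, parse_boolean and coerce are hypothetical, and the handful of entries only loosely follow the XML Schema simple types (for example, time zones on dates are ignored).

```python
from datetime import date
from decimal import Decimal

def parse_boolean(text):
    """Accept the four lexical forms true, false, 1 and 0."""
    values = {"true": True, "1": True, "false": False, "0": False}
    try:
        return values[text.strip()]
    except KeyError:
        raise ValueError(f"not a boolean: {text!r}") from None

# Hypothetical datatype library: type name -> function coercing element text.
XSD_LIBRARY = {
    "string": str,
    "integer": int,
    "decimal": Decimal,
    "double": float,
    "date": date.fromisoformat,   # simplified: no time zone support
    "boolean": parse_boolean,
}

def coerce(library, type_name, text):
    """Coerce element text to the named simple type from a library."""
    try:
        convert = library[type_name]
    except KeyError:
        raise ValueError(f"unknown simple type: {type_name!r}") from None
    return convert(text)

print(coerce(XSD_LIBRARY, "integer", "42") + 1)
```

Under this view, supporting another library (SQL types, say) amounts to supplying another table.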

resource types

A resource type is a type that is essentially a URI to a resource of some specific type. Resource types come in libraries like simple types. Resource types can be described in WRDL or any other resource description language.

Asis elements

Sometimes the best representation for a bit of XML is just XML. In that case you can declare that the data be deserialized "asis", which will usually mean that it is represented as a DOM node.

Bindings between serialized element types and in-memory object types are done like this:


 
<binding name="DirectoryCategory" match="directoryCategory">
  <struct>
    <field name="encoding"><simple select="specialEncoding"/></field>
    <field name="viewname"><simple select="fullViewableName"/></field>
  </struct>
</binding>

This says that elements of type "directoryCategory" should be turned into structures of type "DirectoryCategory". By convention we use leading upper-case letters for type names. DirectoryCategory objects are structures with two fields, "encoding" and "viewname". The first field is constructed by selecting the "specialEncoding" sub-element of directoryCategory. The second field is constructed by selecting the "fullViewableName" sub-element. This mapping can be reversed so that the in-memory structure becomes XML data. The structure becomes an element again, its fields become sub-elements and so forth. Although this example does not use attributes, they are also available.
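
To make both directions concrete, here is a hand-written Python sketch of what an implementation would do for this one binding. It is not the XBind prototype's code: the FIELDS table and the functions deserialize and serialize are hypothetical, a plain dict stands in for the struct, and the element contents are placeholder values.

```python
import xml.etree.ElementTree as ET

# Hand-written equivalent of the DirectoryCategory binding:
# (field name, sub-element selected for that field)
FIELDS = [("encoding", "specialEncoding"), ("viewname", "fullViewableName")]

def deserialize(element):
    """directoryCategory element -> struct (here, a plain dict)."""
    return {field: element.findtext(child) for field, child in FIELDS}

def serialize(struct):
    """struct -> directoryCategory element (the reverse direction)."""
    element = ET.Element("directoryCategory")
    for field, child in FIELDS:
        ET.SubElement(element, child).text = struct[field]
    return element

# Placeholder input document
source = ET.fromstring(
    "<directoryCategory>"
    "<specialEncoding>utf-8</specialEncoding>"
    "<fullViewableName>Top/Computers</fullViewableName>"
    "</directoryCategory>"
)
category = deserialize(source)
round_trip = ET.tostring(serialize(category), encoding="unicode")
print(category["viewname"])
```

Because the same table drives both functions, the round trip reproduces the original element, which is the bidirectionality the binding promises.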

XBind handles primitive data types through "datatype libraries". A datatype library is just a named collection of datatypes. XBind implementors choose which datatype libraries to implement. The XML-RPC library is mandatory, and the XML Schema library is strongly recommended. Other libraries (like SQL datatypes, Java datatypes etc.) will be supported by particular implementors who consider them important. The vast majority of applications will get by with the two built-in libraries, XML-RPC and XML Schema.

5 In Memory Representation

There are two basic ways to achieve bidirectionality. The first is that the in-memory objects could "just know" how they are supposed to be serialized. So if you read one from a file then it remembers what element created it and serializes itself back to that element. If you create a new object then you tell it what XBind type it is so that it knows how to serialize itself. Let's call this the "explicit strategy". The primary problem with this strategy is that the programmer is working with objects that are slightly different than the native objects they use for other things. How different depends upon the language. In many languages it is easy to subclass the built-in types. In other languages you will have to have wrapper classes that wrap the built-in types.

Here is an example of that strategy:


wsdl = WSDL("rss.wsdl")
documentation = wsdl.create("ChannelDocumentation", 
	title => "Channel docs", 
	description =>"Doc description")
rss = wsdl.create("RSSChannel",
	title => "Blah",
	description => "Some description",
	docs => documentation)

out.write(rss.serialize())

The second strategy is to use "dumb objects" like standard language data structures and data types.


documentation = {title => "Channel docs", description =>"Doc description"}
rss = {title => "Blah",
    description => "Some description",
    docs => documentation}
out.write( WSDL("rss.wsdl").serialize(rss))

Then the WSDL library must use some kind of pattern matching to serialize them properly. This strategy is difficult to specify cleanly. How do you efficiently differentiate between an array of arrays of arrays of integers and an array of arrays of arrays of floats? The "explicit strategy" does not have this problem.
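
The difficulty can be seen in a small Python sketch. The function infer_leaf_type is hypothetical and only illustrates the kind of guessing a pattern-matching serializer would need: it must visit every leaf before it can answer, and for empty arrays it cannot answer at all.

```python
def infer_leaf_type(value):
    """Guess the leaf type of arbitrarily nested lists; None if undecidable."""
    if isinstance(value, list):
        # collect the leaf types found below, ignoring undecidable branches
        leaf_types = {infer_leaf_type(v) for v in value} - {None}
        if len(leaf_types) > 1:
            raise TypeError(f"mixed leaf types: {leaf_types}")
        return leaf_types.pop() if leaf_types else None
    return type(value)

ints = infer_leaf_type([[[1, 2], [3]], [[4]]])   # every leaf is visited
floats = infer_leaf_type([[[1.0]]])
unknown = infer_leaf_type([[[]]])                # integers or floats? no way to tell
print(ints, floats, unknown)
```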

XBind does not have much support for pattern matching of standard language objects. This seemed like it would turn into a science project. For this reason, it is designed for the explicit strategy. Forcing the programmer to be explicit seems like a better strategy in the long run. Pattern matching can be confusing and harder to debug.

The explicit strategy is fairly obviously the right choice for statically typed languages where XBind specifications will be compiled into modules. I think that this choice is quite reasonable for dynamically typed languages as well. It is not unusual in these languages to create new objects with named types. Consider how you would create an "ODBC connection object" in Perl, Python or PHP. Creating XBind objects is just like that.

In other words, if you want to generate XML in your code, you explicitly create objects for the vocabulary you want to eventually generate. And when you read in XML, you get runtime objects that behave quite a bit like native types but remember what XML element they came from. This is roughly the strategy employed by .NET and Castor. One advantage is that the specific types can provide good error checking. For instance an array could throw an exception if you add the wrong type as a sub-element.
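
A minimal Python sketch of such an array follows. The TypedArray and Channel classes are hypothetical and are not taken from the XBind prototype. The array subclasses the built-in list, remembers its XBind type name for later serialization, and rejects items of the wrong type. Only append is guarded here; a fuller version would also guard insert, extend and item assignment.

```python
class TypedArray(list):
    """Hypothetical explicit-strategy array: a list that knows its XBind type."""

    def __init__(self, xbind_type, item_type, items=()):
        super().__init__()
        self.xbind_type = xbind_type   # remembered for serialization
        self.item_type = item_type
        for item in items:
            self.append(item)

    def append(self, item):
        if not isinstance(item, self.item_type):
            raise TypeError(
                f"{self.xbind_type} holds {self.item_type.__name__}, "
                f"got {type(item).__name__}"
            )
        super().append(item)

class Channel(dict):
    """Stand-in for a struct type bound to a channel element."""

channels = TypedArray("Channels", Channel)
channels.append(Channel(title="A title"))

rejected = None
try:
    channels.append("not a channel")
except TypeError as exc:
    rejected = str(exc)
print(rejected)
```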

A choice left up to individual implementors is whether to use a dynamic type construction strategy as above, or instead compile the XBind specification to language code in advance. This code could be imported as a standard language-native type. XBind supports either strategy and in fact a mature implementation should probably provide both. At a minimum, implementations for dynamically typed languages should provide the dynamic option and those for statically typed languages should provide the static option.
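
As a sketch of the dynamic option, the Python below builds a class at runtime from a binding's name and field list using the built-in type(). The helper make_struct_type is hypothetical. The static option would instead write the equivalent class definition out as source code ahead of time.

```python
def make_struct_type(name, field_names):
    """Build a class at runtime from a binding's name and field list."""
    def __init__(self, **values):
        unknown = set(values) - set(field_names)
        if unknown:
            raise TypeError(f"{name} has no field(s): {sorted(unknown)}")
        # fields that are not supplied default to None
        for field in field_names:
            setattr(self, field, values.get(field))
    namespace = {"__init__": __init__, "fields": tuple(field_names)}
    return type(name, (object,), namespace)

DirectoryCategory = make_struct_type(
    "DirectoryCategory", ["encoding", "viewname"])
category = DirectoryCategory(encoding="utf-8", viewname="Top/Computers")
print(type(category).__name__, category.viewname)
```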

6 XBind Language Specification

Not available yet!

7 Implementation

There is a rough implementation of XBind in Python. It is complete enough to run its three example files: call it version "0.7". I felt this was important to make sure that nothing in the example files is unimplementable. I did not want to wait until it was robust before sharing the ideas behind XBind, because ultimately the Python version will be just one implementation. I do intend to make it robust enough to handle any XBind file you throw at it. I am not yet soliciting other implementations because I need a few weeks to shake the bugs out of the language itself.

8 Future Directions

Add support for "keys" (more or less indexes over XML elements) and "key references" (like foreign keys) as in XSLT and XML Schema.

Add some reasonable support for references. A reference is a way that one part of the tree of objects can refer to another part. There will probably be two kinds of references, those based on keys which can be implemented in a purely streaming implementation, and those based on arbitrary XPointers which will require a DOM.

There are certain conventions around the use of QNames and extension namespaces that could use first-class support. For instance QNames are often a way to do hyperlinks. Also, there are conventions around importing and including that should be considered.

XBind itself probably needs an import/include mechanism.

Perhaps allow rule reuse for similarly structured elements.

Support some statically typed language to demonstrate that it can be done. Type inferencing should be sufficient but if not, type annotations should do it.

Build a tool to compile XBind specifications into highly performant state-machine-based C libraries.


HTML rendition created using stylesheets by Wendell Piez of Mulberry Technologies.
