
Interface Files Proposal


Python Interface Declaration Language

0.5

Python Extension Proposal January-2000

This version

xxxxxxx

Editors

Paul Prescod, ISOGEN/DataChannel

Copyright  ©  1998 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.

Status of this document

This document is a NOTE made available for discussion and experimental implementation. This does not mean that the described features are slated for inclusion into Python. They may be included once the normal processes of discussion and refinement have been conducted.

Abstract

A PyDL file declares the interface for a Python module to allow static and runtime type checking of client modules. An interface is a description of the composition of attributes for an object (whether a module object, class instance or built-in object). An object may declare support for an interface if all of its attributes meet the requirements described by the interface. PyDL files associate interfaces with attribute names. When an interface is associated with a name all objects bound to the name must export a compatible interface.

This document describes the behavior of a class of software modules called static interface interpreters and static interface checkers. Interface interpreters are run as part of the regular Python module interpretation process. They read PyDL files and make the interface objects available to the Python compiler. Interface checkers read PyDL files and Python code to verify conformance of the code to the interface.

Table of Contents

1. PyDL Files
2. Grammar
3. Interfaces
4. Behavior
5. Interface compatibility
6. Built-in Interfaces
7. Interface expression language
8. Declarations in a PyDL file
8.1 Imports
8.2 Basic attribute interface declarations
8.3 Callable object interface declarations
8.4 Interface declarations
8.5 Class Declarations
8.6 Typedefs
9. New Module Syntax
9.1 "typesafe"
9.2 as
9.3 Interface objects
9.4 New Attributes
9.5 Experimental syntax
10. Summary of Major Runtime Implications
11. Future Directions
11.1 Inferencing/Deduction
11.2 Const-ness/Readonly-ness
11.3 Idea: The Undefined Object

1. PyDL Files

A PyDL file can be either created by a programmer or auto-generated. The syntax and semantics of the two file types are identical. An auto-generated file is created by scanning a Python module for inline declarations.

Interfaces are the central concept in PyDL files.

Interfaces are Python objects like anything else but they are created by the interface interpreter before Python interpretation begins. They are made available to the static interface checker for purposes of static type checking.

In addition to defining interfaces, it is possible to associate interfaces with names of attributes in the namespace of modules and other objects. Values bound to the name in the object's namespace must always conform to the declared interface. Furthermore, by the time the module or object's initializer is executed each name must have an associated value.

It is allowed but not necessary for the static interface checker to reject a module if it can prove that these rules will be violated. It is also acceptable to check at runtime. There also exists syntax (see section 9.1, "typesafe") to require these checks at compile time.

2. Grammar

In the very short term, implementors are encouraged to use any grammar that accepts every example in this document. Contributions of proposals for the exact grammar are solicited.

3. Interfaces

Interface objects are created through interface definitions, class declarations and interface expressions. The API may eventually provide facilities for creating interfaces at runtime but they are neither available to nor relevant to the interface interpreter or static type checker.

Interface definitions are similar to Python class definitions. They use the keyword "interface" instead of the keyword "class".

Interfaces are either complete or incomplete. An incomplete interface takes interface parameters and a complete interface does not. It is not possible to create Python objects that conform to an incomplete interface. Incomplete interfaces are a reuse mechanism analogous to functions in Python. An example of an incomplete interface would be "Sequence". It is incomplete because we need to parameterize it with the interface of the contents of the sequence.

Is there a better term than "incomplete"?

In an interface expression the programmer can provide parameters to generate a new interface.

Typedefs allow us to give names to complete or incomplete interfaces described by interface expressions. Typedefs are an interface expression re-use mechanism and are described in section 8.6.

Interfaces have a concept of compatibility which is used to check whether one interface object is substitutable for another.

Sometimes there exists code that is only compatible with a single implementation of an interface. This is the case when the object's actual bit-pattern is more important than its interface. Examples include integers, window handles, C pointers and so forth. For this reason, every class is considered also an interface. Only instances of the class and its subclasses (if any) conform to the interface. These are called implementation specific interfaces.

4. Behavior

For our purposes, we will presume that every Python execution environment has some form of compilation phase. This is true of all existing Python environments.

The Python compiler invokes the static interface interpreter and, optionally, the interface checker on a Python file and its associated PyDL file. Typically a PyDL file is associated with a Python file through placement in the same path with the same base name and a .pi (for hand-written files) or .gpi (auto-generated) extension. If both are available, the module's interface is created by combining the declarations in the .pi and .gpi files as if they were in a single file.
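The default association rule can be sketched in a few lines of Python. This is only an illustration: the helper name find_pydl_files is hypothetical, and a real importer would hook into the module search machinery rather than take a file path.

```python
import tempfile
from pathlib import Path

def find_pydl_files(module_path):
    """Return the PyDL files that sit beside a module, hand-written first."""
    module_path = Path(module_path)
    candidates = [module_path.with_suffix(".pi"), module_path.with_suffix(".gpi")]
    return [path for path in candidates if path.is_file()]

# Demonstration in a throwaway directory: spam.py has only a hand-written file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "spam.py").write_text("x = 1\n")
    Path(tmp, "spam.pi").write_text("decl x as Int\n")
    found = [path.name for path in find_pydl_files(Path(tmp, "spam.py"))]
```

When both files exist, the function returns both, and the caller is expected to combine their declarations.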

Is this the right way to handle the relationship between inline and out-of-line declarations?

"Non-standard" importer modules may find PyDL files using other mechanisms such as through a look-up in a relational database, just as they find modules themselves using non-standard mechanisms.

The interface interpreter reads the PyDL file and builds the relevant interface objects. If the PyDL file refers to other modules then the interface interpreter can read the PyDL files associated with those other modules after generating them if necessary.

It is acceptable to use date-stamps, CRCs and other heuristics to demonstrate that a generated PyDL file is not likely to be inconsistent with its module.
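One possible date-stamp heuristic is sketched below. The function name gpi_is_fresh and the rule "generated file at least as new as the module" are assumptions made for illustration; this proposal does not prescribe a particular heuristic.

```python
import os
import tempfile
from pathlib import Path

def gpi_is_fresh(module_path, gpi_path):
    """Trust a generated file only if it is at least as new as its module."""
    module_path, gpi_path = Path(module_path), Path(gpi_path)
    if not gpi_path.is_file():
        return False
    return gpi_path.stat().st_mtime >= module_path.stat().st_mtime

with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp, "spam.py")
    gpi = Path(tmp, "spam.gpi")
    module.write_text("x = 1\n")
    gpi.write_text("decl x as Int\n")

    # Set explicit modification times so the outcome does not depend on timing.
    os.utime(module, (1000, 1000))
    os.utime(gpi, (2000, 2000))
    fresh_when_newer = gpi_is_fresh(module, gpi)

    # Simulate editing the module after the .gpi file was generated.
    os.utime(module, (3000, 3000))
    fresh_after_edit = gpi_is_fresh(module, gpi)
```

A stale result would simply trigger regeneration of the .gpi file before the interface interpreter reads it.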

The Python compiler may invoke the interface checker after the interface interpreter has built interface objects and before it interprets the Python module. The interface checker must verify that each function that is declared to be type safe is type safe and report any that are not.

Once program execution begins, the interface objects are available to the runtime code through a special namespace called the interface namespace. There is one such namespace per module with a PyDL file. It is accessible from the module's namespace via the name __interfaces__. This namespace is interposed in the name search order between the module's namespace and the built-in namespace.
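The intended search order can be modelled with a chained mapping. The sketch below is an approximation that uses plain dictionaries and placeholder strings in place of real interface objects; the names and values in it are invented for the example.

```python
import builtins
from collections import ChainMap

# Stand-ins: a module namespace and its interface namespace.
module_namespace = {"spam": 42}
interface_namespace = {
    "Int": "<interface Int>",
    "list": "<interface list>",  # hypothetical interface that shadows a built-in
}

# Names are searched in the module first, then the interfaces, then built-ins.
lookup = ChainMap(module_namespace, interface_namespace, vars(builtins))

from_module = lookup["spam"]
from_interfaces = lookup["Int"]
shadowed = lookup["list"]
from_builtins = lookup["abs"]
```

Because the interface namespace sits in front of the built-in namespace, an interface named like a built-in hides that built-in, which is one reason the question below about the cost and necessity of a new namespace matters.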

Are there performance issues with a new namespace? Is there a requirement for a new namespace?

5. Interface compatibility

Interfaces can be compatible with other interfaces. If X is compatible with Y then objects that support the interface X may be bound to names associated with Y.

If X is compatible with Y and Y is compatible with X then we say that they are equivalent.

Would it make sense to use Python's comparison operators to allow programmatic access to the compatibility infrastructure?

6. Built-in Interfaces

The following diagram represents the base/derived-interface graph for the built-in Python objects. Interfaces that have multiple base interfaces are marked with a parenthesized occurrence number. So "Class" and "IncompleteInterface" are both derived from "Callable" and also from "Interface".

Implementation specific interfaces are marked with asterisks. Over time this list may get shorter as the Python implementation is generalized to work mostly by interfaces.


Any
    Number
        Integral
            Int *
            Long *
        Float *
        Complex *
    Sequence
        String *
        Record
    Mapping
    Module *
    Callable
        Class (2) *
        Function *
        Method *
            UnboundMethod *
            BoundMethod *
        IncompleteInterface (2)
    Interface
        Class (2)
        CompleteInterface
            ComplexInterface
            SimpleInterface
                PrimitiveInterface
        IncompleteInterface (2)
    None
    File

The details of each interface remain to be worked out.
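Until those details exist, the graph itself can at least be written down as data. The sketch below transcribes the diagram into a dictionary of direct base interfaces and derives "supports" and compatibility from it. It assumes that a derived interface is compatible with each of its bases; the function names are illustrative only.

```python
# Direct base interfaces, transcribed from the diagram above.
BASES = {
    "Any": [],
    "Number": ["Any"],
    "Integral": ["Number"],
    "Int": ["Integral"],
    "Long": ["Integral"],
    "Float": ["Number"],
    "Complex": ["Number"],
    "Sequence": ["Any"],
    "String": ["Sequence"],
    "Record": ["Sequence"],
    "Mapping": ["Any"],
    "Module": ["Any"],
    "Callable": ["Any"],
    "Class": ["Callable", "Interface"],
    "Function": ["Callable"],
    "Method": ["Callable"],
    "UnboundMethod": ["Method"],
    "BoundMethod": ["Method"],
    "IncompleteInterface": ["Callable", "Interface"],
    "Interface": ["Any"],
    "CompleteInterface": ["Interface"],
    "ComplexInterface": ["CompleteInterface"],
    "SimpleInterface": ["CompleteInterface"],
    "PrimitiveInterface": ["SimpleInterface"],
    "None": ["Any"],
    "File": ["Any"],
}

def supports(name):
    """Return every interface that `name` supports, directly or indirectly."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BASES[current])
    return seen

def is_compatible(x, y):
    """Assumed rule: X is compatible with Y if X is Y or derives from Y."""
    return y in supports(x)

def is_equivalent(x, y):
    """Two interfaces are equivalent if each is compatible with the other."""
    return is_compatible(x, y) and is_compatible(y, x)
```

For example, supports("Class") yields Class, Callable, Interface and Any, matching the two occurrences of "Class" in the diagram.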

Who will do this? Volunteers?

7. Interface expression language

Interface expressions are used to declare that attributes must conform to certain interfaces. An interface expression is:

  1. a reference to an interface (including classes) by name

    The name can either be simple or it may be of the form "module.interfacename" where "interfacename" is a name in one of two PyDL files for the named module.

    The expression evaluates to the referenced interface object.

  2. a union of two or more interfaces:

    
    integer or float
    integer or float or complex
    

    The expression evaluates to an interface object I such that a value V supports I iff it conforms to any interface in this list.

    A union interface X is compatible with a union interface Y if each element in X's list is compatible with some element in Y's.

  3. a parameterization of an interface:

    Array( Int, 50 )
    Array( length=50, elements=Int )
    

    Note that the arguments can be either interface expressions or simple Python expressions. A "simple" Python expression is an expression that does not involve a function call or variable reference.

    The expression evaluates to a complete instantiation of the referenced incomplete interface.

    A parameterized interface X(Q,R,S,...) is compatible with a parameterized interface Y(Q1,R1,S1,...) if X is compatible with Y and Q, R, S, ... are each compatible with Q1, R1, S1.

  4. a syntactic shortcut:

    
    [Foo] => Sequence( Foo ) # sequence of Foo's
    {String:Int} => Mapping( String, Int ) # Mapping from String's to Int's
    (A,B,C) => Record( A, B, C ) # 3-element sequence of interface A, followed
                                 # by B followed by C
    

    The expression evaluates to the same thing as the expanded versions.

    Compatibility is identical to the situation for the expanded versions.

  5. a callable interface:

    def( Arg1 as Type1, Arg2 as Type2 ) -> ReturnType

    The argument names may be elided:

    def( Int, String ) -> None

    Note: this is provided for compatibility with libraries and tools that may not support named arguments. Python programmers are strongly encouraged to use argument names as they are good documentation and are useful for development environments and other reflective tools.

    Should it be possible to elide the return value and does this mean None or Any?

    It is possible to declare variable length argument lists. They must always be declared as sequences but the element interface may vary.

    
    def( Arg1 as String, * as [Int] ) -> Int
                # callable taking a String, and some un-named Int
                # arguments
    

    Finally, it is possible to declare keyword argument lists. They must always be declared as mappings from string to some interface.

    
    def( Arg1 as Int , ** as {String: Int}) -> Int
    

    The ReturnType must be an interface expression. Note that at this point in time, every Python callable returns something, even if it is None.

    The return value can be named, merely as documentation:

    
    def( Arg1 as Int , ** as {String: Int}) -> ReturnCode as Int

    Is this helpful or unnecessary complication?

    The expression evaluates to a callable interface that takes the described arguments and returns the described value.

    A callable interface X is compatible with a callable interface Y if each argument in Y is compatible with the equivalent argument in X and X's return value is compatible with Y's.

    Note: In other words, X must accept anything that Y accepts as input and must not return anything that Y could not return.
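The union and callable compatibility rules above can be made concrete with a small model. This is a sketch under stated assumptions: interfaces are represented by names in a tiny base graph, UnionOf and CallableSig are hypothetical helper classes, and keyword and variable-length arguments are ignored.

```python
from dataclasses import dataclass

# A tiny base graph, just enough for the example.
BASES = {
    "Any": (),
    "Number": ("Any",),
    "Int": ("Number",),
    "Float": ("Number",),
    "String": ("Any",),
}

@dataclass(frozen=True)
class UnionOf:
    members: tuple

@dataclass(frozen=True)
class CallableSig:
    args: tuple
    returns: object

def compatible(x, y):
    """Return True if interface x may be used where interface y is declared."""
    if y == "Any":
        return True
    if isinstance(x, UnionOf) or isinstance(y, UnionOf):
        xs = x.members if isinstance(x, UnionOf) else (x,)
        ys = y.members if isinstance(y, UnionOf) else (y,)
        # Each element of X must be compatible with some element of Y.
        return all(any(compatible(a, b) for b in ys) for a in xs)
    if isinstance(x, CallableSig) or isinstance(y, CallableSig):
        if not (isinstance(x, CallableSig) and isinstance(y, CallableSig)):
            return False
        # Arguments are checked from Y to X, the return value from X to Y.
        return (
            len(x.args) == len(y.args)
            and all(compatible(ya, xa) for xa, ya in zip(x.args, y.args))
            and compatible(x.returns, y.returns)
        )
    # Named interfaces: equal, or x derives (directly or indirectly) from y.
    return x == y or any(compatible(base, y) for base in BASES[x])

narrow = UnionOf(("Int", "Float"))
wide = UnionOf(("Int", "Float", "String"))
accepts_number = CallableSig(("Number",), "Int")   # def( Number ) -> Int
accepts_int = CallableSig(("Int",), "Number")      # def( Int ) -> Number
```

Here accepts_number can stand in for accepts_int because it accepts everything the latter accepts and returns nothing the latter could not return; the reverse substitution fails on the argument check.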

What syntax to use for optional arguments?

8. Declarations in a PyDL file

8.1 Imports

An import statement in an interface file loads another interface file. The import statement works just like Python's except that it loads the PyDL file found with the referenced module, not the module itself. (Of course, we will make this definition more formal in the future.)

8.2 Basic attribute interface declarations


decl myint as Int                   # basic 
decl intarr as Array( Int, 50 )     # parameterized
decl intarr2 as Array( size = 40, elements = Int ) # using keyword syntax

Attribute declarations are not parameterizable. Furthermore, they must resolve to complete interfaces.

Note: That means that this is allowed:


interface (_X,_Y) spam( A, B ):
    decl someInstanceAttr as _X
    decl someOtherAttr as Array( _X, 50 )

    ....

These are NOT allowed:


decl someModuleAttr(_X) as Array( _X, 50 )

interface (_Y) spam( A, B ):
    decl someInstanceMember(_Y) as Array( _Y, 50 ) 

These are disallowed because they would allow you to create a "spam" without ever saying what _Y is for that spam's someInstanceMember. That would defeat static type checking.

8.3 Callable object interface declarations

Functions are the most common sort of callable object but object instances can also be callable. Callables may be runtime parameterized and/or interface parameterized. For instance, there might be a method "add" that takes two objects with the same interface and returns an object with that interface.


decl DoSomething( _X ) as def( a as _X, b as _X )-> _X

_X is the interface parameter. By convention these start with underscores. a and b are the runtime parameters.

Note: it is usually possible to coerce a parameterized function into a fully polymorphic function where the arguments can vary from each other quite widely despite being declared to have the same parameter type. You can do this by instantiating the function with "Any" as the parametric type.

It is possible to allow _X to vary to some extent but still require it to always be a Number:


decl Add(_X as Number) as def( a as _X, b as _X )-> _X

So this function could take two longs or two floats but not two strings.

Note: as above, you could create a version that would take a float and a long by referring to a common base interface like Number itself.
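As a rough runtime analogue of the bounded declaration of Add, the function below checks its arguments by hand. It is a sketch only: it assumes that "the same interface _X" means the same concrete Python type, and it uses numbers.Number from the standard library as a stand-in for the Number interface.

```python
import numbers

def add(a, b):
    """Runtime analogue of: decl Add(_X as Number) as def( a as _X, b as _X ) -> _X"""
    if not isinstance(a, numbers.Number):
        raise TypeError("_X must support Number")
    if type(b) is not type(a):
        raise TypeError("a and b must share the same interface _X")
    return a + b

def rejected(a, b):
    """Return True if add() refuses the argument pair."""
    try:
        add(a, b)
    except TypeError:
        return True
    return False
```

In this strict form a mixed call such as add(1, 2.0) is refused; instantiating _X as Number itself, as the note above suggests, would correspond to dropping the second check.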

8.4 Interface declarations

An interface declaration starts with the keyword "interface", optionally has interface parameters in parentheses and then continues with the interface name and the names of super-interfaces. The interface inherits and must not contradict the signatures of its parent interfaces.

The interface body is made up of attribute declarations.


interface (_X,_Y) spam( a, b ):
    decl somemember as _X
    decl someOtherMember as _Y
    decl someClassAttr as [ _X ]

    decl someFunction as def( a as Int, b as Float ) -> String

Note: the interface does not disallow attributes that are not explicitly mentioned. Other attributes may exist and may be used by non-typesafe code. Type safe code may refer only to the attributes declared in the relevant interface.

An interface may be derived from (or based upon) another interface called the base interface using Python inheritance syntax. Objects directly supporting a derived interface are said to indirectly support the base interface and its base interfaces all of the way up to the most basic interface, Any.

Each attribute in the derived interface that has the same name as an attribute in a base interface must be compatible with the attribute in the base interface.

When the word support is used in this document without qualification, it means support either directly or indirectly.

Implementation specific interfaces are implicitly derived from the ordinary interfaces that the class adheres to.

8.5 Class Declarations

A class is a callable object that can be subclassed.

The syntax for a class definition is identical to that for a function with the keyword "def" replaced by "class". This describes the initialization method for the class. The return value must be an interface expression.

Note: The signature of the created object can be described in one or more separate (referenced) interface declarations.


interface (_ElementType) TreeNodeInterface:
    decl right as TreeNode(_ElementType) or None
    decl left as TreeNode(_ElementType ) or None
    decl element as _ElementType

# initialization method takes three parameters:
decl TreeNode(_X) as class(  el as _X, 
            Right as TreeNode( _X ) or None,
            Left as TreeNode( _X ) or None ) -> TreeNodeInterface(_X)

When the initialization completes, every attribute in the declared interfaces should have a value.

8.6 Typedefs

Typedefs allow interfaces to be renamed and for parameterized variations of interfaces to be given names.


typedef PositiveInt as BoundedInt( 0, maxint )
typedef NegativeInt as BoundedInt( max=-1, min=minint )
typedef NullableInt as Int or None
typedef Dictionary(_Y) as {String:_Y}

9. New Module Syntax

In the interface-enhanced version of Python, declarations will be allowed in Python code and will have the same meanings. They will be extracted to a generated PyDL file and evaluated there (along with hand-written declarations in the PyDL file). In the meantime, there is a backwards compatible syntax explained later.

9.1 "typesafe"

In addition to decl, typedef and interface, the keyword "typesafe" can be used to indicate that a function uses types in such a way that each operation can be checked at compile time and demonstrated not to call any function or operation with the wrong types.

The keyword can precede any function definition.


typesafe def foo( a, b ):
    ...

The typesafe keyword can also be used before a class definition. That means that every method in the class is declared to be type safe.

The typesafe keyword can be used with the "module" modifier before the first function or class definition in a module to state that all of the functions and classes in the module are type safe:


import spam
import rabbit
import orphanage

typesafe module 

An interface checker's job is to ensure that functions that claim to be typesafe actually are. It must report and refuse to compile modules that misuse the keyword. It may not refuse to compile modules that do not. The interface checker may optionally warn the programmer about other suspect constructs in Python code.

Note: typesafe is the only change to class definition or module definition syntax.

9.2 as

The "as" operator takes an expression and an interface expression and verifies at runtime that the expression evaluates to an object that conforms to the interface described by the expression.

It returns the expression's value if it succeeds and raises TypeAssertionError (a subtype of AssertionError) otherwise.


>>> foostr = foo as [String] # verifies that foo is a List of Strings and
                             # re-assigns it.

>>> j = getData()
>>> j as Int
>>> j=j+1

The "as" operator has the lowest precedence of the binary operators.
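In current Python the behaviour of the proposed operator can be approximated with a function. The sketch below assumes the interface is given as a predicate; the function name as_ and the helper list_of_strings are hypothetical, while TypeAssertionError follows the description above.

```python
class TypeAssertionError(AssertionError):
    """Raised when a value does not conform to the asserted interface."""

def as_(value, conforms):
    """Return value unchanged if conforms(value) is true, otherwise raise."""
    if not conforms(value):
        raise TypeAssertionError(f"{value!r} does not conform")
    return value

def list_of_strings(obj):
    """Predicate standing in for the interface expression [String]."""
    return isinstance(obj, list) and all(isinstance(item, str) for item in obj)

foostr = as_(["a", "b"], list_of_strings)

try:
    as_(["a", 1], list_of_strings)
    failed = False
except TypeAssertionError:
    failed = True
```

Because the function returns its first argument, it can be used inline in an assignment exactly as the operator is used in the first example above.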

9.3 Interface objects

Every interface object (remember, interfaces are just Python objects!) has the following method:


__conforms__ : def( obj as Any ) -> boolean

This method can be used at runtime to determine whether an object conforms to the interface. It would check the signature for sure but might also check the actual values of particular attributes.
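A minimal sketch of such an interface object follows. It assumes an interface can be modelled as a mapping from attribute names to checks on the attribute values; the Interface constructor arguments and the Point and Label classes are invented for the example.

```python
class Interface:
    """Sketch of an interface object: attribute names mapped to value checks."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes

    def __conforms__(self, obj):
        # The object conforms if every declared attribute exists and passes its check.
        return all(
            hasattr(obj, attr) and check(getattr(obj, attr))
            for attr, check in self.attributes.items()
        )

def is_int(value):
    return isinstance(value, int)

PointInterface = Interface("Point", {"x": is_int, "y": is_int})

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Label:
    def __init__(self, text):
        self.text = text
```

Attributes that the interface does not mention are ignored, which matches the rule that an interface does not disallow undeclared attributes.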

There is also a global function with this signature:


class_conforms : def ( obj as Class, Obj as Interface ) -> boolean

This function can be used either at compile time (e.g. by an implementation of an interface checker) or runtime to check that a class will generate objects that have the right signature to conform to the interface.

The interface definition for incomplete interfaces is:


interface IncompleteInterface:
    decl __call__ as def( * as [Any] ) -> Interface
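In other words, an incomplete interface behaves like a function from parameters to complete interfaces. The following sketch illustrates this with a parameterized Sequence. The constructor signatures and the build_sequence helper are assumptions made for the example, not part of the proposal.

```python
class Interface:
    """A complete interface, modelled here by a single conformance predicate."""

    def __init__(self, name, predicate):
        self.name = name
        self._predicate = predicate

    def __conforms__(self, obj):
        return self._predicate(obj)

class IncompleteInterface:
    """Calling an incomplete interface with parameters yields a complete one."""

    def __init__(self, name, build):
        self.name = name
        self._build = build

    def __call__(self, *args):
        return self._build(*args)

Int = Interface("Int", lambda obj: isinstance(obj, int))

def build_sequence(element):
    # The result conforms only for sequences whose items all conform to `element`.
    return Interface(
        f"Sequence({element.name})",
        lambda obj: isinstance(obj, (list, tuple))
        and all(element.__conforms__(item) for item in obj),
    )

Sequence = IncompleteInterface("Sequence", build_sequence)
IntSequence = Sequence(Int)
```

No object can conform to Sequence itself, since it has no __conforms__ method until it has been parameterized.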

9.4 New Attributes

All objects should have a new attribute __attributes__. It must list the union of lists of attribute names exported by the object's supported interfaces.

All objects should also have an attribute __interfaces__ that returns the set of interface objects supported by the object. For instance and function objects, this attribute should be automatically added by the runtime based on declarations in the PyDL file.

(the rest of the interface reflection API will be worked out later)

9.5 Experimental syntax

There is a backwards compatible syntax for embedding declarations in a Python 1.5.x file:


"decl","myint as Integer"

"typedef","PositiveInteger as BoundedInt( 0, maxint )"

"typesafe"
def ...( ... ): ...

"typesafe module"

There will be a tool that extracts these declarations from a Python file to generate a .gpi (for "generated PyDL") file. These files are used alongside hand-crafted PyDL files. The "effective interface" of the module is evaluated by combining the declarations from the two files as if they were concatenated together (more or less...exact details to follow). The two files must not contradict each other, just as declarations within a single file must not contradict each other. This means that names that are declared twice must evaluate to equivalent types.
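The combination rule might be sketched as follows. The example assumes each file has already been reduced to a mapping from names to interface expressions, and it uses plain string equality as a stand-in for interface equivalence. The function and error names are hypothetical.

```python
class ContradictoryDeclarationError(Exception):
    """Hypothetical error for a name declared with two non-equivalent interfaces."""

def effective_interface(pi_decls, gpi_decls):
    """Combine hand-written and generated declarations into one mapping."""
    merged = dict(pi_decls)
    for name, interface in gpi_decls.items():
        # A repeated name is fine only if both declarations agree.
        if name in merged and merged[name] != interface:
            raise ContradictoryDeclarationError(name)
        merged[name] = interface
    return merged

combined = effective_interface(
    {"myint": "Int", "names": "[String]"},
    {"myint": "Int", "count": "Int"},
)

try:
    effective_interface({"myint": "Int"}, {"myint": "String"})
    contradiction_detected = False
except ContradictoryDeclarationError:
    contradiction_detected = True
```

A real implementation would compare the evaluated interface objects rather than their source text, so that two differently spelled but equivalent expressions are accepted.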

Over time the .gpi generator will get more intelligent and may deduce type information based on code outside of explicit declarations (for instance function and class definitions, assignment statements and so forth).
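A first, purely declaration-based version of such a generator could be built on Python's standard ast module. The sketch below is an assumption about how the tool might work, not a description of an existing one: it looks only at module-level statements, and it records a "typesafe" marker without attaching it to the definition that follows.

```python
import ast

def extract_declarations(source):
    """Collect the string-based declarations found at module level."""
    found = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Expr):
            continue
        value = node.value
        if isinstance(value, ast.Constant) and isinstance(value.value, str):
            # A bare string statement such as "typesafe" or "typesafe module".
            if value.value in ("typesafe", "typesafe module"):
                found.append(value.value)
        elif isinstance(value, ast.Tuple) and len(value.elts) == 2:
            # A two-string tuple statement such as "decl","myint as Integer".
            parts = [
                element.value
                for element in value.elts
                if isinstance(element, ast.Constant) and isinstance(element.value, str)
            ]
            if len(parts) == 2 and parts[0] in ("decl", "typedef"):
                found.append(f"{parts[0]} {parts[1]}")
    return found

SOURCE = '''
"decl","myint as Integer"

"typedef","PositiveInteger as BoundedInt( 0, maxint )"

"typesafe"
def foo(a, b):
    return a
'''

gpi_lines = extract_declarations(SOURCE)
```

The resulting lines are already in PyDL declaration syntax, so writing them to a .gpi file is a matter of joining them with newlines.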

The "as" keyword is replaced in the backwards-compatible syntax with a function (**** more info ****)

10. Summary of Major Runtime Implications

All of the named interfaces defined in a PyDL file are available in the "__interfaces__" dictionary that is searched between the module dictionary and the built-in dictionary.

The runtime should not allow an assignment or function call to violate the declarations in the PyDL file. In an "optimized speed mode" those checks would be disabled. In non-optimized mode, these assignments would generate an IncompatibleAssignmentError.

The runtime should not allow a read from an unassigned attribute. It should raise NotAssignedError if it detects this at runtime instead of at compile time.
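Both runtime rules can be imitated in current Python with attribute hooks. The sketch below is an approximation under several assumptions: declarations are a class-level mapping named DECLS from attribute names to Python types, the base classes chosen for the two errors are guesses, and Python's own -O switch (which sets __debug__ to False) stands in for the "optimized speed mode".

```python
class IncompatibleAssignmentError(TypeError):
    """Assignment violates a declaration (TypeError as base is an assumption)."""

class NotAssignedError(AttributeError):
    """Read of a declared but unassigned attribute (base class is an assumption)."""

class Checked:
    """Base class that enforces a class-level DECLS mapping at runtime."""

    DECLS = {}

    def __setattr__(self, name, value):
        # Checks are skipped when Python runs with -O, where __debug__ is False.
        if __debug__ and name in self.DECLS and not isinstance(value, self.DECLS[name]):
            raise IncompatibleAssignmentError(name)
        object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. nothing has been assigned yet.
        if name in self.DECLS:
            raise NotAssignedError(name)
        raise AttributeError(name)

class Spam(Checked):
    DECLS = {"count": int}

spam = Spam()

try:
    spam.count
    read_failed = False
except NotAssignedError:
    read_failed = True

spam.count = 3

try:
    spam.count = "three"
    assign_failed = False
except IncompatibleAssignmentError:
    assign_failed = True
```

After the failed assignment, spam.count still holds 3, since the check runs before the value is stored.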

Are the existing Name and Attribute errors sufficient?

Several new object interfaces and functions are needed.

11. Future Directions

11.1 Inferencing/Deduction

At some point in the future, PyDL files will likely be generated from source code using a combination of declarations within Python code and some sorts of interface deduction and inferencing based on various kinds of assignment.

11.2 Const-ness/Readonly-ness

We need to be able to say that some attributes cannot be re-bound and that some attributes and parameters are immutable.

11.3 Idea: The Undefined Object

A new Undefined object could be used as the value of unassigned attributes and the return value of functions that do not return a value. It may not be bound to a name.


a = Undefined   # raises UndefinedValueError
a = b           # raises UndefinedValueError if b has not been assigned

Undefined can be thought of as a subtype of NameError. Undefined is needed because it is now possible to declare names at compile time but never get around to assigning to them. In ordinary Python this is not possible.

The only useful thing you can do with Undefined is check whether an object "is" Undefined:


if a is not Undefined:
    doSomethingWithA(a)
else:
    doSomethingElse()

This is equivalent to:


try:
    doSomethingWithA( a )
except NameError:
    doSomethingElse()

It is debatable whether we still need NameError for anything other than backwards compatibility. We could say that any referenced variable is automatically initialized to "Undefined". Undefined is sufficiently restrictive that this will not lead to buggy programs.

Undefined can also be used to correct a long-standing safety problem with functions. Functions that do not explicitly return a value would return Undefined instead of None. That means that this is no longer possible:


a = list.sort()

With Undefined, it will blow up because it is not possible to assign the Undefined value. Before Undefined, the code did not blow up but it also did not do the "right thing." It assigned None to "a" which was seldom what was intended.