PEP: XXX
Title: Pyrex as standard extension technology
Version:
Last-Modified:
Author: Paul Prescod
Status: Active
Type: Standards Track
Content-Type: text/plain
Created: 12-Jan-2003
Python-Version: 2.5
Post-History:


Abstract

    This PEP suggests that Pyrex[1] be migrated into the standard
    library to become the default way for building all but the most
    performance-intensive extensions to Python.


Motivation

    Every time it is necessary to extend the standard library,
    Python's maintainers must decide whether to do the implementation
    in Python or C. There are a variety of attractions to doing it in
    Python. They range from the selfish (programming in Python is
    easier) to the altruistic (the standard library is an excellent
    source of sample code to would-be Python gurus).

    Despite this, there are two obvious reasons that there are so many
    modules written in C rather than Python: C code runs faster and C
    code is necessary to access APIs like "win32", "posix" and
    "berkeley sockets."

    Pyrex offers a chance to get close to the "best of both worlds".
    Pyrex code can live on a spectrum from ultimate readability
    (recompile Python code as Pyrex) to ultimate performance
    (transliterate C code). In the middle there are often points
    where you can get 90% of the performance of C and 90% of the
    readability of pure Python. Of course mileage will vary
    greatly[2]. Some modules will trivially translate to a blazingly
    fast and readable Pyrex module. Others will be so hard to
    optimize in Pyrex that C remains better.

    Most Pyrex modules do not have any explicit calls into the CPython
    API. The more mature Pyrex gets, the less reason there is to call
    directly into CPython. This means that Pyrex is becoming a pretty
    "tight" (as opposed to leaky) abstraction layer. This offers some
    interesting possibilities. For instance, what if the Pyrex
    abstraction were "ported" to the Java Native Interface, the .NET
    unmanaged runtime or Parrot?
    All of these runtimes are implemented in C and have ways of giving
    access to C code. The Pyrex compiler could generate C code
    specific to each runtime. Then modules that make sense across
    implementations (e.g. datetime, xreadlines, array, bz2) could
    merely be "recompiled" for an entirely different runtime. This
    would make the bootstrapping of new Python runtimes (e.g. PyPy,
    Parrot) much easier. Pyrex is a great bridge to the PyPy project
    in that it already embodies some of the key ideas of that project
    and in that PyPy could easily interpret Pyrex code and even use
    something like "ctypes" to handle calls out to C functions.

    The Pyrex abstraction layer could also ease the pain when the
    CPython implementation itself changes.

    Many of the standard library modules have "inner loops" that could
    be implemented in Pyrex without hurting the readability and
    maintainability of the module.

    If it becomes easier to extend the Python implementation, we would
    expect the Python user community to become (even!) more interested
    in helping out.


Implementation Strategy

    Pyrex should be brought up to date with Python 2.3 (it lacks the
    // division operator and perhaps other new features). Pyrex also
    needs some tweaks to allow it to generate more efficient code for
    Python protocols (like sequences) and types (like lists). These
    are doable inside the Pyrex type declaration framework.

    Pyrex should become part of the standard Python source
    distribution. Contributors should be encouraged to add new Pyrex
    modules rather than pure C modules. When it makes sense to
    rewrite existing C modules for other reasons, they can be
    rewritten as Pyrex modules (except where it would hurt their
    performance too much).


Code Maintenance Issues

    We should always consider carefully before adding a large mass of
    code to the Python source base. Code must be developed and
    maintained.
    But part of the idea with Pyrex is to reduce the amount of effort
    required to maintain individual modules by focusing the effort on
    the maintenance of the Pyrex compiler (in the same way that it is
    better to optimize the Python interpreter than to optimize a
    single Python application). Over a year or so, Pyrex should
    reduce the maintenance burden imposed by the ever-growing list of
    C modules that come with Python.


Performance Issues

    Another concern is that Pyrex is far from an optimizing compiler.
    Right now its performance is only so-so. Even in an ideal world,
    it is hard to believe that one will soon be able to program at the
    Python level of abstraction and get hand-tuned C performance.

    It is important to understand that Pyrex allows each developer to
    choose the appropriate tradeoffs of readability/portability/
    flexibility/performance. You can choose to write purely
    idiomatic, type-generic Python code or strongly type-specific (and
    type-declared!) C-ish Pyrex code. You can even reach into the
    guts of the CPython interpreter to wring out extra performance (at
    a cost to portability). For the purposes of this discussion I
    will describe this code as "non-portable" even though it works on
    any platform that CPython does, because it is not even
    theoretically portable to an implementation of Python with a
    different internal API.

    Empirical tests[2] suggest that relatively readable and "portable"
    Pyrex code can compete with "readable" and type-generic C code
    like that in the Python 2.4 bisectmodule.c. Tweaked and optimized
    (and non-portable!) Pyrex code can compete with tweaked and
    optimized C code.

    Pyrex could quickly become a net benefit to Python's performance
    if modules that are pure Python today could have their inner loops
    compiled to C. The better the Pyrex optimizer gets (as it surely
    would with the attention of python-dev) the less often it will be
    necessary to choose between performance and abstraction.
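
    As a rough illustration of the "readable and type-generic" end of
    that spectrum, the sketch below shows a bisection inner loop
    written as plain Python. It is not taken from bisectmodule.c or
    from the experiments in [2]; the function name and signature are
    chosen here for illustration only. The comment about C integer
    declarations describes one assumed way a Pyrex version might be
    tuned, not a measured result.

```python
def bisect_right(a, x, lo=0, hi=None):
    """Return the insertion point for x in sorted sequence a,
    to the right of any existing entries equal to x."""
    if hi is None:
        hi = len(a)
    # Type-generic inner loop: it relies only on "<" and indexing,
    # so it works for any sequence of mutually comparable items.
    # Assumption: a type-declared Pyrex variant could declare lo, hi
    # and mid as C ints (e.g. "cdef int mid") so the index arithmetic
    # avoids Python objects; that variant is not shown here.
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

    The same source stays readable as ordinary Python, which is the
    property that makes such loops plausible candidates for
    compilation rather than a hand-written C rewrite.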


References

    [1] Pyrex - a Language for Writing Python Extension Modules
        http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/

    [2] Pyrex optimization experiments
        http://www.prescod.net/python/pyrexopt/optimization.html


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: