Reference cycles involving lists, tuples, instances, classes, dictionaries, and functions are found. Instances with __del__ methods are handled in a sane way. It is easy to add GC support to new types. GC enabled Python is binary compatible with regular Python.
Generational collection works (currently three generations). The overhead measured by pybench is about 4 percent. Virtually all extension modules should work unchanged (I had to modify new and cPickle in the standard distribution). A new module called gc is available for tuning the collector and setting debugging options.
The collector should be portable across all platforms. The patched version of Python passes all regression tests and runs Grail, Idle and Sketch without problems.
The portable garbage collection has been included in Python since version 2.0. It is enabled by default. Be happy.>>> l = [] >>> l.append(l) >>> del lThe reference count for the list created is now one. However, since it cannot not be reached from inside Python and cannot possibly be used again, it should be considered garbage. In the current version of Python, this list will never be freed.
Creating reference cycles is usually not good programming practice and can almost always be avoided. However, sometimes it is difficult to avoid creating reference cycles and other times the programmer does not even realize it is happening. For long running programs such as servers this is especially troublesome. People do not want their servers to run out of memory because reference counting failed to free unreachable objects. For large programs it is difficult to find how reference cycles are being created.
Since we are still using reference counting, the garbage collector only has to find reference cycles. The reference counting will handle freeing other types of garbage. First we observe that reference cycles can only be created by container objects. These are objects which can hold references to other objects. In Python lists, dictionaries, instances, classes, and tuples are all examples of container objects. Integers and strings are not containers. With this observation we realize that non-container objects can be ignored for the purposes of garbage collection. This is a useful optimization because things like integers and strings should be fast and small.
Our idea now is to keep track of all container objects. There are several ways that this can be done but one of the best is using doubly linked lists with the link fields inside the objects structure. This allows objects to be quickly inserted and removed from the set as well as not requiring extra memory allocations. When a container is created it is inserted into this set and when deleted it is removed.
Now that we have access to all the container objects, how to we find reference cycles? First we add another field to container objects in addition to the two link pointers. We will call this field gc_refs. Here are the steps to find reference cycles:
With garbage collection, calling finalizers becomes a tricker problem, especially in the face of reference cycles. If two objects in a reference cycle both have finalizers, what should be done? Which should be called first? After calling the first finalizer the object cannot be freed as the second finalizer still may access it.
Since there is no good solution to this problem, cycles that are referenced from objects with finalizers are not freed. Instead these objects are added to a global list of uncollectable garbage. It should almost always be possible for the program to be rewritten so that this does not occur. As a last resort, the program can access the global list and free cycles in a way that makes sense for application.
The collector currently keeps track of three generations of objects. By adjusting the collection parameters the time spent collecting can be made as small as desired. For some applications it may make sense to disable automatic collection and explicitly call the collection routine. However, with the default collection parameters, the time spent collecting does not seem to be significant when running pybench. Obviously applications that allocate container objects extensively will cause more time to be spent collecting.
The current patch adds a new configure option to enable the collector. Python with the collector is binary compatible with standard Python. If the option is disabled there is no effect on the workings of the Python interpreter.
$ cd Python-1.5.2 $ patch -p1 < ../gc-malloc-cleanup.diff $ patch -p1 < ../gc-boehm.diff $ autoconf $ ./configure --with-gcThe patch assumes you have libgc.a installed somewhere so that linking with -lgc works (/usr/local/lib should be okay). If you don't have the library, download and install it before compiling.
Currently the patch is only tested on Linux. It will probably work on other Unix machines as well. On my Linux machine, the GC version of Python passes all regression tests.
Thanks to Narihiro Nakamura, there is a Japanese translation of this page available.
Updated: Wed, 06 Dec 2000 10:36:38 -0800
Neil Schemenauer
<nas@arctrix.com>