E pur si muove

Importing modules in C extension modules

Sunday, July 05, 2009

If a function in your extension module needs another module, the extension modules in the standard library seem to handle it like this:

static PyObject *
func(void)    /* placeholder name and signature */
{
    PyObject *foo;

    foo = PyImport_ImportModuleNoBlock("foo");
    if (foo == NULL)
        return NULL;
    /* do stuff with foo */
    Py_DECREF(foo);
    return something;
}

This means that you have to import the module each time you enter the function. (Yes, PyImport_ImportModuleNoBlock() only looks it up in the modules dict, but that function is only available since 2.6; before that you have to use PyImport_ImportModule().)

Personally I like storing the module in a static variable so that it only needs to be imported the first time:

static PyObject *
func(void)    /* placeholder name and signature */
{
    static PyObject *foo = NULL;

    if (foo == NULL) {
        foo = PyImport_ImportModuleNoBlock("foo");
        if (foo == NULL)
            return NULL;
    }
    /* do stuff with foo */
    return something;
}

Note here that the Py_DECREF() is gone. This function will effectively leak a reference to the module object. But is this really bad? How often do module objects get deleted in production code? My guess is that they normally stay loaded until the application exits.


Ludvig Ericson said...

Yeah, that's how I do it in pylibmc as well. Though I don't use static references, since it'll just be a matter of a dict lookup and an INCREF+DECREF once the module is imported.

Unknown said...

By keeping a reference to the module in a static variable and using that for subsequent calls, you have guaranteed that your C extension module will not work in Python sub interpreters. So I hope you never want to use it for web application development under mod_wsgi, mod_python or anything else that uses sub interpreters. If you do, you will need to configure the system, where possible, to always run your application in the main interpreter, which may not always be an option.
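
The problem described in this comment can be illustrated with a toy model in plain C. Nothing below is the CPython API: toy_interp, toy_module, toy_import() and cached_foo() are hypothetical names made up for this sketch, and each toy_interp simply owns a private copy of "foo", roughly the way each sub interpreter has its own modules dict. The point is only that a function-level static is one slot for the whole process.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, NOT the CPython API. */
typedef struct {
    const char *name;
    int owner_id;          /* which interpreter created this module */
} toy_module;

typedef struct {
    int id;
    toy_module foo;        /* this interpreter's private copy of "foo" */
    int foo_loaded;
} toy_interp;

/* Per-interpreter import: always answers from the caller's own table. */
toy_module *
toy_import(toy_interp *interp, const char *name)
{
    if (strcmp(name, "foo") != 0)
        return NULL;
    if (!interp->foo_loaded) {
        interp->foo.name = "foo";
        interp->foo.owner_id = interp->id;
        interp->foo_loaded = 1;
    }
    return &interp->foo;
}

/* The pattern from the post: cache the module in a static variable. */
toy_module *
cached_foo(toy_interp *interp)
{
    static toy_module *foo = NULL;   /* one slot shared by every interpreter */

    if (foo == NULL)
        foo = toy_import(interp, "foo");
    return foo;
}
```

If cached_foo() is first called from interpreter 1 and later from interpreter 2, the second call still returns the module object owned by interpreter 1, whereas toy_import() would have handed interpreter 2 its own copy.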

Unknown said...

So keeping a reference to any object is harmful? Or just to module objects? I will have to read up on sub interpreters, thanks for the heads up.

Unknown said...

Technically, no Python object should be shared between sub interpreter instances.

BTW, if you are using the simplified API for GIL state management, the issue is moot, as use of that API already precludes the extension working in sub interpreters.

Unknown said...

So any global or static variable in C is out; this seems to match what the Py_NewInterpreter() documentation says. That means no singletons, no Py_BEGIN_ALLOW_THREADS and no caching of objects retrieved from other modules.

This seems a rather big pain: in PSI we load exceptions from other modules and store them in global symbols, we load _C_API objects and do the same, and we have a singleton, do module caching and use Py_BEGIN_ALLOW_THREADS.

The module caching is probably fine to get rid of; the singleton would be a shame, but workable. The exceptions and _C_API objects would be a pain, since being able to use those symbols like any other Python symbol (PyExc_*, PyInt_FromLong, etc) is very nice. It does make me wonder actually how the sub-interpreters cope with exceptions.

As for Py_BEGIN_ALLOW_THREADS, this seems to declare a new local _save variable every time you use it. Is that really harmful? To me that seems fine (but then I don't know this issue very well).
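
On the _save question, here is a toy model of the shape of the macros. In CPython, Py_BEGIN_ALLOW_THREADS opens a block, declares a local PyThreadState *_save and assigns it the result of PyEval_SaveThread(); Py_END_ALLOW_THREADS calls PyEval_RestoreThread(_save) and closes the block. The sketch below copies that shape with made-up names (toy_tstate, toy_save_thread(), toy_restore_thread() and the TOY_* macros are all hypothetical) so it compiles without Python.h. The "simplified API" mentioned in the earlier comment most likely means PyGILState_Ensure()/PyGILState_Release(), which is a separate mechanism from these macros.

```c
#include <stddef.h>

/* Toy stand-ins, NOT the CPython API. */
typedef struct { int id; } toy_tstate;

static toy_tstate toy_main_tstate = {1};
static toy_tstate *toy_current = &toy_main_tstate;  /* models "holds the GIL" */

toy_tstate *
toy_save_thread(void)               /* models PyEval_SaveThread() */
{
    toy_tstate *ts = toy_current;
    toy_current = NULL;
    return ts;
}

void
toy_restore_thread(toy_tstate *ts)  /* models PyEval_RestoreThread() */
{
    toy_current = ts;
}

/* Same shape as the CPython macros: open a block, declare a local _save. */
#define TOY_BEGIN_ALLOW_THREADS { toy_tstate *_save; _save = toy_save_thread();
#define TOY_END_ALLOW_THREADS   toy_restore_thread(_save); }

int
toy_blocking_call(void)
{
    int released;

    TOY_BEGIN_ALLOW_THREADS
    released = (toy_current == NULL);   /* the "GIL" is released in here */
    TOY_END_ALLOW_THREADS

    /* A second pair in the same function gets its own, fresh _save. */
    TOY_BEGIN_ALLOW_THREADS
    released = released && (toy_current == NULL);
    TOY_END_ALLOW_THREADS

    /* 1 if both sections ran released and the state was restored after. */
    return released && toy_current == &toy_main_tstate;
}
```

In this model _save is an ordinary automatic variable scoped to the block the macros open, so the macro pair itself keeps no static or global state between calls.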

Daniel Lescohier said...

What if you use global static variables, and the only place that you use those variables in a non-const way is in your initmodule function? According to what I read in the Py_NewInterpreter docs, it looks like it might be safe for sub-interpreters:

Extension modules are shared between (sub-)interpreters as follows: the first time a particular extension is imported, it is initialized normally, and a (shallow) copy of its module's dictionary is squirreled away. When the same extension is imported by another (sub-)interpreter, a new module is initialized and filled with the contents of this copy; the extension's init function is not called. Note that this is different from what happens when an extension is imported after the interpreter has been completely re-initialized by calling Py_Finalize() and Py_Initialize(); in that case, the extension's initmodule function is called again.

An example; at top of .c file:

static PyObject *normalize; /* unicodedata normalize function */
static PyObject *category; /* unicodedata category function */

/* constants */

static PyObject *NFC;
static PyObject *SEPS_CTLS;
static PyObject *UNASSIGNED_OR_PRIVATE;
static PyObject *SURROGATES;

Inside the initmodule function:

m = PyImport_ImportModule("unicodedata");
if (m == NULL) return;

d = PyModule_GetDict(m); /* Always succeeds */

s = PyString_FromString("ucd_3_2_0");
if (s == NULL) return;

o = PyDict_GetItem(d, s); /* borrowed reference */
Py_DECREF(s); /* the key is no longer needed */
if (o == NULL) return;

normalize = PyObject_GetAttrString(o, "normalize");
if (normalize == NULL) return;
category = PyObject_GetAttrString(o, "category");
if (category == NULL) return;

/* initialize constants */
NFC = PyString_FromString("NFC");
if (NFC == NULL) return;
SEPS_CTLS = Py_BuildValue("(ssss)", "Zs", "Cc", "Zl", "Zp");
if (SEPS_CTLS == NULL) return;
UNASSIGNED_OR_PRIVATE = Py_BuildValue("(ss)", "Cn", "Co");
if (UNASSIGNED_OR_PRIVATE == NULL) return;
SURROGATES = Py_BuildValue("s", "Cs");
if (SURROGATES == NULL) return;
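
The sharing rule quoted from the Py_NewInterpreter docs can be modelled in a few lines of plain C. This is a toy sketch, not the CPython implementation: toy_object, toy_dict, toy_initmodule() and toy_import_extension() are hypothetical names, and the "module dictionary" is just a small array of pointers. It assumes only what the quoted paragraph says: the init function runs on the first import, and later imports fill a new module from a shallow copy.

```c
#include <stddef.h>

#define TOY_DICT_SIZE 4

/* Toy stand-ins, NOT the CPython API. */
typedef struct { const char *value; } toy_object;
typedef struct { toy_object *items[TOY_DICT_SIZE]; } toy_dict;

int toy_init_calls = 0;            /* how often the init function ran */
toy_object *toy_NFC = NULL;        /* C global filled in by init */

static toy_object toy_nfc_storage = {"NFC"};
static toy_dict toy_saved_copy;    /* the "squirreled away" shallow copy */
static int toy_have_copy = 0;

/* Models the extension's init function. */
void
toy_initmodule(toy_dict *d)
{
    toy_init_calls++;
    toy_NFC = &toy_nfc_storage;
    d->items[0] = toy_NFC;
}

/* Models importing the extension into one (sub-)interpreter. */
void
toy_import_extension(toy_dict *module_dict)
{
    if (!toy_have_copy) {
        toy_initmodule(module_dict);    /* first import: initialize normally */
        toy_saved_copy = *module_dict;  /* shallow copy: pointers only */
        toy_have_copy = 1;
    } else {
        *module_dict = toy_saved_copy;  /* later imports: init is NOT called */
    }
}
```

In this model the globals set by the init function stay valid for the second import, because init ran exactly once. The second module ends up pointing at the very same objects as the first, though, so the objects themselves are still shared between interpreters, which is what the earlier comment warns about.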
