The Point Of No Return

A few days ago I started developing a small Python extension module in C++, with the goal of exposing some native code to Python. During development I encountered a strange bug in one of our dependencies.

The setup

In one of my projects at work we needed to create a Python module to interface with some native code. I already had some experience with the Python ctypes module, which lets you access C structs and call C functions directly from Python. But since we wanted to expose some complex data structures, I decided to investigate other options.

Since we were already using parts of Boost, I decided to try the Boost.Python library for our project.

I had the first example up and running in short order. I created a file named hello.cpp:

#include <boost/python.hpp>
#include <string>

namespace p = boost::python;

std::string hello() {
    return "Hello World!";
}

BOOST_PYTHON_MODULE(hello) {
   p::def("hello", hello);
}

I compiled this file as a shared library. For testing, I just imported the module in Python and called the defined function:

[wanzenbug@work testbed]$ gcc -lboost_python3 -I/usr/include/python3.6m/ -pedantic -Wall -fPIC --shared -o hello.so hello.cpp
[wanzenbug@work testbed]$ python3 -c "import hello; print(hello.hello())"
Hello World! 

And just like that, I had exposed my first C++ function to Python.

Now came the real work: our C++ code relied heavily on multidimensional arrays, and we wanted them exposed to Python. At first I thought we would have to implement the conversion from C++ (we were using Boost.MultiArray) to Python (namely NumPy) objects by hand. Luckily, Boost came to the rescue again, or so I thought.
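To give an idea of what such a conversion involves: wrapping an existing buffer as a NumPy array means spelling out its element type, shape, strides and ownership. Here is a rough sketch using the np::from_data helper from the library introduced below; the wrap_array name is mine, and lifetime management is glossed over:

#include <boost/python.hpp>
#include <boost/python/numpy.hpp>
#include <boost/multi_array.hpp>

namespace p = boost::python;
namespace np = boost::python::numpy;

// Expose a 2D multi_array as a NumPy view, without copying. The
// resulting ndarray does not own its data, so the multi_array must
// outlive every use of the view.
np::ndarray wrap_array(boost::multi_array<double, 2>& arr) {
    return np::from_data(
        arr.data(),                                    // raw buffer
        np::dtype::get_builtin<double>(),              // element type
        p::make_tuple(arr.shape()[0], arr.shape()[1]), // dimensions
        p::make_tuple(arr.strides()[0] * sizeof(double),
                      arr.strides()[1] * sizeof(double)), // strides in bytes
        p::object());                                  // owner: none
}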

The bug

Boost, as of version 1.63, provides a new library for dealing with NumPy arrays in Python extensions: Boost.Python.Numpy. Following the tutorial, I edited my example module hello.cpp:

#include <boost/python.hpp>
#include <boost/python/numpy.hpp>

namespace p = boost::python;
namespace np = boost::python::numpy;

np::ndarray hello() {
    p::tuple shape = p::make_tuple(3, 3);
    np::dtype dtype = np::dtype::get_builtin<float>();
    return np::zeros(shape, dtype);
}

BOOST_PYTHON_MODULE(hello) {
   Py_Initialize();
   np::initialize();

   p::def("hello", hello);
}

As before, I compiled this to a shared library and ran it in Python:

[wanzenbug@work testbed]$ gcc -lboost_python3 -lboost_numpy3 -I/usr/include/python3.6m/ -Wall -pedantic -fPIC --shared -o hello.so hello.cpp
[wanzenbug@work testbed]$ python3 -c "import hello; print(hello.hello())"
RuntimeError: FATAL: module compiled as little endian, but detected different endianness at runtime
ImportError: numpy.core.umath failed to import
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

It worked! Oh, wait... Something does not seem right here.

The hunt

How can NumPy report an endianness other than little? I compiled everything on an x86_64 machine, which is definitely little endian.

At this point I suspected a problem with my NumPy installation. Maybe a wrong version was installed somewhere (a problem I have faced multiple times with conflicting versions of Python packages: one version installed via the OS package manager, one installed by pip, one installed via conda, and so on).

My quick solution: use a clean Docker image as a test environment. At this point I have to say I use Fedora as my go-to Linux flavour. So I ran docker pull fedora:28, installed the required packages, copied my code and tried again:

RuntimeError: FATAL: module compiled as little endian, but detected different endianness at runtime
ImportError: numpy.core.umath failed to import

So that didn't help. Next I bugged a colleague who was running Arch, in the hope that the issue was just caused by an outdated version:

RuntimeError: FATAL: module compiled as little endian...

Another colleague was running Ubuntu 18.04, so we tried again:

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

That's not what we expected. At this point my first colleague had done some google-fu and found several bug reports with the same error message.

Nobody had any idea what was going on. The Boost maintainers suspected a problem on the packagers' side, as Ubuntu was working as intended. The package maintainers suspected the problem was upstream, because two independent distributions faced the same issue.

In conclusion: I had to find the bug myself.

Locating the bug

First I had to narrow down the code required to reproduce the issue, so I played with hello.cpp again:

#include <boost/python.hpp>
#include <boost/python/numpy.hpp>

namespace p = boost::python;
namespace np = boost::python::numpy;

BOOST_PYTHON_MODULE(hello) {
   Py_Initialize();
   np::initialize();
}

Removing the call to np::initialize() makes the problem disappear (but leaves us unable to use the NumPy extension), so the error must originate somewhere in this call. A quick search through the NumPy source code for the error message leads to the following function, which is generated when NumPy is compiled:

static int
_import_array(void)
{
  int st;

  /* ... */
  
  /*
   * Perform runtime check of endianness and check it matches the one set by
   * the headers (npy_endian.h) as a safeguard
   */
  st = PyArray_GetEndianness();
  if (st == NPY_CPU_UNKNOWN_ENDIAN) {
      PyErr_Format(PyExc_RuntimeError, "FATAL: module compiled as unknown endian");
      return -1;
  }
#if NPY_BYTE_ORDER == NPY_BIG_ENDIAN
  if (st != NPY_CPU_BIG) {
      PyErr_Format(PyExc_RuntimeError, "FATAL: module compiled as "\
             "big endian, but detected different endianness at runtime");
      return -1;
  }
#elif NPY_BYTE_ORDER == NPY_LITTLE_ENDIAN
  if (st != NPY_CPU_LITTLE) {
      PyErr_Format(PyExc_RuntimeError, "FATAL: module compiled as "\
             "little endian, but detected different endianness at runtime");
      return -1;
  }
#endif

  return 0;
}

Now I thought I had found the location of the bug: something went wrong in the call to PyArray_GetEndianness(), or maybe NPY_CPU_LITTLE was set to a wrong value. So I started up gdb, set a breakpoint on the line where the exception is created and ran my module.

[wanzenbug@work testbed]$ gcc -g -lboost_python3 -lboost_numpy3 -I/usr/include/python3.6m/ -Wall -pedantic -fPIC --shared -o hello.so hello.cpp
[wanzenbug@work testbed]$ gdb
(gdb) target exec python3
(gdb) break /usr/lib64/python3.6/site-packages/numpy/core/include/numpy/__multiarray_api.h:1532
Make breakpoint pending on future shared library load? (y or [n]) y
(gdb) run
>>> import hello
Thread 1 "python3" hit Breakpoint 1, _import_array () at /usr/lib64/python3.6/site-packages/numpy/core/include/numpy/__multiarray_api.h:1532
1532	      PyErr_Format(PyExc_RuntimeError, "FATAL: module compiled as "\
(gdb) print st
$1 = 1

I just had to check the definition of NPY_CPU_LITTLE. After a little searching with grep I found it in include/numpy/npy_common.h:

/* enums for detected endianness */
enum {
        NPY_CPU_UNKNOWN_ENDIAN,
        NPY_CPU_LITTLE,
        NPY_CPU_BIG
};

So NPY_CPU_LITTLE is in fact... 1. Wait, what? How can 1 != 1?

Nothing is true, everything is permitted

At this point I started to question my sanity. Luckily, another GitHub user was investigating the same problem and reported his findings. His observations about the boost::python::numpy::initialize() function caught my eye: to me, this screamed Undefined Behaviour (UB).

So I quickly copied his minimal example:

#include <boost/python.hpp>
#include <numpy/arrayobject.h>
#include <numpy/ufuncobject.h>

static void * wrap_import_array() { import_array(); }

void hello() { PySys_WriteStdout("hello\n"); }

BOOST_PYTHON_MODULE(hello) {
  wrap_import_array();
  import_ufunc();

  boost::python::def("hello", hello);
}

And after a quick compilation:

[wanzenbug@work testbed]$ gcc -g -lboost_python3 -lboost_numpy3 -I/usr/include/python3.6m/ -Wall -pedantic -fPIC --shared -o hello.so hello.cpp
In file included from /usr/include/numpy/ndarraytypes.h:1816,
                 from /usr/include/numpy/ndarrayobject.h:18,
                 from /usr/include/numpy/arrayobject.h:4,
                 from hello.cpp:2:
/usr/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^~~~~~~
hello.cpp: In function ‘void* wrap_import_array()’:
hello.cpp:5:53: warning: control reaches end of non-void function [-Wreturn-type]
 static void * wrap_import_array() { import_array(); }
                                                     ^
[wanzenbug@work testbed]$ python3 -c "import hello; hello.hello()"
hello

Well, that did not turn out as expected. But UB would not be UB if it were so easy to reproduce. Often UB only rears its head once optimizations are enabled. So I quickly recompiled with -O2 and voilà:

[wanzenbug@work testbed]$ python3 -c "import hello; hello.hello()"
RuntimeError: FATAL: module compiled as little endian, but detected different endianness at runtime
ImportError: numpy.core.umath failed to import
hello

When replacing the definition of wrap_import_array() with

static void * wrap_import_array() { import_array(); return NULL; }

as suggested, the error disappeared again.

Looking at the warnings generated by the compiler, I quickly dismissed the first one as a user-defined warning from NumPy. The second one was of more interest. The definition of wrap_import_array() is a copy of an internal function used in Boost.Python.Numpy, only used when compiling for Python 3. Just looking at this function we can already see that something is fishy: as the compiler says, there is no return statement in this function (at least at first glance).

The point of no return

At first I assumed a missing return would be a compiler error, but a quick search corrected me: in C++, flowing off the end of a value-returning function is not something the compiler has to reject, it is UB. This still did not explain why the endianness check failed, but it was a starting point.
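A minimal illustration (the answer function is made up for this example):

// Compiles with nothing worse than a -Wreturn-type warning, yet any
// call where ready is false flows off the end of a value-returning
// function, which is undefined behaviour in C++.
int answer(bool ready) {
    if (ready)
        return 42;
}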

Investigating further, I found that import_array() is in fact a macro; after expansion, wrap_import_array() looks like this:

static void * wrap_import_array() {
    if (_import_array() < 0) {
        PyErr_Print();
        PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); 
        return NUMPY_IMPORT_ARRAY_RETVAL; 
    };
}

So there is, in fact, a return statement hidden in the macro invocation (under Python 3, NUMPY_IMPORT_ARRAY_RETVAL expands to NULL), but only on the error path.

This is where UB comes into play. When some control flow path would lead to UB, the compiler is free to do anything it wants with it (even insert code that overwrites your hard disk). In particular, it may assume that a path leading to UB is never taken. The only well-defined path through wrap_import_array() is the one that hits the hidden return statement, so in our case the compiler may assume _import_array() < 0 is always true.

Looking back at _import_array(), we can see that it is in fact a long list of if (condition) return -1; checks, with return 0; at the end. Since the function is defined in a header file, it is visible to the compiler when compiling wrap_import_array(). This means that at a suitable optimization level the compiler will inline it.

Putting things together: _import_array() < 0 is assumed to be always true, and only the last return statement in _import_array() returns 0, all the others return -1. Adding one and one, the compiler deduces that _import_array() never reaches return 0;, so one of the preceding failure conditions must always be met. And if the last if (condition) return -1; is reached, its condition must be true, because otherwise we would have to return 0.
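Here is the whole mechanism in miniature, with hypothetical stand-ins for the real functions:

extern int failure_a(void);  // stand-ins for the checks
extern int failure_b(void);  // inside _import_array()

static int check(void) {      // plays the role of _import_array()
    if (failure_a()) return -1;
    if (failure_b()) return -1;
    return 0;
}

static void* wrapper(void) {  // plays the role of wrap_import_array()
    if (check() < 0)
        return nullptr;
}   // falling off the end here is UB

// After inlining check(), the optimizer may assume the UB path is
// never taken, i.e. that "return 0;" is unreachable. So whenever
// failure_a() returns 0, failure_b() must return nonzero: the last
// error branch fires unconditionally.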

As luck would have it, the last if (condition) check is exactly our endianness check. This explains why we get this strange error message.

If you are wondering why Ubuntu did not suffer from the same problem: it is most likely a consequence of different compiler versions. Both Arch and Fedora build their packages with gcc-8, while Ubuntu 18.04 defaults to gcc-7, which apparently does not exploit this particular bit of UB.

The fix

The fix is pretty simple: just let wrap_import_array() return something! This may be the easiest fix for the strangest bug I have ever encountered. Note that Python 2 did not suffer from this bug: there the import_array() macro does not return a value on failure, so wrap_import_array() can simply be declared void instead of void*.
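In the context of the earlier example, a version-dependent wrapper could look like this (a sketch, not the actual Boost patch):

#if PY_VERSION_HEX >= 0x03000000
// Python 3: the macro's failure branch is "return NULL;", so the
// wrapper must return a pointer on every path, including success.
static void* wrap_import_array() { import_array(); return NULL; }
#else
// Python 2: the macro's failure branch is a plain "return;", so a
// void wrapper may legally flow off the end.
static void wrap_import_array() { import_array(); }
#endif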

I would strongly advise anyone to enable -Werror=return-type when using clang or gcc; MSVC's warning C4715 ("not all control paths return a value") covers the same case. There is hardly ever a situation where you intentionally forgo the final return statement.