Optimizing IPython with Cython

I’ve been working on a project lately that does a lot of calculations based on input data read in from Excel spreadsheets. For this kind of thing, Jupyter Notebook/Lab and pandas are my tools of choice, but for this particular project I needed more speed than usual. I’ve known about Cython for quite some time but had never actually had the chance to play with it.

It turns out pandas' own documentation has some pretty good tips on improving pandas' performance, including using Cython and a library I hadn’t seen before called Numba, which can JIT-compile Python functions.
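For comparison, here's roughly what the Numba route looks like. This is a minimal sketch, assuming Numba is installed; the function is a toy Riemann-sum integral similar in spirit to the examples in pandas' performance guide, not code from my project. `numba.njit` compiles the function to machine code the first time it's called. The try/except fallback is my own addition so the snippet still runs (as plain, slow Python) if Numba isn't available.

```python
# Minimal Numba sketch, assuming `pip install numba` has been run.
try:
    from numba import njit
except ImportError:
    # Numba isn't installed: use a do-nothing decorator so the code still runs as plain Python
    def njit(func):
        return func


@njit
def integrate_f(a, b, n):
    """Approximate the integral of f(x) = x * (x - 1) over [a, b] with a left Riemann sum of n steps."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + i * dx
        total += x * (x - 1)
    return total * dx


if __name__ == "__main__":
    # The exact integral over [0, 1] is 1/3 - 1/2 = -1/6, so this should print a value very close to -1/6
    print(integrate_f(0.0, 1.0, 1_000_000))
```

The loop is exactly the kind of code that's slow in plain Python and that both Numba and Cython are good at speeding up.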

Using Cython from Jupyter Notebook turns out to be almost laughably easy. First, install Cython:

pip install cython

Once that’s installed, load the Cython extension somewhere in your notebook by running a cell containing:

%load_ext Cython

Then you can add %%cython as the first line of any cell, and when the cell is run Jupyter will compile it with Cython and make its functions available to the rest of the notebook.

Of course, nothing is ever truly that easy, and things get weird. Each %%cython cell is compiled into its own module, so it doesn’t share the notebook’s namespace: imports at the top of my notebook weren’t visible inside the cell, and things worked better when I did any imports the cell needed inside the cell itself. In general, treating each Cython cell like a self-contained program worked best. Cython didn’t like global variables either, so self-contained functions were the sweet spot.
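To make that concrete, here's the general shape of cell that worked for me: imports inside the cell, everything the function needs passed in as arguments, and no references to notebook globals. This is an illustrative sketch; `column_weighted_sum` is a made-up example, not a function from my project. In the notebook the cell would start with the `%%cython` line; it's left out here so the block also runs as ordinary Python. Cython will compile plain Python like this as-is, though the bigger speedups usually come from adding C type declarations, which this sketch doesn't do.

```python
# Body of a self-contained cell; in the notebook, the first line of the cell would be %%cython.
# Imports live inside the cell, because the compiled module can't see the notebook's namespace.
import math


def column_weighted_sum(values, weights):
    """Hypothetical example: weighted sum of two equal-length sequences, skipping NaN pairs.

    Everything the function needs comes in through its arguments; it doesn't touch
    any global variables defined elsewhere in the notebook.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must be the same length")
    total = 0.0
    for v, w in zip(values, weights):
        if not (math.isnan(v) or math.isnan(w)):
            total += v * w
    return total
```

After the cell runs, later cells can call it like any other function, e.g. `column_weighted_sum([1.0, 2.0, float("nan")], [0.5, 0.5, 1.0])` returns `1.5`.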

But when it works, wow, it’s cool: seamless, and it gave me an immediate speed boost.
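If you want to put a number on a speed boost like that rather than eyeballing it, running `%timeit` on the function in the notebook before and after adding `%%cython` does the job. Outside IPython, the standard library's `timeit` module does the same thing. A minimal sketch, with `slow_sum_of_squares` as a hypothetical stand-in for whatever function you're optimizing:

```python
import timeit


def slow_sum_of_squares(n):
    """Hypothetical stand-in for a hot pure-Python function you might move into a %%cython cell."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def best_time(func, *args, repeat=5, number=10):
    """Return the best per-call time in seconds over `repeat` runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(*args))
    # Taking the minimum filters out noise from other processes competing for the CPU
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    print(f"{best_time(slow_sum_of_squares, 100_000) * 1e3:.2f} ms per call")
```

Timing the pure-Python version, then the same function compiled in a `%%cython` cell, gives a like-for-like comparison.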

Cython is also a bit fiddlier than Python itself. I was originally running my notebook in a conda environment but couldn’t get Cython cells to compile there. After experimenting a bit with Cython outside of Jupyter Notebook, I determined that the problem was with my Python environment itself, not with Jupyter.

It seems to have been some sort of gcc version conflict. For some reason, Cython was trying to use my system’s gcc to compile its C files instead of the one provided by conda. This led to some sort of binary incompatibility with the libraries it tried to link in, and gcc would just fail. I’m not a C programmer, so rather than disappearing down that debugging rabbit hole, I worked around the issue by creating a virtualenv with my system Python instead of conda’s Python, and everything worked as expected after that.
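If you hit something similar, one quick check (a sketch I'd try now, not what I actually did at the time) is to compare the compiler your Python was built with against the `gcc` that's first on your PATH. `sysconfig.get_config_var("CC")` reports the C compiler command recorded when the interpreter was built, which setuptools uses for extension builds by default unless the `CC` environment variable overrides it. A mismatch is a hint, not proof, that extensions are being built with a different toolchain than the environment expects.

```python
import os
import shutil
import sys
import sysconfig


def compiler_report():
    """Collect which Python, and which C compiler, an extension build is likely to use."""
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        # Compiler command recorded when this interpreter was built (None on Windows/MSVC builds)
        "configured_cc": sysconfig.get_config_var("CC"),
        # If set, this environment variable overrides the configured compiler
        "CC_env_override": os.environ.get("CC"),
        # Whichever gcc the shell would find first
        "gcc_on_path": shutil.which("gcc"),
    }


if __name__ == "__main__":
    for key, value in compiler_report().items():
        print(f"{key:>16}: {value}")
```

Running this in both the conda environment and the system-Python virtualenv and comparing the output would show whether they point at different compilers.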