Optimizing IPython with Cython
I’ve been working on a project lately that does a lot of calculations based off of input data that’s being read in from Excel spreadsheets. For this kind of thing Jupyter Notebook/Lab and pandas are my tools of choice, but for this particular project I needed more speed than usual. I’ve known of Cython for quite some time, but never actually had the chance to play with it.
Turns out pandas' own documentation has some pretty good tips on how to improve pandas' performance, including using cython and a library I hadn’t seen before called numba, which can JIT-compile functions.
Using cython through Jupyter Notebook turns out to be almost laughably easy. You have to install cython first:
pip install cython
Once that’s ready, then somewhere in your notebook you have to load cython by running a cell with this inside it:
%load_ext Cython
Then you can just add %%cython to the top of any cell and Jupyter will automatically try to compile and use the cython version when the cell is run.
Of course nothing is ever truly that easy, and things get weird. Cython doesn’t always seem to share the same module namespace as the rest of your notebook code. Things worked better for me when I made any needed import calls from inside the cython cell rather than at the top of my notebook the way I usually would. In general, treating each cython cell more like a self-contained program just worked better. Cython didn’t like global variables either; it worked best for self-contained functions.
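To make that concrete, here is a minimal sketch of what a self-contained cell might look like. The function name scaled_norm and its arguments are made up for illustration. The %%cython line is shown as a comment so the snippet also runs as plain Python; in a notebook it would be the very first line of the cell, uncommented.

```python
# %%cython   <- in a notebook, this is the first line of the cell (uncommented)

# The import lives inside the cell, because the compiled cell may not see
# names imported elsewhere in the notebook.
import math


def scaled_norm(values, scale):
    """Return scale * sqrt(sum of squares of values).

    Hypothetical example function: everything it needs arrives as an
    argument, so it does not depend on any notebook-level globals.
    """
    total = 0.0
    for v in values:
        total += v * v
    return scale * math.sqrt(total)
```

Once the cell has run, other cells can call scaled_norm like any ordinary function, for example scaled_norm([3.0, 4.0], 2.0).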
But when it works, wow it’s cool. Seamless, and gave me an immediate speed boost.
Cython is also a bit fiddlier than python itself is. I was originally running my notebook in a conda environment, but wasn’t able to get cython cells to compile properly there. After experimenting a bit with cython outside of Jupyter notebook I was able to determine that the problem was with my entire python environment itself, not with Jupyter.
It seems to have been some sort of gcc version conflict. For some reason cython was trying to use my system’s gcc to compile its C files instead of using the one provided by conda. This led to some sort of binary incompatibility with the libraries it tried to link in, and gcc would just fail. I’m not a C programmer, so rather than disappearing down that debugging rabbit-hole, I just worked around the issue by creating a virtualenv using my system python instead of conda’s python, and everything worked as expected after that.
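For anyone who wants to check whether they might be in a similar situation, here is a rough sketch that prints which interpreter is running, which compiler that interpreter reports it was built with, and which gcc is first on the PATH. It is only a starting point, not a diagnosis of the problem described above: the helper name compiler_report is hypothetical, and the "CC" config variable is typically only set on Unix-like systems.

```python
import shutil
import sys
import sysconfig


def compiler_report():
    """Collect a few facts that can hint at a Python/gcc mismatch.

    Hypothetical helper for illustration only.
    """
    return {
        # Path of the interpreter currently running (conda's or the system's).
        "python": sys.executable,
        # Root directory of the active environment.
        "prefix": sys.prefix,
        # Compiler recorded in this Python's build configuration (may be None).
        "built_with": sysconfig.get_config_var("CC"),
        # First gcc found on PATH, or None if there is none.
        "gcc_on_path": shutil.which("gcc"),
    }


for key, value in compiler_report().items():
    print(f"{key}: {value}")
```

If the interpreter lives under a conda prefix but the gcc on the PATH comes from somewhere like /usr/bin, that at least suggests the two may not have been built to work together.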