Jupytext: The hack-free way to use Jupyter notebooks with git

As mentioned previously, I’m a big fan of Jupyter Notebook/Lab. That said, there is one thing that’s always irked me about working with Jupyter notebooks: Jupyter’s .ipynb format doesn’t play well with version control software like git. Jupyter notebooks use a JSON-based format that allows for storing both the input and the output of each cell.

Unfortunately, JSON isn't a line-oriented format the way source code is, so git's line-based diffs of notebooks are noisy and hard to read. For example, re-running a cell changes its outputs and execution count even when the code itself hasn't changed. JSON isn't particularly readable for humans either, so you need Jupyter itself to view your notebook files instead of just opening them in any text editor the way you would a .py file. Until recently, this meant that if you wanted to use common development best practices such as pull-request-based workflows and code review with Jupyter notebooks, you were pretty much out of luck. Worse, you can't use the goodies your favorite IDE or text editor provides on a Jupyter notebook, so your notebook code ends up being harder to refactor than a normal Python script would be.
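To make the noise concrete, here's a small sketch using only Python's standard library. It builds a minimal, hypothetical one-cell notebook (a simplified stand-in, not a complete nbformat document), "runs" it twice, and diffs the serialized JSON line by line the way git would. The code never changes, yet the diff still reports changed lines for the execution count and the output.

```python
import difflib
import json


def make_notebook(execution_count: int, output_text: str) -> dict:
    """Build a minimal one-cell notebook dict (illustrative, not a full nbformat document)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                # The cell's code is stored as a list of JSON strings.
                "source": ["import random\n", "print(random.random())"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


def notebook_diff(before: dict, after: dict) -> list[str]:
    """Return a unified line diff of two notebooks serialized with a one-space indent."""
    a = json.dumps(before, indent=1).splitlines()
    b = json.dumps(after, indent=1).splitlines()
    return list(difflib.unified_diff(a, b, "before.ipynb", "after.ipynb", lineterm=""))


# Re-running the same, unchanged cell: only the execution count and the output differ.
before = make_notebook(1, "0.8444218515250481\n")
after = make_notebook(2, "0.7579544029403025\n")
for line in notebook_diff(before, after):
    print(line)
```

In a real notebook the effect is usually worse, since outputs can include long tracebacks or base64-encoded images.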

Luckily, the Python community has come to the rescue with a brilliant Jupyter notebook extension called Jupytext. Jupytext solves all of these issues by letting you link a Jupyter notebook to a second file in another format (.py and Markdown being the most commonly used). Jupytext then keeps the two files in sync: once you've linked your notebook to a .py file, whenever you click Save in Jupyter it also saves an updated copy of your notebook to the linked .py file. The best part is that the sync works in both directions: if you edit the .py file, Jupytext updates your Jupyter notebook with the corresponding changes (although you'll have to refresh your browser window to see them).

This is pretty great: not only do you get a .py file that you can keep in version control the same way you would any Python project, you can also use the fancy refactoring features of your text editor or IDE on the .py file, do any complicated refactoring you see fit, and immediately (well, technically after hitting refresh in your browser tab) see the changes in your Jupyter notebook.

Setup

Interested? Setup is fairly straightforward, and once set up it works in both Jupyter Notebook and Jupyter Lab.

The first step is to install the Python package that powers most of this magic. From a terminal, run:

pip install jupytext --upgrade

If you fire up Jupyter Notebook (by running jupyter notebook from a terminal window), you'll see that there's now a new Jupytext item in Jupyter Notebook's File menu.

NOTE: If you’re using Jupyter Lab instead of Jupyter Notebook this menu won’t be visible, but you can still use Jupytext via the CLI command described below. See the Jupyter Lab section at the end of the post for more info.

If you select Pair Notebook with Markdown from the menu and save your notebook, you'll notice that Jupytext has created a new .md file in your folder with the same name as your notebook. When you open it, you'll see that the contents of all your cells have been written into the Markdown file. That Markdown file is now kept in sync with your Jupyter notebook: if you make changes to it, you'll see them appear in your Jupyter Notebook window (after a refresh) as well.

For reasons that are beyond me, the developers of Jupytext chose not to include Python in the list of pairings offered in the GUI, so while Jupytext is quite capable of saving a .py file alongside the .ipynb file, we need to configure that pairing manually.

You can do this from the Jupyter Notebook GUI by going to the Edit menu and choosing Edit Notebook Metadata. You'll be presented with an editable blob of JSON. If you've already paired the notebook with Markdown as described above, you should see a piece of JSON that looks like this:

"jupytext": {
  "formats": "ipynb,md"
}

Change md to py in the formats entry so that it reads "formats": "ipynb,py", then click Save. Jupytext will start saving a copy of your notebook to a .py file with the same name as your notebook.
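If you'd rather script this metadata edit, a minimal sketch with Python's standard library could look like the following. It assumes the notebook is closed in Jupyter while you run it (otherwise Jupyter may overwrite the file on its next save). The function name pair_with_py is hypothetical. It only sets the same "jupytext" / "formats" metadata shown above, so the .py file itself should appear the next time you open and save the notebook in Jupyter.

```python
import json
from pathlib import Path


def pair_with_py(notebook_path: str) -> None:
    """Set metadata["jupytext"]["formats"] to "ipynb,py" in an .ipynb file (hypothetical helper)."""
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    # Create the "metadata" and "jupytext" objects if they don't exist yet.
    jupytext_meta = notebook.setdefault("metadata", {}).setdefault("jupytext", {})
    jupytext_meta["formats"] = "ipynb,py"
    # Write back with a one-space indent, keeping non-ASCII text readable.
    path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```

For example, pair_with_py("analysis.ipynb") would pair a (hypothetical) analysis.ipynb with analysis.py.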

If you don’t like editing JSON manually or prefer the command line, you can also do the same thing via a CLI command:

jupytext --set-formats ipynb,py --sync <your-notebook-file>.ipynb 

The above command pairs <your-notebook-file>.ipynb with <your-notebook-file>.py and syncs the two files. From then on, as long as Jupyter Notebook is running, edits to either the .ipynb file or the .py file will be reflected in both (for edits to the .py file, after you reload the notebook in your browser).
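If you have a whole folder of notebooks to pair, you could loop over them and run the same command for each one. Here's a rough sketch in Python. The helper name pair_all_notebooks and its dry_run flag are hypothetical. It uses only the jupytext flags shown above, and it assumes the jupytext command is on your PATH when dry_run is False. By default it just returns the commands it would run, so you can check them first.

```python
import subprocess
from pathlib import Path


def pair_all_notebooks(folder: str, dry_run: bool = True) -> list[list[str]]:
    """Build (and, if dry_run is False, run) the jupytext pairing command for each notebook."""
    commands = []
    # Non-recursive glob, so .ipynb_checkpoints/ subfolders are not picked up.
    for notebook in sorted(Path(folder).glob("*.ipynb")):
        # Same command as above: pair the notebook with a .py file and sync them.
        cmd = ["jupytext", "--set-formats", "ipynb,py", "--sync", str(notebook)]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling pair_all_notebooks(".") lists the commands for the current folder; pair_all_notebooks(".", dry_run=False) actually runs them.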

Jupyter Lab Addendum

As mentioned above, things work a little differently if you're a Jupyter Lab user. The Jupytext extension doesn't change the menus in Jupyter Lab, so rather than going to Edit Notebook Metadata, you'll need to use the jupytext --set-formats ipynb,py --sync <notebook>.ipynb command to link your notebook to its text representation. Additionally, the first time you start Jupyter Lab after installing the Jupytext extension, you may get a "Build Recommended" message saying that jupyterlab-jupytext needs to be included in the build. Make sure you choose Build so that Jupyter Lab correctly loads the Jupytext extension.