Using the FileUpload widget in Jupyter Lab/Notebook

I’m a big fan of Jupyter Notebook, and an even bigger fan of the more recent Jupyter Lab (which just reached 1.0!), and for a recent project I wanted to use it as the “frontend” for some python number crunching code. The audience isn’t necessarily technical, so the challenge was to try and make Jupyter act a bit less like a programmer’s tool and a bit more like a traditional webapp.

I’d heard mention of Jupyter widgets before, but hadn’t gotten much of a chance to work with them. The workflow I wanted to create for end-users goes like this:

  1. Upload an Excel sheet full of input data
  2. Optionally tweak a few parameters
  3. Generate a new Excel spreadsheet with the output data and allow the user to download it.

With fairly minimal effort I was able to get pretty close to that workflow, but for whatever reason finding good docs was a challenge on this project. Part of the issue is that Jupyter Lab is pretty new, so much of the documentation that exists refers to the older Jupyter Notebook, which behaves pretty differently when it comes to installing extensions. So here’s my write-up of how to do this in Jupyter Lab in 2019.

I’d originally planned to use ipyupload, but after a little searching I learned that it’s maintainer had ported his upload code into ipywidgets, which is pretty much the de-facto set of Jupyter widgets, so I decided to just use ipywidgets. Installing Jupyter Lab extensions is a lot easier than installing extensions for Jupyter Notebook was, although in ipyupload’s case it took a while to figure out because their documentation hasn’t yet been updated to reflect Jupyter Lab. This is how you install it in Jupyter Lab:

jupyter labextension install @jupyter-widgets/jupyterlab-manager

Reading user-uploaded files

Once you have it installed, using the FileUpload() widget is pretty straightforward, if under-documented. Essentially, you import it, and then instantiate an instance of the FileUpload widget. When the cell is displayed the user sees a nice upload button, which will open a file selector when clicked. This is what my upload cell looked like:

from ipywidgets import FileUpload
upload = FileUpload()
upload

Once the user has picked a file with the file selector, the contents of the file will be available in a bytes object, under the .data property of the FileUpload object.

In my case, I just wanted to write the data from the user’s upload into a new file on the Jupyter notebook’s filesystem, which I did with this snippet:

with open("input.xlsx", "w+b") as i:
    i.write(upload.data[0])

NOTE: be careful, the FileUpload() widget can allow the user to upload more than one file, which the above code ignores. Additionally, the above code will throw an exception if run before the user picks a file. For production code I’d recommend more error checking than I’m doing above.

Reading and writing excel files

For reading and writing Excel files, the simplest solution is to use the excellent pandas library. You need to first install a couple other libraries (one for reading excel files, one for writing) before pandas will play nicely with excel data:

pip install openpyxl==2.6.2 xlrd==1.2.0

Once that’s complete, you just need to use pandas' .read_excel() function to read the spreadsheet into a pandas dataframe. In my case, my input spreadsheet had two tabs that I was interested in. You can have pandas read these by passing a list to the sheet_name kwarg:

import pandas as pd
sheets = pd.read_excel(
    inp_file,
    sheet_name=[
        "Sheet1",
        "Sheet2",
    ],
)

You can then access each sheet by doing, e.g.:

sheets['Sheet1']

Pandas for the most part does the right thing here, sheets['Sheet1'] returns a pandas DataFrame object, with the column names set to the values of the first row of your excel sheet.

Once you’ve completed your work with the data it’s time to write it out to a new Excel spreadsheet. Pandas dataframes come with a .to_excel() method, which works great for simple use cases, you just need to pass it a filename and it will take care of the rest:

df.to_excel("output_excel_file.xlsx")

In my case I needed to write an excel file that had multiple tabs, which is slightly more complicated. You need to first create an ExcelWriter object, and then use it as a context manager, passing each dataframe you want to write:

with pd.ExcelWriter(outfile) as writer:
    dataframe1.to_excel(writer, sheet_name="Tab 1")
    dataframe2.to_excel(writer, sheet_name="Tab 2")