Using the FileUpload widget in Jupyter Lab/Notebook
I’m a big fan of Jupyter Notebook, and an even bigger fan of the more recent Jupyter Lab (which just reached 1.0!), and for a recent project I wanted to use it as the “frontend” for some python number crunching code. The audience isn’t necessarily technical, so the challenge was to try and make Jupyter act a bit less like a programmer’s tool and a bit more like a traditional webapp.
I’d heard mention of Jupyter widgets before, but hadn’t gotten much of a chance to work with them. The workflow I wanted to create for end-users goes like this:
- Upload an Excel sheet full of input data
- Optionally tweak a few parameters
- Generate a new Excel spreadsheet with the output data and allow the user to download it.
With fairly minimal effort I was able to get pretty close to that workflow, but for whatever reason finding good docs was a challenge on this project. Part of the issue is that Jupyter Lab is pretty new, so much of the documentation that exists refers to the older Jupyter Notebook, which behaves pretty differently when it comes to installing extensions. So here’s my write-up of how to do this in Jupyter Lab in 2019.
I’d originally planned to use ipyupload, but after a little
searching I learned that it’s maintainer had ported his upload code into
ipywidgets
, which is pretty much the de-facto set of Jupyter
widgets, so I decided to just use ipywidgets. Installing Jupyter
Lab extensions is a lot easier than installing extensions for Jupyter Notebook
was, although in ipyupload’s case it took a while to figure out
because their documentation hasn’t yet been updated to reflect Jupyter Lab.
This is how you install it in Jupyter Lab:
jupyter labextension install @jupyter-widgets/jupyterlab-manager
Reading user-uploaded files
Once you have it installed, using the FileUpload()
widget is pretty
straightforward, if under-documented. Essentially, you import it, and then
instantiate an instance of the FileUpload
widget. When the cell is displayed
the user sees a nice upload button, which will open a file selector when
clicked. This is what my upload cell looked like:
from ipywidgets import FileUpload
upload = FileUpload()
upload
Once the user has picked a file with the file selector, the contents of the
file will be available in a bytes
object, under the .data
property of the
FileUpload object.
In my case, I just wanted to write the data from the user’s upload into a new file on the Jupyter notebook’s filesystem, which I did with this snippet:
with open("input.xlsx", "w+b") as i:
i.write(upload.data[0])
NOTE: be careful, the
FileUpload()
widget can allow the user to upload more than one file, which the above code ignores. Additionally, the above code will throw an exception if run before the user picks a file. For production code I’d recommend more error checking than I’m doing above.
Reading and writing excel files
For reading and writing Excel files, the simplest solution is to use the
excellent pandas
library. You need to first install a couple other
libraries (one for reading excel files, one for writing) before pandas will
play nicely with excel data:
pip install openpyxl==2.6.2 xlrd==1.2.0
Once that’s complete, you just need to use pandas' .read_excel()
function to
read the spreadsheet into a pandas dataframe. In my case, my input spreadsheet
had two tabs that I was interested in. You can have pandas read these by
passing a list to the sheet_name
kwarg:
import pandas as pd
sheets = pd.read_excel(
inp_file,
sheet_name=[
"Sheet1",
"Sheet2",
],
)
You can then access each sheet by doing, e.g.:
sheets['Sheet1']
Pandas for the most part does the right thing here, sheets['Sheet1']
returns
a pandas DataFrame
object, with the column names set to the values of the
first row of your excel sheet.
Once you’ve completed your work with the data it’s time to write it out to a
new Excel spreadsheet. Pandas dataframes come with a .to_excel()
method,
which works great for simple use cases, you just need to pass it a filename and
it will take care of the rest:
df.to_excel("output_excel_file.xlsx")
In my case I needed to write an excel file that had multiple tabs, which is
slightly more complicated. You need to first create an ExcelWriter
object,
and then use it as a context manager, passing each dataframe you want to write:
with pd.ExcelWriter(outfile) as writer:
dataframe1.to_excel(writer, sheet_name="Tab 1")
dataframe2.to_excel(writer, sheet_name="Tab 2")