A Preview of Jupyterhub on HPC - NUS Information Technology | NUS IT Services, Solutions & Governance

A PREVIEW OF JUPYTERHUB ON HPC

By Ku Wee Kiat – NUS IT on 27 Aug, 2018

What is Jupyter?

It is an open source web application that allows users to create and share documents that contain code, equations, visualisations and narrative text. It is useful for data exploration, cleaning and transformation, numerical simulation, statistical modelling, data visualisation, machine learning and much more.

What is Jupyterhub?

It is simply a multi-user version of Jupyter, designed for companies, classrooms and research labs, with user management and authentication via PAM (Pluggable Authentication Modules), OAuth, or directory services such as Active Directory.

The following is a brief introduction to accessing and using Jupyterhub; exact steps might differ when Jupyterhub is finally deployed. This article also previews Jupyterhub on HPC and one interesting thing users can do with it: visualising neural networks.

Accessing Jupyterhub

Users will be able to sign in to Jupyterhub with their NUSNET credentials; there is no need for a separate set of credentials.

After signing in, users will be presented with the home screen. Simply click “Start My Server” to proceed to the next step, which is to select your job profile.

Selecting a Job Profile

Example Job Profile Options

More job profiles might be available in the future as our resources expand. For example, GPU (Graphics Processing Unit) job profiles for data analytics/AI/deep learning.

Select “serial” for running non-parallel code, and “parallel” with the required computing resource for parallel code.

Launching Your Job

After selecting your job profile, click the launch button to start your Jupyter notebook session.

In the backend, Jupyterhub submits a PBS (Portable Batch System) job script that launches a Jupyter notebook session for you. As with PBS job scripts submitted from the command line/terminal, your Jupyter notebook session job may have to wait in the queue.
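To make the PBS step more concrete, here is a minimal sketch of the kind of job script a spawner might generate for each job profile. The profile names, queue names, walltime and the `build_pbs_script` helper are all hypothetical, not the actual NUS configuration, and the resource line uses PBS Professional-style `select` syntax. Real deployments typically rely on a spawner project such as batchspawner, whose generated script will differ.

```python
# Hypothetical profile table: queue names, core counts and walltimes are
# illustrative placeholders, not the real NUS HPC settings.
PROFILES = {
    "serial": {"queue": "serial", "ncpus": 1, "walltime": "02:00:00"},
    "parallel 8": {"queue": "parallel8", "ncpus": 8, "walltime": "02:00:00"},
}


def build_pbs_script(profile: str, port: int = 8888) -> str:
    """Return the text of a PBS job script that starts a single-user notebook server."""
    p = PROFILES[profile]
    lines = [
        "#!/bin/bash",
        "#PBS -N jupyter-notebook",               # job name shown by qstat
        f"#PBS -q {p['queue']}",                  # queue chosen by the profile
        f"#PBS -l select=1:ncpus={p['ncpus']}",   # one chunk with N cores
        f"#PBS -l walltime={p['walltime']}",      # maximum session length
        'cd "$HOME"',                             # start in the home directory
        f"jupyterhub-singleuser --port={port}",   # notebook server that reports back to the hub
    ]
    return "\n".join(lines) + "\n"


print(build_pbs_script("parallel 8"))
```

A real spawner also passes the hub's API token and URLs to the notebook server through environment variables, which this sketch omits.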

The Jupyter Notebook

When the server starts up, users will be redirected to a standard Jupyter notebook environment with their home directory displayed. From this screen, you can create new Jupyter Python notebooks and more.

Parallel 8 queue showing a CPU core count of 8
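One way to check, from inside a notebook, how many cores the session can actually use is sketched below. On Linux, `os.sched_getaffinity(0)` returns the set of CPUs the process is allowed to run on, which can be smaller than the node total from `os.cpu_count()` when the scheduler pins jobs to a CPU set; whether the NUS queues enforce such pinning is an assumption. The `allocated_cores` name is hypothetical.

```python
import os


def allocated_cores() -> int:
    """Number of CPU cores this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: reflects any CPU set the batch scheduler assigned to the job.
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the total logical CPU count.
    return os.cpu_count() or 1


print("Cores available to this session:", allocated_cores())
print("Logical CPUs on the node:", os.cpu_count())
```

Sizing a process pool or thread count from `allocated_cores()` rather than `os.cpu_count()` avoids oversubscribing a node that is shared with other jobs.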

Working with Jupyter Notebooks

Plotting with Matplotlib in Jupyter Notebook (https://matplotlib.org/gallery/lines_bars_and_markers/simple_plot.html)
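The linked gallery page is Matplotlib's "simple plot" example. A condensed version, adapted from that page, should render inline when run in a notebook cell:

```python
import matplotlib.pyplot as plt
import numpy as np

# Data for plotting: a sine wave shifted up by 1.
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)
ax.set(xlabel="time (s)", ylabel="voltage (mV)",
       title="About as simple as it gets, folks")
ax.grid()
```

In a notebook the figure is displayed automatically at the end of the cell; in a plain script you would add `plt.show()`.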

Besides the usual plotting of graphs, Jupyter notebooks are useful for visualising neural networks.

Visualising a Simple Convolutional Neural Network (CNN)

Visualising a neural network can give us useful information about what it is learning. While designing a neural network for the task of image recognition, it is sometimes useful to be able to interpret and understand the model’s predictions.

Libraries used: TensorFlow, Keras, NumPy, Matplotlib

We trained a simple convolutional neural network (CNN) with the following architecture on the MNIST (Modified National Institute of Standards and Technology) dataset. The network was trained for 10 epochs with a batch size of 256 using the Adam optimiser (the name comes from "adaptive moment estimation").

• Input, size 28x28x1

• Convolution, 32 filters, kernel size 3×3, ReLU activation

• Convolution, 64 filters, kernel size 3×3, ReLU activation

• Max Pool, pool size 2×2

• Dropout 0.25

• Convolution, 128 filters, kernel size 3×3, ReLU activation

• Max Pool, pool size 2×2

• Dropout 0.25

• Fully connected, 128 neurons, ReLU activation

• Dropout 0.5

• Fully connected, 10 neurons, Softmax activation
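The architecture above translates fairly directly into Keras. This sketch assumes `tf.keras`, the default `"valid"` padding, and a `Flatten` layer before the first fully connected layer (the list does not mention one, but dense layers need a flat input). The categorical cross-entropy loss and the `preprocess` helper are likewise assumptions, not the author's exact code.

```python
import numpy as np
from tensorflow import keras

layers = keras.layers


def build_model() -> keras.Model:
    """The CNN described above, compiled with the Adam optimiser."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),  # assumption: reshape 5x5x128 maps to a 3200-vector
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def preprocess(images, labels):
    """Scale 28x28 uint8 images to [0, 1], add a channel axis, one-hot the labels."""
    x = images.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    y = keras.utils.to_categorical(labels, 10)
    return x, y
```

To train it as described, one could load the data with `keras.datasets.mnist.load_data()`, pass both splits through `preprocess`, and call `model.fit(x_train, y_train, batch_size=256, epochs=10, validation_data=(x_test, y_test))`. Exact accuracy will depend on initialisation and library version.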

Looking at the first 36 images in MNIST with Jupyter
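A grid of sample images like this can be produced with a small Matplotlib helper. `show_grid` is a hypothetical name, and the 6×6 layout is simply one way to arrange 36 images.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_grid(images, labels=None, n=36, cols=6):
    """Plot the first n grayscale images in a grid, optionally titled by label."""
    rows = int(np.ceil(n / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(cols * 1.5, rows * 1.5))
    for i, ax in enumerate(np.ravel(axes)):
        ax.axis("off")  # hide ticks on every cell, including unused ones
        if i < n:
            ax.imshow(images[i], cmap="gray")
            if labels is not None:
                ax.set_title(str(labels[i]), fontsize=8)
    fig.tight_layout()
    return fig
```

With Keras available, `(x_train, y_train), _ = keras.datasets.mnist.load_data()` followed by `show_grid(x_train, y_train)` shows the first 36 training digits with their labels.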

Validation accuracy of 0.994 after 10 epochs

With the trained model, we can now visualise the weights of the layers, layer activations given an input, saliency maps and much more.
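Weights are the simplest of these to inspect. The sketch below plots each 3×3 kernel of a chosen convolution layer as a small grayscale image. `plot_conv_filters` is a hypothetical helper; it assumes a Keras model whose convolution layers are `keras.layers.Conv2D` instances, such as the CNN described above, and shows only the first input channel of each kernel.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras


def plot_conv_filters(model, conv_index=0, cols=8):
    """Plot every kernel of the conv_index-th Conv2D layer (first input channel only)."""
    convs = [layer for layer in model.layers
             if isinstance(layer, keras.layers.Conv2D)]
    kernel, _bias = convs[conv_index].get_weights()  # kernel: (kh, kw, in_ch, out_ch)
    n_filters = kernel.shape[-1]
    rows = int(np.ceil(n_filters / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(np.ravel(axes)):
        ax.axis("off")
        if i < n_filters:
            ax.imshow(kernel[:, :, 0, i], cmap="gray")
    return fig, kernel
```

For the first layer of the CNN above, this draws 32 tiles of 3×3 weights; deeper layers have many input channels, so a single channel per kernel is only a partial view.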

Some activations in the third convolution layer
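Layer activations can be read out by building a second Keras model that shares the trained layers but stops at the layer of interest, a pattern from the Keras Sequential model guide. The helpers `conv_activations` and `plot_activations` are hypothetical; with the CNN described above, `conv_index=2` selects the third convolution layer, whose output should be 10×10×128 under the default valid padding.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras


def conv_activations(model, image, conv_index):
    """Return the feature maps of the conv_index-th Conv2D layer for one image."""
    convs = [layer for layer in model.layers
             if isinstance(layer, keras.layers.Conv2D)]
    # A model sharing the same layers, whose output is the chosen conv layer.
    extractor = keras.Model(inputs=model.inputs, outputs=convs[conv_index].output)
    return extractor.predict(image[np.newaxis, ...], verbose=0)[0]  # (h, w, channels)


def plot_activations(activations, n=16, cols=4):
    """Show the first n channels of a (h, w, channels) activation array."""
    rows = int(np.ceil(n / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(cols * 2, rows * 2))
    for i, ax in enumerate(np.ravel(axes)):
        ax.axis("off")
        if i < n:
            ax.imshow(activations[:, :, i], cmap="viridis")
    return fig
```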

Saliency maps for different inputs (input, coloured map, grayscale map, coloured map, smoothed grayscale map)

The goal of a saliency map is to identify the pixels of an image that contribute most towards a particular class prediction: the most important pixels are those that lead the network to decide that the image belongs to that class. Saliency maps are also useful for segmenting images, since they can localise the area of interest in an image.
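One simple way to compute such a map, in the spirit of the "Deep Inside Convolutional Networks" paper listed in the references, is to take the gradient of a class score with respect to the input pixels and keep its magnitude. This sketch is an assumption about how the maps were produced: it uses the softmax output as the score (the paper uses the unnormalised pre-softmax score) and omits the smoothing used for the last panel. `saliency_map` is a hypothetical helper.

```python
import numpy as np
import tensorflow as tf


def saliency_map(model, image, class_idx):
    """Vanilla gradient saliency: |d score / d pixel|, max over colour channels."""
    x = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)  # x is a constant tensor, so it must be watched explicitly
        probs = model(x, training=False)
        score = probs[0, class_idx]
    grads = tape.gradient(score, x)[0]  # same shape as the image
    return tf.reduce_max(tf.abs(grads), axis=-1).numpy()  # (h, w)
```

The returned 28×28 array can be displayed with `plt.imshow(..., cmap="gray")` for a grayscale map, or with a colour map such as `"jet"` for a coloured one.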

As shown above, Jupyter notebooks can be used for more than just plotting graphs: they can help users examine and understand neural networks, and they support many other analytical applications.

Users can look forward to more visual and interactive software for Data Analytics, Machine Learning and Deep Learning in the future.

References

Project Jupyter: http://jupyter.org/
Understanding CNN (more on visualising convolutional neural networks): http://cs231n.github.io/understanding-cnn/
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps: https://arxiv.org/abs/1312.6034v2

Please contact the Data Engineering Technology team at DataEngineering@nus.edu.sg if you have any queries about the above developments.