A PREVIEW OF JUPYTERHUB ON HPC
What is Jupyter?
Jupyter is an open-source web application that allows users to create and share documents that contain code, equations, visualisations and narrative text. It is useful for data exploration, cleaning and transformation, numerical simulation, statistical modelling, data visualisation, machine learning and much more.
What is Jupyterhub?
It is simply a multi-user version of Jupyter. It is designed for companies, classrooms and research labs, with user management and authentication via PAM (Pluggable Authentication Modules), OAuth or directory services such as Active Directory.
The following is a brief introduction on how to access and use Jupyterhub. Exact steps might differ when Jupyterhub is finally deployed. This article also provides a preview of Jupyterhub on HPC and of one interesting thing users can do with it: visualising neural networks.
Accessing Jupyterhub
data:image/s3,"s3://crabby-images/ec6f1/ec6f14ebdc58b562e9b816a96a4d1efbfa08997e" alt="jupyter01 Users will be able to sign in to Jupyterhub using their NUSNET credentials; there is no need for a separate set of credentials."
After signing in, users will be presented with the home screen. Simply click “Start My Server” to proceed to the next step, which is to select your job profile.
data:image/s3,"s3://crabby-images/49110/4911019ac81178dee3203b707d7689adb1308f6c" alt="jupyter02 jupyter02"
Selecting a Job Profile
data:image/s3,"s3://crabby-images/c9b3f/c9b3f02954b3ab565642c70c0f60ba0006495294" alt="jupyter03 Example Job Profile Options"
More job profiles might become available in the future as our resources expand, for example GPU (Graphics Processing Unit) job profiles for data analytics, AI and deep learning.
Select “serial” to run non-parallel code, or “parallel” with the required computing resources to run parallel code.
Launching Your Job
data:image/s3,"s3://crabby-images/102f0/102f0caa1b1f6f9d153facc1c4070586a2cd4ece" alt="jupyter04 Click after selecting your job profile to launch your Jupyter notebook session."
In the backend, Jupyterhub will submit a PBS (Portable Batch System) job script to launch a Jupyter notebook session for you. As with PBS job scripts submitted from the command line or terminal, your Jupyter notebook session job may have to wait in the queue.
data:image/s3,"s3://crabby-images/8cadd/8caddde0fdd36fad8fc046d9218d0ec3317d12ab" alt="jupyter05 Waiting for server to start up"
The Jupyter Notebook
When the server starts up, users will be redirected to a standard Jupyter notebook environment with their home directory displayed. From this screen, you can create new Python notebooks and more.
data:image/s3,"s3://crabby-images/4c12a/4c12adbb634be7185e84ac55191bbaa97f2709f9" alt="jupyter06-650x300 jupyter06"
data:image/s3,"s3://crabby-images/80525/80525e3bb60e8e49f694de055bd11cfa2d58588c" alt="jupyter07-650x202 Parallel 8 queue showing 8 cpu core count"
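To check from inside a notebook how many CPU cores the selected job profile actually provides, a small helper along the following lines could be used. This is a minimal sketch, not part of the deployment: the function name `available_cores` is hypothetical, and the use of the `NCPUS` environment variable assumes the scheduler exports it for the job (PBS Professional does; other configurations may not).

```python
import os


def available_cores():
    """Best-effort count of the CPU cores this notebook session may use."""
    # Assumption: the scheduler exports NCPUS for the job, so check it first.
    ncpus = os.environ.get("NCPUS", "")
    if ncpus.isdigit() and int(ncpus) > 0:
        return int(ncpus)
    # On Linux, the affinity mask lists the cores this process may run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: total cores on the node, which may exceed the job's allocation.
    return os.cpu_count() or 1


print(available_cores())
```

The fallback order matters: `os.cpu_count()` reports every core on the compute node, so it is only used when nothing more specific is available.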
Working with Jupyter Notebooks
data:image/s3,"s3://crabby-images/a6348/a6348a59e5504e38f69b152c58fc9909d2c01bda" alt="jupyter08-650x497 Plotting with Matplotlib in Jupyter Notebook (https://matplotlib.org/gallery/lines_bars_and_markers/simple_plot.html)"
Besides the usual plotting of graphs, Jupyter notebooks are useful for visualising neural networks.
Visualising a Simple Convolutional Neural Network (CNN)
Visualising a neural network can give us useful information about what it is learning. While designing a neural network for the task of image recognition, it is sometimes useful to be able to interpret and understand the model’s predictions.
Libraries used: TensorFlow, Keras, NumPy, Matplotlib
We train a simple convolutional neural network (CNN) with the following architecture on the MNIST (Modified National Institute of Standards and Technology database) dataset. The network was trained for 10 epochs with a batch size of 256, using the Adam (Adaptive Moment Estimation) optimiser.
• Input, size 28×28×1
• Convolution, 32 filters, kernel size 3×3, ReLU activation
• Convolution, 64 filters, kernel size 3×3, ReLU activation
• Max Pool, pool size 2×2
• Dropout 0.25
• Convolution, 128 filters, kernel size 3×3, ReLU activation
• Max Pool, pool size 2×2
• Dropout 0.25
• Fully connected, 128 neurons, ReLU activation
• Dropout 0.5
• Fully connected, 10 neurons, Softmax activation
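The architecture above can be sanity-checked with a little arithmetic. The sketch below traces the output shape and trainable parameter count of each layer in plain Python. It rests on assumptions the list does not state: 'valid' padding and stride 1 for the convolutions (the Keras defaults) and a flatten step before the first fully connected layer. Dropout is left out because it adds no parameters and does not change shapes. The names `LAYERS` and `summarise` are illustrative only.

```python
# (kind, n): n is the filter count, the pool size or the neuron count.
LAYERS = [
    ("conv", 32), ("conv", 64), ("pool", 2),
    ("conv", 128), ("pool", 2),
    ("dense", 128), ("dense", 10),
]


def summarise(layers, size=28, channels=1, kernel=3):
    """Return ([(kind, output_shape, params), ...], total_params)."""
    rows, total, flat = [], 0, None
    for kind, n in layers:
        if kind == "conv":
            # One kernel x kernel x channels filter plus one bias per output channel.
            params = kernel * kernel * channels * n + n
            # Assumed 'valid' padding and stride 1 shrink each side by kernel - 1.
            size, channels = size - kernel + 1, n
            shape = (size, size, channels)
        elif kind == "pool":
            params = 0
            size //= n
            shape = (size, size, channels)
        else:  # dense
            if flat is None:
                flat = size * size * channels  # assumed flatten step
            params = flat * n + n  # weights plus biases
            flat = n
            shape = (n,)
        total += params
        rows.append((kind, shape, params))
    return rows, total


rows, total = summarise(LAYERS)
for kind, shape, params in rows:
    print(f"{kind:5s} {str(shape):15s} {params:>8d}")
print("total trainable parameters:", total)
```

Under these assumptions the last pooling layer outputs a 5×5×128 volume (3,200 values after flattening), and the network has 503,690 trainable parameters, most of them in the first fully connected layer.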
data:image/s3,"s3://crabby-images/247c2/247c266ad396978bd3c4b872623050c5f57d7157" alt="jupyter09-540x650 Looking at the first 36 images in MNIST (Modified National Institute of Standards and Technology database) with Jupyter"
data:image/s3,"s3://crabby-images/d5aa4/d5aa40737c0ceb587baf519be844c5dbeb4cfcac" alt="jupyter10 Validation accuracy of 0.994 after 10 epochs"
With the trained model, we can now visualise the weights of the layers, layer activations given an input, saliency maps and much more.
data:image/s3,"s3://crabby-images/0da7f/0da7f720a0fe3e072d2f1165ccbd4597c66231a7" alt="jupyter11 Some activations in the third convolution layer"
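To make the idea of a layer activation concrete, the following toy example computes one activation map by hand: a single filter is slid over a small image and ReLU is applied to each result. It is only an illustration in plain Python, assuming 'valid' padding and stride 1. The image and the vertical-edge filter are made up for this example and are not taken from the trained model.

```python
def conv2d_relu(image, kernel):
    """Slide `kernel` over `image` ('valid' padding, stride 1), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    activation = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the patch under the kernel (cross-correlation,
            # which is what deep learning libraries call "convolution").
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(max(0.0, total))  # ReLU keeps only positive responses
        activation.append(row)
    return activation


# Toy 5x5 image: dark on the left, bright from the fourth column onwards.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
# Hypothetical filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

for row in conv2d_relu(image, kernel):
    print(row)
```

The activation is zero over the flat dark region and positive where the filter overlaps the edge. In a trained network the filters are learned rather than hand-written, and plotting their activation maps shows which image features each filter responds to.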
data:image/s3,"s3://crabby-images/52401/52401ef32fa8e9da56fda38e6a81b8129cdf83fb" alt="jupyter12-500x650 Saliency Map of the different inputs (Input, coloured map, grayscale map, coloured map, smoothed grayscale map)"
The goal of the saliency map is to identify the pixels of an image which contribute the most towards a particular class prediction. Put simply, the most important pixels are those that lead the network to decide that the image belongs to a certain class. Saliency maps are also useful for segmenting images, as they can be used to localise the area of interest in the image.
As shown above, Jupyter notebooks can be used for more than just plotting graphs. They can also be used for examining and understanding neural networks, and for many other analytical applications.
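The idea behind a saliency map can be sketched without a deep learning library: take the class score as a function of the input pixels and measure how strongly it changes when each pixel is nudged. The example below does this with central finite differences on a toy score function. This is a stand-in for illustration only: real frameworks obtain the same gradient in a single backpropagation pass, and `toy_score` with its `WEIGHTS` is a hypothetical model, not the CNN trained above.

```python
import math

# Hypothetical "model": only the four central pixels influence the class score.
WEIGHTS = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, -1.0, 0.0],
    [0.0, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]


def toy_score(image):
    """Class score of the toy model: tanh of a weighted sum of the pixels."""
    z = sum(w * p for w_row, p_row in zip(WEIGHTS, image)
            for w, p in zip(w_row, p_row))
    return math.tanh(z)


def saliency_map(score_fn, image, eps=1e-4):
    """Approximate |d score / d pixel| for every pixel by central differences."""
    work = [row[:] for row in image]  # copy so the caller's image is untouched
    saliency = []
    for i, row in enumerate(work):
        sal_row = []
        for j, original in enumerate(row):
            work[i][j] = original + eps
            up = score_fn(work)
            work[i][j] = original - eps
            down = score_fn(work)
            work[i][j] = original  # restore the pixel before moving on
            sal_row.append(abs(up - down) / (2 * eps))
        saliency.append(sal_row)
    return saliency


image = [[0.1] * 4 for _ in range(4)]
for row in saliency_map(toy_score, image):
    print(["%.3f" % value for value in row])
```

Pixels the toy model ignores get a saliency of zero, while the pixel with the largest weight gets the highest value, which is the behaviour the saliency maps in the figure above visualise for the trained network.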
Users can look forward to more visual and interactive software for Data Analytics, Machine Learning and Deep Learning in the future.
References
- Project Jupyter: http://jupyter.org/
- More on Visualising Convolutional Neural Networks (Understanding CNN): http://cs231n.github.io/understanding-cnn/
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps: https://arxiv.org/abs/1312.6034v2
Please contact the Data Engineering Technology team at DataEngineering@nus.edu.sg if you have any queries about the above developments.