NEW SERVICES AND RESOURCES FOR DATA-CENTRIC RESEARCH IN 2018
We will be introducing more HPC services and resources in 2018 to cater for data-centric research such as Data Analytics, Machine Learning and Bio-imaging. One of the prominent technologies to be refreshed is the GPU.
New HPC cluster
We will be replacing one CPU cluster with higher core count CPU nodes and GPGPU (General-Purpose GPU) nodes. The higher core count per CPU node will benefit compute-intensive and complex simulations such as CFD, Multi-physics and Climate studies. The CPUs will also come with enhanced vector capability to further accelerate parallel processing.
The GPGPU nodes to be introduced will come with thousands of GPU cores. These nodes will support both traditional HPC applications, such as molecular dynamics simulation, and data-intensive applications in Data Analytics, Machine Learning and Deep Learning research. More GPU-enabled applications, libraries and tools will be introduced, and these can also be installed upon request from researchers.
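As an illustration, assuming a GPU-enabled Deep Learning library such as PyTorch is among those eventually installed, the minimal sketch below shows how such a library offloads computation to the GPU cores:

    # Minimal sketch: offloading a matrix multiplication to a GPU.
    # PyTorch is only an example of a GPU-enabled library; the libraries
    # actually installed on the new nodes may differ.
    import torch

    # Use the GPU if one is visible to the job; otherwise fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Two large random matrices created directly on the chosen device.
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)

    # On "cuda", this multiplication runs across thousands of GPU cores.
    c = a @ b
    print(c.device, c.shape)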
Storage, network and software tools to facilitate data-centric research computing
In a data-centric computing environment, storage and network are important components that allow large amounts of data to be stored, accessed conveniently and moved quickly to where it is needed. The following resources and services will be further enhanced in the coming year:
- The 100G high-speed research network that enables fast data transfer between Research Institutes/Centres (RI/RC) and the National Supercomputing Centre (NSCC) will be ready for connection by May 2018.
- The central on-demand storage service will be made available over the high-speed research network for fast data backup and retrieval.
- More research data will be made available for downloading or direct analysis on the Research Data Repository and Analytics system. Our Data Engineering Technology team will also work with researchers to perform data extraction, transformation and loading (ETL) on their specific research data (a minimal ETL sketch follows this list).
- More data analytics, image processing and machine learning tools and libraries will be introduced in the Research Data Repository and Analytics system to enable direct processing of data in the repository. In general, for large-scale data processing, it is more effective to bring the application to the data than the other way round.
- The team will also explore cloud storage services for more cost-effective data archiving.
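To illustrate the ETL support mentioned above, here is a minimal pandas sketch; the file and column names are hypothetical placeholders, not an actual dataset in the repository:

    # Minimal ETL sketch with pandas. "measurements.csv" and its columns
    # are hypothetical placeholders for a researcher's raw data.
    import pandas as pd

    # Extract: read the raw data.
    raw = pd.read_csv("measurements.csv")

    # Transform: drop incomplete rows and standardise a value column.
    clean = raw.dropna()
    clean["value_std"] = (clean["value"] - clean["value"].mean()) / clean["value"].std()

    # Load: write the curated table in a columnar format ready for analysis.
    clean.to_parquet("measurements_curated.parquet")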
More options and faster provisioning for ad-hoc demand
We will deploy cloud resources to complement our in-house resources. For example, if you need to test your GPU application or kick-start your analytics project while waiting for our new HPC cluster, we can quickly make such resources available on AWS or Azure.
We are also working closely with the National Supercomputing Centre (NSCC) to support research computation that is too large to run on our in-house systems. NSCC has recently acquired high-end GPU systems (NVIDIA DGX-1) customised for large-scale Deep Learning. We can arrange access for you if you have such a need.
Greater Bioinformatics/Bio-imaging support
The consulting services provided by the Data Engineering Technology (DET) team, established last year, include Bioinformatics computing support. To enhance this support, the team will develop a one-stop Bioinformatics computing platform on the central HPC system. Common pipelines will be developed, which can eventually be customised for individual needs.
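As a rough sketch of what one step of such a pipeline could look like (the tools, file names and genome reference are illustrative assumptions, not the platform's actual configuration), common command-line bioinformatics tools can be chained from Python:

    # Illustrative pipeline step: align sequencing reads, then sort the
    # alignment. bwa, samtools and all file names are assumptions made
    # for illustration only.
    import subprocess

    def align_and_sort(reference, reads, out_bam):
        """Align FASTQ reads to a reference genome and sort the result."""
        align = subprocess.run(["bwa", "mem", reference, reads],
                               capture_output=True, check=True)
        subprocess.run(["samtools", "sort", "-o", out_bam, "-"],
                       input=align.stdout, check=True)

    align_and_sort("hg38.fa", "sample01.fastq", "sample01.sorted.bam")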
The team will also explore image-processing technologies and develop the support capability in anticipation of increasing demand in various areas, especially in Bio-medical research.
User consultation and training
The main objective of using HPC resources for large-scale data analytics and deep learning is to accelerate computation through parallel computing. During user consultation and training, the DET team will focus on helping researchers optimise and parallelise their Python, R and Matlab applications. The team will also conduct training on how to make effective use of the parallel computing capability in the Research Data Repository and Analytics system.
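For example, an embarrassingly parallel workload can be spread across the cores of a node with Python's standard multiprocessing module; the simulate function below is a stand-in for a researcher's own computation:

    # Minimal sketch: running independent computations in parallel across
    # CPU cores. simulate() is a placeholder for real analysis code.
    from multiprocessing import Pool

    def simulate(parameter):
        """Stand-in for an expensive, independent computation."""
        return sum(i * parameter for i in range(1_000_000))

    if __name__ == "__main__":
        parameters = range(32)
        with Pool() as pool:              # one worker per available core
            results = pool.map(simulate, parameters)
        print(len(results), "runs completed")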
We have observed that most users can benefit from GPU parallel speedup without having to do CUDA programming: they can run off-the-shelf CUDA-enabled applications, or simply call CUDA-enabled packages or libraries from their favourite programming language. Our priority will be to make such software available and to advise users on how to incorporate it into their programs.
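For instance, assuming a CUDA-enabled array library such as CuPy is available, a NumPy-style computation can be moved to the GPU without writing any CUDA kernels:

    # Minimal sketch: GPU acceleration through a CUDA-enabled library
    # (CuPy assumed installed), with no CUDA code written by the user.
    import cupy as cp

    x = cp.random.rand(10_000_000)    # array allocated in GPU memory
    y = cp.sqrt(x) + cp.sin(x)        # element-wise kernels run on the GPU
    print(float(y.mean()))            # result copied back to the host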
With Julia emerging as a promising programming language for high-performance numerical analysis and computational science, the team has also started developing the necessary skills to support it.
Please contact the Data Engineering Technology team at DataEngineering@nus.edu.sg if you have any queries on the above developments.