NEXT GENERATION AI-CENTRIC HPC
During the recent Supercomputing Asia (SCA) 2019 conference, topics related to AI and the high-performance computing (HPC) technologies required for Machine/Deep Learning (ML/DL) occupied the majority of the programme as well as the vendor booths on the Conference exhibition floor. I will share some highlights of interest to our computational researchers.
What can we look forward to in the near future?
More resources at the National Supercomputing Centre (NSCC)
Minister Heng Swee Keat announced Government funding for the NSCC phase II development to the tune of $200M during the Conference opening. With this development, NUS researchers can expect not just more HPC resources to run larger simulations, but also the introduction of new technologies that will further speed up research discovery, particularly in the data analytics and AI domains. Today, NSCC already provides state-of-the-art GPU systems for ML/DL research.
New HPC capabilities for ML/DL
New developments are expected across hardware and software in the effort to accelerate AI/ML/DL research and applications.
Processor/Accelerator – While the GPU is still the most widely used ML/DL accelerator, other customised systems such as FPGAs (Field Programmable Gate Arrays), neuromorphic chips and tensor cores/processors are also being adopted for ML/DL acceleration. Technology companies are not just developing more powerful accelerators, cores and processors to cater for large-scale and demanding machine learning centrally; they are also searching for power-efficient solutions for analytics/AI inferencing at edge devices such as sensors and network devices.
Storage – During a recent ML/DL test on our new Volta GPU system, we found that the GPU could not be fully utilised because data read over the network from an HDD (Hard Disk Drive)-based NAS storage system could not keep up with the processing speed of the GPU. On the SCA exhibition floor, vendors were introducing storage systems customised for ML/DL with high IOPS (Input/Output Operations Per Second) and low latency, typically built entirely on SSDs (Solid State Drives) or on an SSD/HDD hybrid. Such customised storage solutions will become even more necessary as GPUs and other ML/DL processors grow more powerful. Software can also help hide storage latency, as the sketch below shows.
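To illustrate how software can hide some storage latency, here is a minimal PyTorch sketch (the dataset is synthetic and the parameters are illustrative, not a description of our actual setup) that uses multiple loader workers and pinned memory so that slow reads are less likely to starve the GPU:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Synthetic stand-in for an image dataset stored on a NAS
    dataset = TensorDataset(torch.randn(1024, 3, 64, 64),
                            torch.randint(0, 10, (1024,)))

    loader = DataLoader(
        dataset,
        batch_size=64,
        num_workers=8,    # parallel reader processes help hide storage latency
        pin_memory=True,  # page-locked host memory speeds up host-to-GPU copies
    )

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for images, labels in loader:
        # non_blocking=True overlaps the host-to-device copy with GPU compute
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # ... forward/backward pass here ...

Even with such overlapping, a sufficiently fast storage backend remains essential once the GPU outpaces the reader processes.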
Network – Network technology developments that will further accelerate ML/DL were discussed quite extensively at SCA. For example, we know that IP (Internet Protocol)-based network data transfer incurs significantly more overhead (higher latency) than an RDMA (Remote Direct Memory Access) transfer. Some vendors are now looking into direct GPU-to-GPU communication through RDMA to address that bottleneck; the sketch below shows how this looks from the application side. In another development, in-network computing is being explored to reduce application time-to-solution and the cost of data movement: computational tasks can be offloaded to the network itself, and intelligence can be built into the network to optimise data transfer.
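From the application side, direct GPU-to-GPU communication is usually reached through a communication library rather than programmed by hand. Here is a hedged sketch using PyTorch's distributed module with the NCCL backend, which can take the GPUDirect RDMA path when the interconnect supports it (the tensor contents and launch command are purely illustrative):

    import os
    import torch
    import torch.distributed as dist

    def main():
        # NCCL selects RDMA/GPUDirect transports automatically when available
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Each process owns a tensor on its own GPU; all_reduce sums them
        # in place, moving data GPU-to-GPU without a host detour if possible
        t = torch.ones(4, device="cuda") * (dist.get_rank() + 1)
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        print(f"rank {dist.get_rank()}: {t}")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Such a script would be launched with one process per GPU, for example via torchrun --nproc_per_node=4.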
Software – The key advantage the GPU holds over other customised ML/DL systems is software availability: CUDA programming libraries, compilers and APIs have been established and widely adopted over the years. This advantage has been further extended with NVIDIA's introduction of RAPIDS, a suite of open-source libraries for GPU-accelerated analytics, machine learning and data visualisation (a small example follows). We look forward to seeing more software tools and applications developed to drive the adoption of the other emerging AI-centric hardware systems.
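To give a flavour of RAPIDS, here is a small cuDF example (column names and data are purely illustrative) showing a pandas-like aggregation that runs entirely on the GPU:

    import cudf

    df = cudf.DataFrame({
        "sensor": ["a", "a", "b", "b"],
        "reading": [1.0, 2.0, 3.0, 4.0],
    })

    # The API mirrors pandas, but the groupby executes on the GPU
    means = df.groupby("sensor")["reading"].mean()
    print(means)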
HPC-AI Integration – While HPC systems and techniques are being used to accelerate ML/DL, we also see AI being used to improve the performance of HPC systems and to drive a new family of AI-accelerated HPC simulations. For example, HPC service providers can use AI to optimise HPC job scheduling and electrical power usage (sketched below), while researchers can use AI to aid code development and to advance predictive research.
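As a hedged sketch of the job-scheduling idea (the features, data and model choice are entirely illustrative), one could train a simple regressor on historical job records to predict runtimes, which a scheduler could then use for tighter backfilling:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Illustrative job history: requested CPUs, GPUs, walltime (hours)
    X = np.array([[16, 0, 24], [64, 4, 48], [8, 1, 12], [128, 8, 72]])
    y = np.array([18.0, 40.0, 6.0, 65.0])  # actual runtimes (hours)

    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # Predict the runtime of a newly submitted job from its request
    predicted = model.predict([[32, 2, 24]])
    print(f"predicted runtime: {predicted[0]:.1f} h")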
Quantum computing
Quantum computing was one of the key topics at SCA 2019. Even though a practical quantum machine that outperforms a classical computer is expected to be available only in 10 to 15 years, small quantum computers and simulators are already available for research and application development. During the Conference, IBM demonstrated applications in areas such as finance, chemistry, machine learning and optimisation. Quantum machine learning, in which quantum algorithms and computers act as accelerators for classical machine learning, will be keenly researched in the coming years.
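For a sense of how accessible this already is, here is a minimal sketch using Qiskit, IBM's open-source quantum SDK (the circuit is a standard two-qubit Bell state, a common building block in quantum machine learning experiments; running it requires a local simulator or an IBM Quantum backend):

    from qiskit import QuantumCircuit

    qc = QuantumCircuit(2, 2)
    qc.h(0)                      # put qubit 0 into superposition
    qc.cx(0, 1)                  # entangle qubits 0 and 1
    qc.measure([0, 1], [0, 1])   # read both qubits out
    print(qc.draw())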
Bringing some of the new technologies to you
NUS IT will be installing a high-IOPS, low-latency storage system to enable high-speed data processing for ML/DL applications on our new Volta GPU systems. The storage system will be connected through the 100 Gbps network using the low-latency RDMA protocol. The benefit to you as a user will be improved application performance, especially in data-intensive ML/DL workloads.
We will also explore VDI (Virtual Desktop Infrastructure) and GPU virtualisation to support interactive use of GPUs for ML/DL application development, testing and visualisation. With this facility, researchers will no longer have to maintain a local ML/DL hardware and software platform. VDI will also provide a more secure computing environment for sensitive research. After the interactive development work, the actual time-consuming training can be performed on the Volta GPU systems in batch mode.