COMPUTE RESOURCES IN THE CLOUD
Introduction
This year marks a major milestone for high performance computing at NUS IT: after two years of exploration and pilot testing, we are introducing computing resources in the cloud. By the time this article is published, Research Computing in NUS IT will be provisioning cloud computing resources to HPC subscribers, in addition to the existing on-premise HPC compute clusters. One major benefit of the cloud is its scalability: its virtually limitless resources will enable NUS IT to scale compute capacity in the cloud to meet researchers' demands, which translates to shorter waiting times for jobs sent to our job queues.
Extending the HPC Compute Cluster to the Cloud
NUS IT has established secure VPN connectivity to the cloud via physical VPN gateway devices that link the NUS network with the external cloud service provider. On top of this, a fast connection is achieved through our own Singapore Open Exchange[i]. In the cloud environment, a secure, private network segment has been carved out for NUS, making it accessible only via the NUS intranet. This effectively extends our campus intranet to the cloud securely and privately. With this in place, Research Computing is able to extend our compute clusters to the cloud by spinning up new compute instances as and when required.
Running HPC workloads in the Cloud
1. What kinds of computational resources are available?
To start, a cloud compute cluster built on the latest Intel Xeon Platinum 8000 series (Skylake-SP) processors will be made available to HPC users. These compute nodes will be configured with the same applications, users' home directories and workspace as our on-premise compute clusters. As such, users will see no difference in the files they have access to or the applications they already run on HPC.
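As a simple sanity check, once a job lands on a cloud node, users can confirm the processor model from within the job script or an interactive session (standard on any Linux compute node; the expected output is illustrative):

    grep "model name" /proc/cpuinfo | head -1    # prints the CPU model, e.g. an Intel Xeon Platinum 8000-series part

Beyond such a check, the intent is that cloud nodes behave like any other compute node in the cluster.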
2. How do I run my jobs?
The cloud compute cluster will be managed by our PBS Pro job scheduler. A new queue will be announced soon, and HPC users can then modify their PBS Pro job scripts to submit their computational jobs to it. All job files should be placed in the /hpctmp workspace, so that jobs can read from and write to /hpctmp during runs. With this arrangement, HPC users can access their result files from any of our login nodes without having to go to the cloud, as in the sketch below.
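As an illustration, a minimal PBS Pro job script for the cloud cluster might look like the following. The queue name "cloudq", the resource requests and the program name are placeholders, since the actual queue name has not yet been announced; replace them with your own values.

    #!/bin/bash
    #PBS -q cloudq                 # placeholder: replace with the announced cloud queue name
    #PBS -N my_cloud_job           # job name
    #PBS -l select=1:ncpus=24      # example request: one node with 24 cores
    #PBS -l walltime=24:00:00      # example maximum run time
    cd /hpctmp/$USER/my_job        # run from the shared workspace so results are visible on all login nodes
    ./my_program > output.log     # run your application

The script is submitted with qsub (e.g. qsub my_job.pbs) and monitored with qstat, exactly as for the on-premise clusters.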
Conclusion
With the introduction of the new cloud compute cluster, we hope to improve our job turnaround time. Our computational resources will no longer be restricted to fixed on-premise hardware, and it will be simpler for us to adjust our computational offerings in the cloud, for example by adding resources to meet the ever-increasing demands of NUS' HPC community.
[i] Singapore Open Exchange, https://www.sox.net.sg/
[ii] NUS Single TouchPoint for IT Service Requests: nTouch