HIGH PERFORMANCE COMPUTING IN THE CLOUD
Introduction
There is a relatively new class of players in the high performance computing arena: Amazon, Microsoft and other cloud service providers (CSPs). These providers have been building up their data centre resources into gigantic networks of compute servers, not only within a single data centre, but across multiple data centres around the globe. With such a large amount of compute and storage resources, CSPs are able to offer compute and storage services, both large and small, to anyone. Need to run a web service? A database server? An HPC compute cluster? Simply buy the appropriate instances from a CSP, spin them up in the cloud, and they are ready to serve your purposes. In this new pay-as-you-use model, there are no long procurement processes, no waiting for hardware to be delivered, and no need to worry about the space, power and cooling infrastructure that you would otherwise have to set up on-premise. The cloud servers and services are up and ready within minutes.
The scale of these resources is beyond what a traditional HPC centre can provide on-premise. Imagine thousands of servers within a single data centre, with such data centres usually duplicated within the same region for redundancy. Replicate this again across several regions around the globe, and you will have an idea of the massive scale of the resources that the CSPs have built up. Many of today's largest Internet services, such as Facebook, Google and WhatsApp, rely on a cloud (be it private or public) in the back-end to run their services and store their data.
A Definition of the Cloud
The US National Institute of Standards and Technology (NIST) has a widely cited definition of cloud computing:
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
The five essential characteristics are:
- On-Demand Self-Service
- Broad Network Access
- Resource Pooling
- Rapid Elasticity
- Measured Service
The three service models are:
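- Cloud Infrastructure as a Service (IaaS): the provider manages the underlying infrastructure and system hardware; the customer manages the operating system, storage and deployed software.
- Cloud Platform as a Service (PaaS): the provider manages the infrastructure, system hardware and operating system; the customer manages the deployed software.
- Cloud Software as a Service (SaaS): the provider manages everything up to and including the software; the customer manages at most limited user-specific application settings.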
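The split of responsibilities across the three service models, as described in the NIST definitions, can be summarised as a small lookup table. The sketch below is a simplified illustration only: the layer names, the `CUSTOMER_STARTS_AT` table and the `who_manages` helper are hypothetical, and real providers divide responsibilities in considerably more detail.

```python
# Simplified, illustrative responsibility model for the three NIST service models.
# Layer names are hypothetical; real providers divide responsibilities in more detail.
LAYERS = ["infrastructure", "operating_system", "application", "user_settings"]

# For each model, the lowest layer the customer manages; every layer below it
# is managed by the provider.
CUSTOMER_STARTS_AT = {
    "IaaS": "operating_system",
    "PaaS": "application",
    "SaaS": "user_settings",
}


def who_manages(model: str, layer: str) -> str:
    """Return 'customer' or 'provider' for one layer under one service model."""
    if model not in CUSTOMER_STARTS_AT:
        raise ValueError(f"unknown service model: {model}")
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    boundary = LAYERS.index(CUSTOMER_STARTS_AT[model])
    return "customer" if LAYERS.index(layer) >= boundary else "provider"


if __name__ == "__main__":
    for model in CUSTOMER_STARTS_AT:
        row = ", ".join(f"{layer}={who_manages(model, layer)}" for layer in LAYERS)
        print(f"{model}: {row}")
```

Moving from IaaS to SaaS simply moves the boundary upwards, handing more layers to the provider.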
The four deployment models are:
- Private Cloud: The cloud infrastructure is operated solely for a single organisation. It may be managed by the organisation itself or by a third party, and may exist on-premise or off-premise. Many large corporations operate their own private clouds.
- Public Cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an organisation selling cloud services, e.g. Amazon Web Services, Microsoft Azure, Google Cloud, and many more.
- Community Cloud: The cloud infrastructure is shared by several organisations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organisations or a third party and may exist on-premise or off-premise. The National Supercomputing Centre Singapore (NSCC) is an example of a community cloud (see nscc.sg).
- Hybrid Cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardised or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds).
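Cloud bursting, mentioned under the hybrid model, can be illustrated with a toy scheduler: jobs run on a fixed-size private or community cluster while it has room, and the overflow is sent to a public cloud. This is a minimal sketch under stated assumptions: the `Job` type, the `schedule` function, the core counts and the greedy placement policy are all hypothetical, and the public cloud is treated as having unlimited capacity. It does not describe how any particular scheduler or provider implements bursting.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int


def schedule(jobs, on_prem_cores):
    """Place jobs on-premise while capacity remains; burst the rest to the cloud.

    Jobs are considered greedily in submission order. A job that does not fit
    in the remaining on-premise capacity is sent to the public cloud, which is
    assumed here to have unlimited capacity.
    """
    free = on_prem_cores
    placement = {}
    for job in jobs:
        if job.cores <= free:
            placement[job.name] = "on-premise"
            free -= job.cores
        else:
            placement[job.name] = "public-cloud"
    return placement


if __name__ == "__main__":
    # Hypothetical job queue and cluster size, for illustration only.
    queue = [Job("cfd", 64), Job("genomics", 32), Job("weather", 48), Job("post", 16)]
    for name, site in schedule(queue, on_prem_cores=100).items():
        print(f"{name:10s} -> {site}")
```

A real scheduler would typically also weigh data-transfer time and cost before bursting a job.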
Running HPC Jobs in the Cloud
Regardless of the service or deployment model, the servers in the cloud are in essence no different from servers on-premise. The underlying hardware is basically the same, and is connected in similar ways. What makes the cloud different is the virtualisation layer that glues all these individual servers together to give cloud customers a seemingly infinite pool of virtual machines. Think of this as VMware on steroids. Of course, security and privacy are major concerns for public cloud providers, so each customer is confined to its own “sandbox” and cannot breach it to access systems and services that belong to other cloud customers.
With this understanding of cloud services, moving one’s computational jobs to the cloud is a relatively simple task. Assuming you can acquire an instance that provides the tools and libraries your computations need (e.g. compilers, debuggers and runtime libraries), you can simply move your code to the cloud instance, compile it and run it there. Instances with multiple processors are also available, so jobs can run in parallel. See, for example, the AWS Marketplace or the Azure Marketplace for an idea of the vast variety of instances that one can purchase from these providers.
AWS even has a dedicated section for HPC.
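Once a multi-processor instance is running, a parallel job uses its cores just as it would on an on-premise server. As a minimal sketch, assuming a multi-core instance with Python 3 installed, the code below splits a midpoint-rule estimate of π (the integral of 4/(1+x²) over [0, 1]) across worker processes using the standard `multiprocessing` module. The function names and problem size are illustrative; production HPC codes would more typically use MPI or OpenMP.

```python
import math
import os
from multiprocessing import Pool


def partial_sum(task):
    """Midpoint-rule sum of 4/(1+x^2) over intervals start..stop-1 of an n-interval grid on [0, 1]."""
    start, stop, n = task
    h = 1.0 / n
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * h
        total += 4.0 / (1.0 + x * x)
    return total * h


def parallel_pi(n, workers):
    """Estimate pi by splitting n intervals into contiguous chunks, one task per worker process."""
    chunk = math.ceil(n / workers)
    tasks = [(i, min(i + chunk, n), n) for i in range(0, n, chunk)]
    with Pool(processes=workers) as pool:
        return sum(pool.map(partial_sum, tasks))


if __name__ == "__main__":
    workers = os.cpu_count() or 1  # use every core the instance exposes
    estimate = parallel_pi(2_000_000, workers)
    print(f"{workers} workers: pi ~ {estimate:.10f}, error {abs(estimate - math.pi):.1e}")
```

This scales with the cores of a single instance; spreading one job across several instances would instead need a message-passing library such as MPI.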
Advantages and Disadvantages of Running HPC in the Cloud
Advantages
The main advantages of the cloud follow directly from NIST's five essential characteristics. Cloud resources are available on demand, can be provisioned rapidly, and scale easily to meet dynamically changing demand. This makes the cloud highly attractive in today's Internet service environment, where customer demand changes rapidly. There are also economies of scale for the cloud service providers: because they serve many customers with different demand patterns, capacity that would sit idle in a private computational facility can be put to use by another customer, which makes it profitable for them to provision their cloud services.
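The economies-of-scale argument rests on resource pooling: customers rarely peak at the same time, so a shared pool needs less capacity than the sum of every customer's individual peak. The toy calculation below illustrates this with three hypothetical demand profiles; the workload names, time slots and server counts are invented purely for illustration.

```python
def peak(profile):
    """Largest number of servers needed in any time slot."""
    return max(profile)


def pooled_profile(profiles):
    """Element-wise sum of several demand profiles (servers needed per time slot)."""
    return [sum(slot) for slot in zip(*profiles)]


if __name__ == "__main__":
    # Hypothetical servers needed in each of six 4-hour slots of a day.
    profiles = {
        "office apps": [2, 10, 12, 10, 3, 2],      # busy during working hours
        "video streaming": [4, 2, 3, 5, 12, 10],  # busy in the evening
        "batch HPC": [10, 4, 2, 2, 4, 10],        # runs overnight
    }
    separate = sum(peak(p) for p in profiles.values())
    pooled = peak(pooled_profile(profiles.values()))
    print(f"Capacity if each customer provisions for its own peak: {separate}")
    print(f"Capacity needed for the pooled demand: {pooled}")
```

With these made-up profiles, provisioning separately needs 34 servers, while the pooled demand never exceeds 22.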
Disadvantages
The cloud does have its disadvantages. Many who have explored using the cloud agree that the costs may not be as low as they seem. Not owning and managing hardware on-premise removes the related capital and operational costs, but the service charges of cloud providers recover a share of their own capital and operational costs, plus profit. The saving grace is that cloud services are on-demand: customers pay only for what they use, and the providers' costs are spread across many customers.
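One way to make this trade-off concrete is a break-even calculation: an owned server costs roughly the same whether or not it is busy, while on-demand cloud capacity is billed only for the hours it is used. The sketch below uses entirely hypothetical prices and ignores factors such as data-transfer charges, staffing, and reserved or spot pricing; the function names are illustrative and it only shows the shape of the calculation.

```python
HOURS_PER_YEAR = 24 * 365


def on_prem_cost_per_year(capex, lifetime_years, opex_per_year):
    """Annualised cost of owning a server: straight-line depreciation plus running costs."""
    return capex / lifetime_years + opex_per_year


def cloud_cost_per_year(hourly_rate, utilisation):
    """On-demand cost for the fraction of the year the server is actually needed."""
    return hourly_rate * HOURS_PER_YEAR * utilisation


def break_even_utilisation(capex, lifetime_years, opex_per_year, hourly_rate):
    """Utilisation above which owning becomes cheaper than renting on demand."""
    owned = on_prem_cost_per_year(capex, lifetime_years, opex_per_year)
    return owned / (hourly_rate * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical figures for illustration only; they are not real price quotes.
    capex, lifetime, opex, rate = 20_000.0, 4, 3_000.0, 2.50
    owned = on_prem_cost_per_year(capex, lifetime, opex)
    print(f"Owned server: {owned:,.0f} per year regardless of use")
    for u in (0.10, 0.25, 0.50, 0.90):
        print(f"Cloud at {u:.0%} utilisation: {cloud_cost_per_year(rate, u):,.0f} per year")
    print(f"Break-even utilisation: {break_even_utilisation(capex, lifetime, opex, rate):.0%}")
```

With these made-up figures, owning the server breaks even at roughly 37% utilisation; for workloads that need it less often than that, paying on demand is cheaper.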