Miguel Dias Costa, Research Computing, NUS IT
A common first reaction of new users to shared computing resources such as HPC clusters is to ask, “why can’t I just use yum/apt/etc. to install the software I need?”
Package managers like yum and apt require administrative access, and users of a shared resource naturally cannot be given the administrative access they have on their own laptops and desktops; but that is not the only reason to avoid such package managers.
Package managers like yum/apt/etc. install and maintain only one version/variant/build of each package at a time, but different users have different requirements, and the same scientific software can be built in many different combinations of compilers, libraries, etc.
Additionally, package managers like yum/apt/etc. and tools like conda and pip typically install pre-built packages that, in order to work across different architectures, are not optimized for any specific one, which is one of the reasons it is preferable to build scientific software for each HPC system rather than use pre-built packages.
Users can always, of course, build or install software in their own home folders using various tools, and/or use containers – but that leads to unnecessary work, fragmentation and, very often, inefficient or broken installations.
For decades, the solution has been for administrators to install multiple versions/variants/builds of the software in a shared folder and then provide the users with environment modules to prepare the environment in such a way that only the desired combination is available for that specific session.
Even though environment modules are old technology, using software that is already installed globally (when what you need is already there, of course) is still the easiest and most efficient way of using shared computing resources.
And even though environment modules are old technology, there are modern ways of building and installing scientific software and then exposing it as environment modules. The truth is, building scientific software is hard, which is one of the reasons the software available as environment modules usually lags behind what users want to use; these modern tools allow us to bridge that gap, to some degree.
For the past 6 months, NUS HPC has been exploring EasyBuild, a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way (https://docs.easybuild.io/en/latest/Introduction.html), and users can opt in to the software installed this way by simply running:
$ source /app1/ebenv
After that,
$ module avail
will show you the same modules that have always been installed at NUS HPC, followed by hundreds more installed with EasyBuild.
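If the full list is too long to scan, you can pass a name to "module avail" to filter it. A quick sketch (the software names below are only examples; what you see depends on what is actually installed):
$ module avail Python      # list only modules with "Python" in the name
$ module avail GCC         # likewise for compiler modules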
Some of you have already been using this environment after having asked a question or opened a ticket about software (it is always good to ask, how else will we know what you need?), and now we are letting more people know.
There’s a lot to learn about this new environment, including the fact that users can use EasyBuild themselves to build and install software locally while still leveraging the global software stack, but that and more will be the subject of upcoming articles.
For now, it’s important to note that the software in this new environment is organised by “toolchains” (collections of compilers and libraries, e.g., “foss” for open-source compilers and libraries and “intel” for the Intel ones) and toolchain versions (e.g., 2021b, 2022a), and that loading a module will also load all necessary dependencies – e.g., try running:
$ module load SciPy-bundle && module list
to see the loaded dependencies and their exact versions.
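If you need a particular version or toolchain rather than the default, you can name the module explicitly. The module name below is only illustrative; run “module avail SciPy-bundle” first to see which versions are actually installed:
$ module avail SciPy-bundle                     # list the available versions/toolchains
$ module load SciPy-bundle/2021.10-foss-2021b   # illustrative: a specific version built with the foss/2021b toolchain
$ module list                                   # confirm what was loaded, including dependencies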
One final thing to add: if you just want to use this environment for access to modern compilers and libraries and then build software manually yourself, you can use the “buildenv” module, e.g.,
$ module load buildenv/default-foss-2022a
which will give you GCC/11.3.0, OpenBLAS/0.3.20, FFTW/3.3.10, etc. (again, run “module list” after “module load” to see the loaded modules). If you’re building R packages, you can load the module buildenv/R-foss-2022a instead.
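As a minimal sketch of what building manually with buildenv might look like (the source file and link flag are hypothetical, and it assumes the loaded modules put the compilers and libraries on the usual search paths, as EasyBuild modules typically do):
$ module load buildenv/default-foss-2022a
$ gcc --version                               # the compiler now comes from the toolchain (GCC 11.3.0)
$ gcc -O2 -o mysolver mysolver.c -lopenblas   # hypothetical build linking the OpenBLAS provided by the modules
$ ./mysolver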
One final final thing to add: for Python packages that are not yet available as environment modules (but do check first, with “module avail”, that there isn’t already an environment module for what you need), you can load a recent Python+SciPy version with “module load SciPy-bundle” and then create a local virtualenv (https://virtualenv.pypa.io/en/latest/) with, e.g.,
$ virtualenv --system-site-packages venv
(“--system-site-packages” makes the virtualenv use the Python packages from the module you loaded, and “venv” is just a folder name).
After that, you can activate the virtualenv with:
$ source venv/bin/activate
and then use pip to install only the Python packages that are not yet available as environment modules. As mentioned above, this is not ideal, and you will have to keep track yourself of which virtualenv goes with which environment modules (the venv name can help with this, as in the example below), but it is often a reasonable compromise until a global module is available.
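Putting the whole sequence together (the package name “somepackage” is a placeholder, and the venv name is just a suggestion for recording which toolchain it belongs to):
$ module load SciPy-bundle
$ virtualenv --system-site-packages venv-foss-2022a   # name the venv after the toolchain it matches
$ source venv-foss-2022a/bin/activate
$ pip install somepackage                             # only packages not already provided by modules
$ deactivate                                          # leave the virtualenv when done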
Do try it out and contact us with any questions/feedback/requests. Happy scientific computing.