NUSIT HPC

» Hadoop on AWS: Benefits of EMR

By Kumar Sambhav, Research Computing, NUS Information Technology, on 15 May 2020

Managing Big Data on Hadoop clusters has undergone several paradigm shifts in recent times, from sysadmin-managed clusters at the command-line level to centrally managed on-premises platforms such as Cloudera, Hortonworks and MapR. All of these platforms share one primary problem: they depend on physical hardware resources. Read on to discover how EMR addresses this shortcoming in the cloud.

» Tackling HPC issue for parallel computing in MATLAB

By Vamshidhar Gangu, Research Computing, NUS Information Technology, on 15 May 2020

Tips on how to run MATLAB Parallel Computing Toolbox jobs properly on our HPC cluster. This article also addresses potential conflicts between multiple concurrent jobs.

» New Computational Cluster in the Cloud

By Yeo Eng Hee, Research Computing, NUS Information Technology, on 15 May 2020

Over the past few articles in the HPC Newsletter, I have been writing about Cloud resources and how computational jobs can be run in the Cloud. The Research Computing team here has been working hard to make Cloud resources available to registered HPC users in a secure and simple way, so that our HPC workloads can be run in the Cloud as well.

» Friendly Email Alert for HPC Batch Jobs

By Wang Junhong, Research Computing, NUS Information Technology, on 15 May 2020

A customised email alerting function has been developed and enabled in the HPC system. It sends each user an email alert summarising the jobs completed in the last hour, so users can get almost instant updates on their jobs via an email app on a mobile device, anytime and anywhere. This removes the inconvenience of having to log into the HPC system from a computer terminal to check. Other useful job information can also be added to the alert report. Read on for more details.

» Data Manipulation and More with the Command Line

By Ku Wee Kiat, Research Computing, NUS Information Technology, on 15 May 2020

Ever needed to rename a directory of files to a certain format? Extract lines with certain keywords from log files? Or even create CSV files from semi-structured logs? There is no need to bust out custom Python or R scripts or install any software when most simple tasks can be solved much faster using Bash tools.

» Anomaly Detection: A Machine Learning Use Case

By Kuang Hao, Research Computing, NUS Information Technology, on 15 May 2020

Anomaly detection is mainly a data-mining process and is widely used in behavioral analysis to determine the types of anomaly occurring in a given data set. It is applicable in domains such as fraud detection, intrusion detection, fault detection and system health monitoring in sensor networks. Since the definition of an anomaly is often complicated and depends on historical data, machine learning is well suited to this type of application.

» What is Data Engineering

By Tan Chee Chiang, Research Computing, NUS Information Technology, on 12 May 2020

We launched Data Engineering support services a while ago to support and accelerate data-centric research in areas such as Analytics and AI. In this article, we discuss the similarities and differences between Data Engineers and Data Scientists, and how Data Engineers can help in both data-centric research and enterprise computing.

» Projecting 2020 HPC Trends

By Tan Chee Chiang, Research Computing, NUS Information Technology, on 20 January 2020

Based on current market trends and technology roadmaps from market leaders, we can expect greater convergence of HPC and AI technologies, more processor (CPU) and accelerator options, further adoption of HPC Cloud and higher demand for storage capacity in 2020.

» Acceleration of Data Pre-processing

By Kuang Hao, HPC Specialist (Research Computing), NUS Information Technology, on 20 January 2020

As the first step in the machine learning pipeline, data pre-processing (DP) should never be neglected. Thanks to the open source community and machine learning enthusiasts, researchers and data science learners can find plenty of clean, generalized datasets online for research and study. Real-life data, however, is almost never well organized, which is why DP plays such an important role.