» Data Manipulation and More with the Command Line
By Ku Wee Kiat, Research Computing, NUS Information Technology, on 15 May 2020
Ever needed to have a directory of files renamed to a certain format? Extract lines with certain keywords from log files? Even create CSV files from semi-structured logs? There is no need to bust out custom Python or R scripts or install any software when most simple tasks can be solved much faster using Bash tools.
» Anomaly Detection: A Machine Learning Use Case
By Kuang Hao, Research Computing, NUS Information Technology, on 15 May 2020
Anomaly detection is mainly a data-mining process and is widely used in behavioral analysis to determine the types of anomaly occurring in a given data set. It’s applicable in domains such as fraud detection, intrusion detection, fault detection and system health monitoring in sensor networks. Since the definition of an anomaly is often complicated and depends on historical data, machine learning is well suited to this type of application.
» What is Data Engineering
By Tan Chee Chiang, Research Computing, NUS Information Technology, on 12 May 2020
We launched Data Engineering support services a while ago to support and accelerate data-centric research, such as Analytics and AI. We will discuss the similarities and differences between Data Engineers and Data Scientists, and how Data Engineers can help in both data-centric research and enterprise computing.
» Projecting 2020 HPC Trends
By Tan Chee Chiang, Research Computing, NUS Information Technology, on 20 January 2020
Based on current market trends and technology roadmaps from market leaders, we can expect greater convergence of HPC and AI technologies, more processor (CPU) and accelerator options, further adoption of HPC Cloud and higher demand for storage capacity in 2020.
» Acceleration of Data Pre-processing
By Kuang Hao, HPC Specialist (Research Computing), NUS Information Technology, on 20 January 2020
As the first step in the machine learning pipeline, data pre-processing (DP) should never be neglected. Thanks to the open source community and machine learning enthusiasts, researchers and data science learners have access to many clean and generalized datasets online for research and study. Real-life data, however, is almost never well organized, which is why DP plays such an important role.