DATA ANALYTICS/MACHINE LEARNING/DEEP LEARNING DEVELOPMENT AND SUPPORT
We will look at how technologies have evolved since the term “Big Data” was coined in 2005. We will share what we have done to prepare for this new era of computing, and how you can tap into our resources and services to kick-start your Big Data or AI journey.
Key Development in Big Data Analytics and Machine Learning (ML)/Deep Learning (DL)
2005 |
2010 |
2011 |
2012 | • Deep Learning breakthrough year – the expanding use of GPU capability to increase visual recognition accuracy |
2015 |
2016 | • PyTorch initial release (ML/DL software library) |
2017 |
Common resources and services for data-centric research
We started preparing for this new era of data-centric computing in 2011, when we were developing a low-cost storage service based on commodity hardware. It subsequently evolved into the more reliable, still low-cost, on-demand utility storage service we offer today.
In 2017 we set up a Data Engineering Technology (DET) team to provide technical consultation and support.
In 2018 we completed the 100G high-speed research network implementation, which enables large-scale data transfer among research centres, institutes and the National Supercomputing Centre (NSCC).
Resources and services for Big Data Analytics
The Hadoop Data Repository and Analytics System (DRAS) was introduced in 2017. It has been set up with scalable storage and the following common computing environments and software:
- Spark
- Hive data warehouse
- HBase NoSQL database
- Kafka streaming support
- Spark MLlib machine learning
- Hue and Zeppelin web interfaces
Resources and services for ML/DL
We have been providing GPU resources through AWS to support ML/DL research since September 2017. The GPU instances are configured with the following suite of popular ML/DL software:
- TensorFlow/Keras
- PyTorch
- scikit-learn
- Anaconda Python environment with data science packages (pandas, SciPy, NumPy)
- H2O
- Assorted ML libraries, e.g. LightGBM and OpenCV
- Singularity containers
- OpenAI Gym
- Caffe
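When you first log on to a GPU instance, a quick check of which of these packages your Python environment can find may save some troubleshooting later. The following is a minimal sketch only: the import names in the mapping are the usual ones for these packages and are our assumption, not a description of how any particular instance is configured. Singularity is a command-line tool rather than a Python package, so it is not covered here.

```python
# Sketch: report which of the listed ML/DL packages can be located in the
# current Python environment. find_spec() only looks the module up; it does
# not import it, so the check is fast and has no side effects.
from importlib.util import find_spec

# Display name -> import name (assumed mapping for illustration)
PACKAGES = {
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "PyTorch": "torch",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "SciPy": "scipy",
    "NumPy": "numpy",
    "H2O": "h2o",
    "LightGBM": "lightgbm",
    "OpenCV": "cv2",
    "OpenAI Gym": "gym",
    "Caffe": "caffe",
}


def check_packages(packages=PACKAGES):
    """Return {display name: True/False} for whether each module is found."""
    return {name: find_spec(module) is not None
            for name, module in packages.items()}


if __name__ == "__main__":
    for name, available in check_packages().items():
        print(f"{name:<14} {'found' if available else 'missing'}")
```

A package reported as missing may simply live in a different conda environment, so it is worth activating the intended environment before running the check.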
In June 2018, NSCC made six DGX-1 GPU systems available for ML/DL research. Our DET team provided technical support and training for some NUS pilot users.
In January 2019, we will introduce in-house the latest GPU systems, fully equipped with ML/DL software, to help advance AI research further.
How can you tap into the above resources and services to kick-start your research?
If you are doing Big Data analytics, our DET team will help you onboard your data to the DRAS. They will help automate data streaming, extraction, transformation and loading. They will also provide programming support for your application development if necessary.
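For readers new to the extraction, transformation and loading (ETL) pattern, the three stages can be sketched in a few lines. This is an illustration only and not the DRAS workflow: on the DRAS the same stages would more likely be built with tools such as Spark, Kafka and Hive, whereas the sketch below uses only the Python standard library (csv and sqlite3). The sensor data, table name and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw sensor readings; in practice this would be a file or stream.
RAW_CSV = """sensor_id,reading,unit
s1,21.5,C
s2,,C
s3,70.7,F
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no reading; convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["reading"]:
            continue  # skip incomplete records
        value = float(row["reading"])
        if row["unit"] == "F":
            value = (value - 32.0) * 5.0 / 9.0
        cleaned.append((row["sensor_id"], round(value, 2)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, celsius REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


# Run the pipeline end to end against an in-memory database.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute(
    "SELECT sensor_id, celsius FROM readings ORDER BY sensor_id"
).fetchall()
```

Keeping the three stages as separate functions makes each one easy to test on its own and to replace later, for example by swapping the in-memory CSV for a streaming source.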
If you are doing ML/DL research, the DET team will help in preparing your data and the GPU computing environment. If you are not familiar with the programming environment, they can provide training and help in the initial application development.
The DET team also provides support for R and MATLAB application development on the HPC cluster.
Write to DataEngineering@nus.edu.sg to begin your Big Data Analytics or ML/DL journey.