What is Data Engineering?
Tan Chee Chiang, Research Computing, NUS IT
We launched Data Engineering support services a while ago to support and accelerate data-centric research, such as in Analytics and AI. In this article, we discuss the similarities and differences between Data Engineers and Data Scientists, and how Data Engineers can help in both data-centric research and enterprise computing.
Data Engineer vs Data Scientist
From the academic point of view, Data Scientists generally hold more advanced degrees than Data Engineers. While Data Scientists are equipped with the know-how to research and develop new Data Analytics/AI algorithms and models, Data Engineers are trained in the application and deployment of those algorithms and models.
In a recent Gartner report on Data Science and Machine Learning (DSML) platforms, Data Scientists are described as individuals who possess the skills and knowledge to understand and engage in all stages of the data science life cycle. Most Data Scientists spend the largest share of their time and energy on model creation, with supporting roles such as Data Engineers and ML (Machine Learning) Engineers taking on data pipelining and MLOps (Machine Learning operationalization) responsibilities.
The career development path for Data Engineers presented on the SkillsFuture website, reproduced below, reflects how the two roles overlap:
Digging further into the skills framework, both the Data Engineer and Data Scientist roles require a similar set of technical skills and competencies, but with different proficiency levels in some areas. Data Engineers focus more on data preparation and management, whereas Data Scientists are more responsible for solution development and computational modeling.
With the worldwide shortage of Data Scientists, not all organizations can afford or are able to recruit them. Fortunately, not all organizations need Data Scientists to start their data analytics journey. Oil and gas companies recruit many Data Scientists because they need to develop new algorithms and models that can give them a competitive edge in their exploration work. Many other companies just need Data Engineers to apply existing algorithms or models for their analytics. As shown in the career path above, Data Engineers have the opportunity to advance their skills and become Data Scientists one day.
What can the Data Engineering team at NUS IT do for you?
The scope of work of our Data Engineers includes:
• Development and operation of the Hadoop Data Repository & Analytics System
• User support and operation of the GPU system for machine learning
• Data masking tools development and operation
• Analytics and AI software installation and maintenance
• Programming and modeling support and training for various analytics and AI software
• Analytics and AI project technical consultation
• Analytics and AI project management and execution
• New technologies exploration and deployment
For analytics/AI research support, the team has been providing data ETL (Extract, Transform, Load) support to researchers. In some cases, the support also includes hardware, software and database installation and configuration. Examples include ETL support for a Twitter sentiment analytics research project, and database configuration and analytics pipeline development for healthcare-related research.
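For readers new to the term, the three ETL stages can be illustrated with a minimal Python sketch. This is an illustration only and not the team's actual pipeline: the sample data, the `posts` table and the function names are all hypothetical, and it uses only the Python standard library (an in-memory CSV as the source and an in-memory SQLite database as the target).

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real project would read from files or an API.
RAW_CSV = """id,text
1,  Great service today!
2,
3,Long queue at the CANTEEN
"""


def extract(source):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows):
    """Transform: trim whitespace, lowercase the text, drop rows with no text."""
    cleaned = []
    for row in rows:
        text = (row["text"] or "").strip().lower()
        if text:
            cleaned.append((int(row["id"]), text))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, text TEXT)"
    )
    conn.executemany("INSERT INTO posts (id, text) VALUES (?, ?)", records)
    conn.commit()


def run_etl(source, conn):
    """Run the three stages in order and return the number of rows loaded."""
    records = transform(extract(source))
    load(records, conn)
    return len(records)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(RAW_CSV, connection), "rows loaded")
```

Keeping each stage as a separate function means the source (for example, a social media feed) or the target database can be swapped without touching the cleaning logic in between.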
For analytics/AI learning support, the team also offers training in Machine/Deep Learning and in R and Python programming to shorten students’ data analytics learning curve. The team works with students to conduct the “Machine Learning in Practice” DYOM (Design Your Own Module) course and supervises internship projects.
For enterprise analytics/AI support, the team works closely with domain experts and business owners to implement various analytics projects. Examples of ongoing projects include cybersecurity analytics, WiFi analytics, and safety alerting using IoT and vision analytics.
You may contact the Data Engineering team at DataEngineering@nus.edu.sg for enquiries about the support services above.