CENTRAL DATA MASKING PLATFORM FOR COLLABORATIVE RESEARCH
Data is the fuel for every analytics project. A successful analytics project depends not only on the quality of the data it processes but also on drawing that data from varied sources, so that the sample space is well covered.
Most modern research projects have access to only a limited number of datasets and run into trouble when those datasets need to be linked with others. In such cases, data privacy and confidentiality requirements become an impediment.
To make collaboration possible in such cases, researchers use masking or encryption before making data available to collaborators. Although masking addresses the privacy concerns, it is not easy to do reliably. It requires knowledge of various encryption algorithms, as well as the programming skills to apply them within the data pipelines that the researchers use.
To cater to such use cases, NUS IT has recently launched a service called the “Central Data Masking Platform”. The platform lets users mask their data in one of two modes, depending on whether the work is an individual analytics project or a project carried out by multiple collaborators.
Figure: Central Data Masking Platform homepage.
Currently, the platform supports two masking methods: MurmurHash3 (a hash function) and AES (an encryption algorithm). Other algorithms will be added in future enhancements.
The platform currently accepts data as CSV and Excel files. Support for other file types, such as JSON, will be added in future enhancements.
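As a rough illustration of what hash-based masking of a CSV column involves, the following Python sketch implements the 32-bit MurmurHash3 variant using only the standard library and applies it to one column. How the platform combines the masking key with each value is not described here, so prefixing the key to the value is an assumption, as are the helper names `murmurhash3_32`, `mask_value` and `mask_csv_column` and the sample data.

```python
import csv
import io


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmurhash3_32(data: bytes, seed: int = 0) -> int:
    """Return the unsigned 32-bit MurmurHash3 (x86_32 variant) of data."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n_blocks = len(data) // 4
    # Body: mix in each complete 4-byte little-endian block.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: mix in the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data) & 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def mask_value(value: str, key: str) -> str:
    """Mask one value as 8 hex digits.

    Assumption: the key is simply prefixed to the value before hashing.
    This is not the platform's documented scheme.
    """
    if not key:
        raise ValueError("masking key must be non-empty")
    digest = murmurhash3_32(f"{key}:{value}".encode("utf-8"))
    return f"{digest:08x}"


def mask_csv_column(csv_text: str, column: str, key: str) -> str:
    """Return csv_text with every value in `column` replaced by its mask."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = mask_value(row[column], key)
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data, for illustration only.
sample = "patient_id,age\nS1234567A,41\nS7654321B,58\n"
print(mask_csv_column(sample, "patient_id", key="my-project-key"))
```

Because the same key and value always produce the same mask, two datasets masked with the same key can still be linked on the masked column. A 32-bit hash can map different values to the same mask, so collisions become likely on large datasets.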
Users can log in with their NUS-IDs and mask their data according to their requirements. For individual projects, users must supply a non-empty masking key. They must also keep the original dataset, as it will be needed to decrypt the final dataset later on.
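One plausible reading of the requirement to keep the original dataset is that a hash cannot be reversed: masked values can only be traced back by re-masking the original values with the same key and matching the results. The sketch below shows that idea. It is not the platform's actual decryption routine. `build_lookup` and `unmask` are hypothetical helpers, and a keyed BLAKE2b digest from Python's `hashlib` stands in for the platform's masking method purely to keep the example short.

```python
import hashlib
from typing import Dict, Iterable, List, Optional


def keyed_mask(value: str, key: str) -> str:
    """Stand-in masking function (keyed BLAKE2b), NOT the platform's algorithm."""
    if not key:
        raise ValueError("masking key must be non-empty")
    return hashlib.blake2b(
        value.encode("utf-8"), key=key.encode("utf-8"), digest_size=8
    ).hexdigest()


def build_lookup(originals: Iterable[str], key: str) -> Dict[str, str]:
    """Re-mask the retained original values and map each mask to its original."""
    return {keyed_mask(value, key): value for value in originals}


def unmask(masked_values: Iterable[str], lookup: Dict[str, str]) -> List[Optional[str]]:
    """Translate masked values back; unknown masks come back as None."""
    return [lookup.get(masked) for masked in masked_values]


# Made-up identifiers, for illustration only.
originals = ["S1234567A", "S7654321B"]
key = "my-project-key"

masked_result = [keyed_mask(v, key) for v in reversed(originals)]
lookup = build_lookup(originals, key)
print(unmask(masked_result, lookup))
```

Without both the original values and the key, the lookup table cannot be rebuilt, which is why neither should be discarded.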
For collaborative projects, the keys are managed by the Key Admin and stored in a Key Management Server. Each user who wishes to encrypt data under a collaborative project must be added to the project before they can draw the shared key. Collaborative projects likewise require users to keep the original dataset, as it will be needed for decryption.
A collaborative project must first be activated by the Key Admin; only then can the collaborators use the shared key to mask their data.
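The rules for collaborative projects can be summarised as a small model: a shared key can be drawn only if the project has been activated by the Key Admin and the user has been added to the project. The Python sketch below encodes just those two checks. All class and method names are hypothetical, the in-memory `KeyManagementServer` is a toy stand-in, and generating the key at activation time is an assumption, not a description of how the platform works internally.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


class KeyManagementServer:
    """Hypothetical in-memory stand-in for the Key Management Server."""

    def __init__(self) -> None:
        self._keys: Dict[str, str] = {}

    def create_key(self, project_id: str) -> None:
        # Assumption: one random shared key per project.
        self._keys[project_id] = secrets.token_hex(16)

    def fetch_key(self, project_id: str) -> str:
        return self._keys[project_id]


@dataclass
class CollaborativeProject:
    project_id: str
    key_admin: str
    members: Set[str] = field(default_factory=set)
    active: bool = False

    def add_member(self, user: str) -> None:
        self.members.add(user)

    def activate(self, actor: str, kms: KeyManagementServer) -> None:
        """Only the Key Admin may activate the project."""
        if actor != self.key_admin:
            raise PermissionError("only the Key Admin can activate the project")
        kms.create_key(self.project_id)  # assumption: key is created here
        self.active = True

    def draw_shared_key(self, user: str, kms: KeyManagementServer) -> str:
        """Return the shared key if the project is active and user is a member."""
        if not self.active:
            raise PermissionError("project has not been activated yet")
        if user not in self.members:
            raise PermissionError(f"{user} has not been added to the project")
        return kms.fetch_key(self.project_id)


kms = KeyManagementServer()
project = CollaborativeProject("cohort-study", key_admin="admin01")
project.add_member("alice")
project.add_member("bob")
project.activate("admin01", kms)

# Both collaborators draw the same key, so their masked datasets can be linked.
print(project.draw_shared_key("alice", kms) == project.draw_shared_key("bob", kms))
```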
The flowchart below illustrates this flow.
Users can access the Central Data Masking Platform at https://hdpapps.nus.edu.sg:3300/encryptor/home. If you face any difficulties or have any queries, please write to us at data.engineering@nus.edu.sg.