DEEP LEARNING BEST PRACTICES: CHECKPOINTING YOUR DEEP LEARNING MODEL TRAINING

This article covers one of many best practices in Deep Learning: creating checkpoints while training your model. We will look at what needs to be saved when creating checkpoints, why checkpoints are needed (especially on NUS HPC systems), how to create them in various deep learning frameworks (Keras, TensorFlow, PyTorch), and the benefits they bring.

What Needs to be Saved

A checkpoint should capture everything needed to resume training from the point where it stopped: the model architecture, the trained weights, the optimizer state, and bookkeeping details such as the current epoch and the best validation score seen so far.
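As a concrete illustration, Keras can bundle all of this state into a single HDF5 file. This is a minimal sketch, with 'model.h5' as an illustrative filename rather than one from the original article:

from keras.models import load_model

# model.save() stores the architecture, weights, training configuration
# and optimizer state in one file, so training can be resumed later.
model.save('model.h5')

# Recreate the exact same model, ready to continue training or predict.
model = load_model('model.h5')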

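In Keras, periodic checkpointing during training is handled by the ModelCheckpoint callback, which is passed to model.fit() through the callbacks argument. The snippet below saves the full model whenever the monitored metric (here, validation loss) improves: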
from keras.callbacks import ModelCheckpoint

# Save the full model (architecture + weights) to /tmp/weights.hdf5 after
# every epoch in which the validation loss improves.
checkpointer = ModelCheckpoint(filepath='/tmp/weights.hdf5', monitor='val_loss',
                               verbose=1, save_best_only=True,
                               save_weights_only=False, mode='auto', period=1)

model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0,
          validation_data=(x_test, y_test), callbacks=[checkpointer])
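In PyTorch, the usual pattern is to serialise a dictionary of training state with torch.save() and restore it with torch.load(). The sketch below is illustrative rather than taken from the original article; the function names save_checkpoint and load_checkpoint and the epoch/loss bookkeeping fields are assumptions:

import torch

def save_checkpoint(model, optimizer, epoch, loss, path='/tmp/checkpoint.pt'):
    # Persist everything needed to resume training: weights, optimizer
    # state, and bookkeeping such as the current epoch and loss.
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }, path)

def load_checkpoint(model, optimizer, path='/tmp/checkpoint.pt'):
    # Restore saved state into already-constructed model/optimizer objects.
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    return checkpoint['epoch'], checkpoint['loss']

After loading, call model.train() before resuming training so that layers such as dropout and batch normalisation behave correctly.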
To keep a checkpoint safe on NUS HPC systems, copy it from the temporary working directory to your home directory:

# | Linux Command                      | Description
1 | cp /hpctmp2/a0123456/file2 ~/file2 | Copies file2 from the /hpctmp2 temporary working directory to the user's home directory